History log of /src/sys/uvm
Revision  Date  Author  Comments
 1.11 04-Jan-2017  christos really, don't install uvm_physseg.h!
 1.10 22-Dec-2016  cherry Use uvm_physseg.h:uvm_page_physload() instead of uvm_extern.h

For this, include uvm_physseg.h in the build and include tree, make a
cosmetic modification to the prototype for uvm_page_physload().
 1.9 11-Feb-2006  yamt branches: 1.9.134; 1.9.138;
remove the following options. no objections on tech-kern@.

UVM_PAGER_INLINE
UVM_AMAP_INLINE
UVM_PAGE_INLINE
UVM_MAP_INLINE
 1.8 26-Nov-2002  lukem branches: 1.8.22; 1.8.34; 1.8.36; 1.8.38;
Remove KDIR=, since SYS_INCLUDE=symlinks and KDIR are not supported any more.
 1.7 15-Sep-2001  chs a whole bunch of changes to improve performance and robustness under load:

- remove special treatment of pager_map mappings in pmaps. this is
required now, since I've removed the globals that expose the address range.
pager_map now uses pmap_kenter_pa() instead of pmap_enter(), so there's
no longer any need to special-case it.
- eliminate struct uvm_vnode by moving its fields into struct vnode.
- rewrite the pageout path. the pager is now responsible for handling the
high-level requests instead of only getting control after a bunch of work
has already been done on its behalf. this will allow us to UBCify LFS,
which needs tighter control over its pages than other filesystems do.
writing a page to disk no longer requires making it read-only, which
allows us to write wired pages without causing all kinds of havoc.
- use a new PG_PAGEOUT flag to indicate that a page should be freed
on behalf of the pagedaemon when it's unlocked. this flag is very similar
to PG_RELEASED, but unlike PG_RELEASED, PG_PAGEOUT can be cleared if the
pageout fails due to eg. an indirect-block buffer being locked.
this allows us to remove the "version" field from struct vm_page,
and together with shrinking "loan_count" from 32 bits to 16,
struct vm_page is now 4 bytes smaller.
- no longer use PG_RELEASED for swap-backed pages. if the page is busy
because it's being paged out, we can't release the swap slot to be
reallocated until that write is complete, but unlike with vnodes we
don't keep a count of in-progress writes so there's no good way to
know when the write is done. instead, when we need to free a busy
swap-backed page, just sleep until we can get it busy ourselves.
- implement a fast-path for extending writes which allows us to avoid
zeroing new pages. this substantially reduces cpu usage.
- encapsulate the data used by the genfs code in a struct genfs_node,
which must be the first element of the filesystem-specific vnode data
for filesystems which use genfs_{get,put}pages().
- eliminate many of the UVM pagerops, since they aren't needed anymore
now that the pager "put" operation is a higher-level operation.
- enhance the genfs code to allow NFS to use the genfs_{get,put}pages
instead of a modified copy.
- clean up struct vnode by removing all the fields that used to be used by
the vfs_cluster.c code (which we don't use anymore with UBC).
- remove kmem_object and mb_object since they were useless.
instead of allocating pages to these objects, we now just allocate
pages with no object. such pages are mapped in the kernel until they
are freed, so we can use the mapping to find the page to free it.
this allows us to remove splvm() protection in several places.

The sum of all these changes improves write throughput on my
decstation 5000/200 to within 1% of the rate of NetBSD 1.5
and reduces the elapsed time for "make release" of a NetBSD 1.5
source tree on my 128MB pc to 10% less than a 1.5 kernel took.
 1.6 01-Aug-2000  wiz branches: 1.6.2; 1.6.4; 1.6.6;
Rename VM_INHERIT_* to MAP_INHERIT_* and move them to sys/sys/mman.h as
discussed on tech-kern.
Retire sys/uvm/uvm_inherit.h, update man page for minherit(2).
 1.5 27-Jun-2000  mrg install uvm_pmap.h
 1.4 26-Jun-2000  mrg install uvm_param.h.
 1.3 26-Jun-2000  mrg remove/move more mach vm header files:

<vm/pglist.h> -> <uvm/uvm_pglist.h>
<vm/vm_inherit.h> -> <uvm/uvm_inherit.h>
<vm/vm_kern.h> -> into <uvm/uvm_extern.h>
<vm/vm_object.h> -> nothing
<vm/vm_pager.h> -> into <uvm/uvm_pager.h>

also includes a bunch of <vm/vm_page.h> include removals (due to redundancy
with <vm/vm.h>), and a scattering of other similar headers.
 1.2 25-Jun-2000  mrg <vm/vm_prot.h> becomes <uvm/uvm_prot.h>
 1.1 12-Jun-1998  cgd branches: 1.1.14;
Rework the way kernel include files are installed. In the new method,
as with user-land programs, include files are installed by each directory
in the tree that has includes to install. (This allows more flexibility
as to what gets installed, makes 'partial installs' easier, and gives us
more options as to which machines' includes get installed at any given
time.) The old SYS_INCLUDES={symlinks,copies} behaviours are _both_
still supported, though at least one bug in the 'symlinks' case is
fixed by this change. Include files can't be built before installation,
so directories that have includes as targets (e.g. dev/pci) have to move
those targets into a different Makefile.
 1.1.14.1 20-Nov-2000  bouyer Update thorpej_scsipi to -current as of a month ago
 1.6.6.1 01-Oct-2001  fvdl Catch up with -current.
 1.6.4.1 10-Jan-2002  thorpej Sync kqueue branch with -current.
 1.6.2.2 11-Dec-2002  thorpej Sync with HEAD.
 1.6.2.1 21-Sep-2001  nathanw Catch up to -current.
 1.8.38.1 22-Apr-2006  simonb Sync with head.
 1.8.36.1 09-Sep-2006  rpaulo sync with head
 1.8.34.1 18-Feb-2006  yamt sync with head.
 1.8.22.1 21-Jun-2006  yamt sync with head.
 1.9.138.1 07-Jan-2017  pgoyette Sync with HEAD. (Note that most of these changes are simply $NetBSD$
tag issues.)
 1.9.134.1 05-Feb-2017  skrll Sync with HEAD
 1.1 25-Mar-1999  chs branches: 1.1.2;
file README.ubc was initially added on branch chs-ubc.
 1.1.2.1 25-Mar-1999  chs some info about UBC.
 1.37 03-Jun-2021  riastradh uvm(9): Enable swap encryption by default.

For machines where the performance impact of swapping before the
system has an opportunity to process `vm.swap_encrypt=0' in
/etc/sysctl.conf matters, you can disable it again by adding

options VMSWAP_DEFAULT_PLAINTEXT

to the kernel config.
 1.36 04-Aug-2020  skrll branches: 1.36.6; 1.36.10;
G/C USE_TOPDOWN_VM. __USE_TOPDOWN_VM is used (and hidden)
 1.35 29-Jun-2020  riastradh uvm(9): Switch from legacy rijndael API to new aes API.
 1.34 10-May-2020  pgoyette Add missing dependency.

Fixes builds with VM_SWAP but no other users of rijndael crypto code.
 1.33 15-Jan-2020  ad Merge from yamt-pagecache (after much testing):

- Reduce unnecessary page scan in putpages esp. when an object has a ton of
pages cached but only a few of them are dirty.

- Reduce the number of pmap operations by tracking page dirtiness more
precisely in uvm layer.
 1.32 27-Dec-2019  ad branches: 1.32.2;
Redo the page allocator to perform better, especially on multi-core and
multi-socket systems. Proposed on tech-kern. While here:

- add rudimentary NUMA support - needs more work.
- remove now unused "listq" from vm_page.
 1.31 15-Dec-2019  ad Merge from yamt-pagecache:

- do gang lookup of pages using radixtree.
- remove now unused uvm_object::uo_memq and vm_page::listq.queue.
 1.30 20-Nov-2019  pgoyette Move all non-emulation-specific coredump code into the coredump module,
and remove all #ifdef COREDUMP conditional compilation. Now, the
coredump module is completely separated from the emulation modules, and
they can all be independently loaded and unloaded.

Welcome to 9.99.18 !
 1.29 19-May-2018  jdolecek branches: 1.29.2;
Remove emap support. Unfortunately it never got to a state where it would be
used and usable, due to reliability and limited & complicated MD support.

Going forward, we need to concentrate on interfaces which do not map anything
into the kernel in the first place (such as direct map or KVA-less I/O), rather
than making those mappings cheaper to do.
 1.28 23-Dec-2016  cherry branches: 1.28.14;
"Make NetBSD great again!"

Introduce uvm_hotplug(9) to the kernel.

Many thanks, in no particular order to:

TNF, for funding the project.

Chuck Silvers - for multiple API reviews and feedback.
Nick Hudson - for testing on multiple architectures and bugfix patches.
Everyone who helped with boot testing.

KeK (http://www.kek.org.in) for hosting the primary developers.
 1.27 01-Dec-2016  mrg allow the sizes of the maphist and pdhist to be set in the config
file via UVMHIST_MAPHIST_SIZE and UVMHIST_PDHIST_SIZE.
 1.26 12-Aug-2016  skrll Make UVMHIST_PRINT work again by making it define KERNHIST_PRINT
 1.25 27-Jul-2015  maxv branches: 1.25.2;
Several changes and improvements in KMEM_GUARD:
- merge uvm_kmguard.{c,h} into subr_kmem.c. It is only used there, and
makes it more consistent. Also, it allows us to enable KMEM_GUARD
without enabling DEBUG.
- rename uvm_kmguard_XXX to kmem_guard_XXX, for consistency
- improve kmem_guard_alloc() so that it supports allocations bigger than
PAGE_SIZE
- remove the canary value, and use directly the kmem header as underflow
pattern.
- fix some comments

(The UAF fifo is disabled for the moment; we actually need to register
the va and its size, and add a weight support not to consume too much
memory.)
 1.24 12-Apr-2015  joerg UVM_RESERVED_PAGES_PER_CPU must be a param, not a flag.
 1.23 11-Apr-2015  joerg Allow changing the per-cpu emergency page reservation via kernel config.
 1.22 10-Oct-2014  uebayasi branches: 1.22.2;
Use opt_*.h to not pollute CPPFLAGS.

Attribute dependency is not yet. Revert a definition.
 1.21 10-Oct-2014  uebayasi Define "uvm" attribute and mark files.
 1.20 17-May-2011  mrg branches: 1.20.4; 1.20.14;
fix the ordering and make UVMHIST enable KERNHIST automatically.
 1.19 09-Dec-2010  uebayasi branches: 1.19.2;
Make UVM_PAGE_TRKOWN a real flag.
 1.18 21-Feb-2010  drochner branches: 1.18.2;
rename the va0_disabled option and cpp conditional to "disable" as well,
for consistency, and document option and sysctl flag
 1.17 18-Feb-2010  drochner Disable mapping of virtual address 0 by user programs per default.
This blocks an easy exploit of kernel bugs leading to dereference
of a NULL pointer on some architectures (eg i386).
The check can be disabled in various ways:
-by CPP definitions in machine/types.h (portmaster's choice)
-by a kernel config option USER_VA0_DISABLED_DEFAULT=0
-at runtime by sysctl vm.user_va0_disabled (cannot be cleared
at securelevel>0)
 1.16 21-Oct-2009  rmind branches: 1.16.2;
Remove uarea swap-out functionality:

- Addresses the issue described in PR/38828.
- Some simplification in threading and sleepq subsystems.
- Eliminates pmap_collect() and, as a side note, allows pmap optimisations.
- Eliminates XS_CTL_DATA_ONSTACK in scsipi code.
- Avoids few scans on LWP list and thus potentially long holds of proc_lock.
- Cuts ~1.5k lines of code. Reduces amd64 kernel size by ~4k.
- Removes __SWAP_BROKEN cases.

Tested on x86, mips, acorn32 (thanks <mpumford>) and partly tested on
acorn26 (thanks to <bjh21>).

Discussed on <tech-kern>, reviewed by <ad>.
 1.15 09-Aug-2009  matt Add [default] option to make UAREAs swappable. Disabling the option makes
them unswappable and therefore allocatable using KSEG/BAT/etc.
 1.14 28-Jun-2009  rmind Ephemeral mapping (emap) implementation. Concept is based on the idea that
activity of other threads will perform the TLB flush for the processes using
emap as a side effect. To track that, global and per-CPU generation numbers
are used. This idea was suggested by Andrew Doran; various improvements to
it by me. Notes:

- For now, zero-copy on pipe is not yet enabled.
- TCP socket code would likely need more work.
- Additional UVM loaning improvements are needed.

Proposed on <tech-kern>, silence there.
Quickly reviewed by <ad>.
 1.13 29-Mar-2009  ad kernel memory guard for DEBUG kernels, proposed on tech-kern.
See kmem_alloc(9) for details.
 1.12 19-Nov-2008  ad branches: 1.12.4;
Make the emulations, exec formats, coredump, NFS, and the NFS server
into modules. By and large this commit:

- shuffles header files and ifdefs
- splits code out where necessary to be modular
- adds module glue for each of the components
- adds/replaces hooks for things that can be installed at runtime
 1.11 04-Jun-2008  ad branches: 1.11.4; 1.11.6; 1.11.12;
- Switch off the map evcnts by default.
- SAVE_HINT() doesn't need to be atomic.
 1.10 25-Oct-2007  yamt branches: 1.10.16; 1.10.18; 1.10.20; 1.10.22;
defparam PAGER_MAP_SIZE.
 1.9 17-Jul-2007  joerg branches: 1.9.6; 1.9.8; 1.9.12;
Add native mremap system call based on the UVM implementation for
Linux compat. Add code to enforce alignment of the new location.
Special thanks to wizd for helping with the man page.
 1.8 25-Nov-2006  christos branches: 1.8.8;
PR/34837: Mindaugas: Add SysV SHM dynamic reallocation and locking to the
physical memory
 1.7 12-Oct-2006  yamt uobj_wirepages and uobj_unwirepages from Mindaugas. PR/34771.
(commented out in files.uvm for now because there is no user in tree.)

http://mail-index.netbsd.org/tech-kern/2006/09/24/0000.html
http://mail-index.netbsd.org/tech-kern/2006/10/10/0000.html
 1.6 30-Sep-2006  yamt add ubc window hit/miss evcnts.
 1.5 15-Sep-2006  yamt branches: 1.5.2;
merge yamt-pdpolicy branch.
- separate page replacement policy from the rest of kernel
- implement an alternative replacement policy
 1.4 21-Jan-2006  yamt branches: 1.4.6; 1.4.16; 1.4.20;
implement compat_linux mremap.
 1.3 29-Nov-2005  yamt branches: 1.3.2; 1.3.4;
merge yamt-readahead branch.
 1.2 29-Nov-2005  yamt branches: 1.2.2;
read-ahead statistics.
 1.1 27-Nov-2005  thorpej Move UVM files to files.uvm
 1.2.2.2 29-Nov-2005  yamt sync with head.
 1.2.2.1 29-Nov-2005  yamt file files.uvm was added on branch yamt-readahead on 2005-11-29 21:23:33 +0000
 1.3.4.1 01-Feb-2006  yamt sync with head.
 1.3.2.2 11-Dec-2005  christos Sync with head.
 1.3.2.1 29-Nov-2005  christos file files.uvm was added on branch ktrace-lwp on 2005-12-11 10:29:42 +0000
 1.4.20.2 12-Jan-2007  ad Sync with head.
 1.4.20.1 18-Nov-2006  ad Sync with head.
 1.4.16.5 27-Oct-2007  yamt sync with head.
 1.4.16.4 03-Sep-2007  yamt sync with head.
 1.4.16.3 30-Dec-2006  yamt sync with head.
 1.4.16.2 21-Jun-2006  yamt sync with head.
 1.4.16.1 21-Jan-2006  yamt file files.uvm was added on branch yamt-lazymbuf on 2006-06-21 15:12:39 +0000
 1.4.6.2 06-Mar-2006  yamt an experimental implementation of CLOCK-Pro.
 1.4.6.1 05-Mar-2006  yamt separate page replacement policy from the rest of kernel.
 1.5.2.2 10-Dec-2006  yamt sync with head.
 1.5.2.1 22-Oct-2006  yamt sync with head
 1.8.8.1 20-Aug-2007  ad Sync with HEAD.
 1.9.12.1 13-Nov-2007  bouyer Sync with HEAD
 1.9.8.1 06-Nov-2007  matt sync with HEAD
 1.9.6.1 28-Oct-2007  joerg Sync with HEAD.
 1.10.22.1 23-Jun-2008  wrstuden Sync w/ -current. 34 merge conflicts to follow.
 1.10.20.4 11-Mar-2010  yamt sync with head
 1.10.20.3 19-Aug-2009  yamt sync with head.
 1.10.20.2 18-Jul-2009  yamt sync with head.
 1.10.20.1 04-May-2009  yamt sync with head.
 1.10.18.1 17-Jun-2008  yamt sync with head.
 1.10.16.2 17-Jan-2009  mjf Sync with HEAD.
 1.10.16.1 05-Jun-2008  mjf Sync with HEAD.

Also fix build.
 1.11.12.1 29-Feb-2012  matt Improve UVM_PAGE_TRKOWN.
Add more asserts to uvm_page.
 1.11.6.2 28-Apr-2009  skrll Sync with HEAD.
 1.11.6.1 19-Jan-2009  skrll Sync with HEAD.
 1.11.4.1 13-Dec-2008  haad Update haad-dm branch to haad-dm-base2.
 1.12.4.2 23-Jul-2009  jym Sync with HEAD.
 1.12.4.1 13-May-2009  jym Sync with HEAD.

Commit is split, to avoid a "too many arguments" protocol error.
 1.16.2.3 15-Nov-2010  uebayasi Revert xmd(4).
 1.16.2.2 30-Oct-2010  uebayasi Implement pmap_physload_device(9) to replace xmd(4) MD backend.
Implement pmap_mmap(9) and use it from mem(4) and xmd(4).
 1.16.2.1 30-Apr-2010  uebayasi Sync with HEAD.
 1.18.2.2 31-May-2011  rmind sync with head
 1.18.2.1 05-Mar-2011  rmind sync with head
 1.19.2.1 06-Jun-2011  jruoho Sync with HEAD.
 1.20.14.1 03-Dec-2017  jdolecek update from HEAD
 1.20.4.1 02-Nov-2011  yamt page cache related changes

- maintain object pages in radix tree rather than rb tree.
- reduce unnecessary page scan in putpages. esp. when an object has a ton of
pages cached but only a few of them are dirty.
- reduce the number of pmap operations by tracking page dirtiness more
precisely in uvm layer.
- fix nfs commit range tracking.
- fix nfs write clustering. XXX hack
 1.22.2.5 05-Feb-2017  skrll Sync with HEAD
 1.22.2.4 05-Dec-2016  skrll Sync with HEAD
 1.22.2.3 05-Oct-2016  skrll Sync with HEAD
 1.22.2.2 22-Sep-2015  skrll Sync with HEAD
 1.22.2.1 06-Jun-2015  skrll Sync with HEAD
 1.25.2.1 07-Jan-2017  pgoyette Sync with HEAD. (Note that most of these changes are simply $NetBSD$
tag issues.)
 1.28.14.1 21-May-2018  pgoyette Sync with HEAD
 1.29.2.1 08-Apr-2020  martin Merge changes from current as of 20200406
 1.32.2.1 17-Jan-2020  ad Sync with head.
 1.36.10.1 06-Jun-2021  cjep sync with head
 1.36.6.1 17-Jun-2021  thorpej Sync w/ HEAD.
 1.78 17-Jul-2023  riastradh uvm(9): One rndsource for faults -- not one per CPU.

All relevant state is per-CPU anyway; the only substantive difference
this makes is how many entries appear in `rndctl -l' output and what
they are called -- formerly the somewhat confusing `cpuN', meaning
`page faults on cpuN', and now just `uvmfault'. I don't think
there's any real value in being able to enable or disable measurement
or counting of page faults on one CPU vs others, so although this
could be a minor compatibility change, it's hard to imagine it
matters much.

XXX kernel ABI change in struct cpu_info
 1.77 17-May-2020  ad - If the hardware provided NUMA info, then use it to decide how to set up
the allocator's buckets, instead of doing round robin distribution. There
are open questions here but this is better than doing nothing.

- Kernel reserve pages are for the kernel not realtime threads.
 1.76 14-Mar-2020  ad Make page waits (WANTED vs BUSY) interlocked by pg->interlock. Gets RW
locks out of the equation for sleep/wakeup, and allows observing+waiting
for busy pages when holding only a read lock. Proposed on tech-kern.
 1.75 23-Feb-2020  ad UVM locking changes, proposed on tech-kern:

- Change the lock on uvm_object, vm_amap and vm_anon to be a RW lock.
- Break v_interlock and vmobjlock apart. v_interlock remains a mutex.
- Do partial PV list locking in the x86 pmap. Others to follow later.
 1.74 18-Feb-2020  chs remove the aiodoned thread. I originally added this to provide a thread context
for doing page cache iodone work, but since then biodone() has changed to
hand off all iodone work to a softint thread, so we no longer need the
special-purpose aiodoned thread.
 1.73 31-Dec-2019  ad branches: 1.73.2;
- Add and use wrapper functions that take and acquire page interlocks, and pairs
of page interlocks. Require that the page interlock be held over calls to
uvm_pageactivate(), uvm_pagewire() and similar.

- Solve the concurrency problem with page replacement state. Rather than
updating the global state synchronously, set an intended state on
individual pages (active, inactive, enqueued, dequeued) while holding the
page interlock. After the interlock is released put the pages on a 128
entry per-CPU queue for their state changes to be made real in batch.
This results in a ~400 fold decrease in contention on my test system.
Proposed on tech-kern but modified to use the page interlock rather than
atomics to synchronise as it's much easier to maintain that way, and
cheaper.
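
The batching scheme described in revision 1.73 can be illustrated with a minimal
user-space sketch. It is not the kernel code: the names (pg_intent, cpu_pdq,
page_set_intent, pdq_flush) are hypothetical, the page interlock is omitted, and
the global lock is represented only by a counter. Only the 128-entry queue size
comes from the commit message.

```c
#include <stddef.h>

#define PDQ_SIZE 128	/* per-CPU queue length named in the commit message */

/* Hypothetical intended states; the real kernel flags may differ. */
enum pg_intent {
	PG_INTENT_NONE,
	PG_INTENT_ACTIVE,
	PG_INTENT_INACTIVE,
	PG_INTENT_DEQUEUE
};

struct page {
	enum pg_intent intent;	/* would be set under the page interlock */
	enum pg_intent state;	/* "real" state, changed only in a batch */
	int queued;		/* non-zero while pending on a per-CPU queue */
};

struct cpu_pdq {
	struct page *pg[PDQ_SIZE];
	size_t n;
};

/* Stands in for acquisitions of the global replacement-policy lock. */
static unsigned long global_lock_acquisitions;

/* Apply every pending intent while "holding" the global lock once. */
static void
pdq_flush(struct cpu_pdq *q)
{
	if (q->n == 0)
		return;
	global_lock_acquisitions++;
	for (size_t i = 0; i < q->n; i++) {
		struct page *pg = q->pg[i];

		pg->state = pg->intent;
		pg->intent = PG_INTENT_NONE;
		pg->queued = 0;
	}
	q->n = 0;
}

/* Record an intended state; the global state is updated later in batch. */
static void
page_set_intent(struct cpu_pdq *q, struct page *pg, enum pg_intent intent)
{
	pg->intent = intent;
	if (pg->queued)
		return;		/* already pending: the latest intent wins */
	q->pg[q->n++] = pg;
	pg->queued = 1;
	if (q->n == PDQ_SIZE)
		pdq_flush(q);	/* queue is full, make the changes real */
}
```

In this sketch, 256 state changes cost two acquisitions of the global lock
rather than 256, which is the kind of contention reduction the commit describes.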
 1.72 27-Dec-2019  ad Nothing uses uvm.cpus any more, and we can do the same with cpu_lookup(),
so get rid of it.
 1.71 27-Dec-2019  ad Redo the page allocator to perform better, especially on multi-core and
multi-socket systems. Proposed on tech-kern. While here:

- add rudimentary NUMA support - needs more work.
- remove now unused "listq" from vm_page.
 1.70 13-Dec-2019  ad Break the global uvm_pageqlock into a per-page identity lock and a private
lock for use of the pagedaemon policy code. Discussed on tech-kern.

PR kern/54209: NetBSD 8 large memory performance extremely low
PR kern/54210: NetBSD-8 processes presumably not exiting
PR kern/54727: writing a large file causes unreasonable system behaviour
 1.69 01-Dec-2019  ad - Adjust uvmexp.swpgonly with atomics, and make uvm_swap_data_lock static.
- A bit more __cacheline_aligned on mutexes.
 1.68 02-Jan-2017  cherry branches: 1.68.16;
Move sys/uvm/uvm_physseg.h inclusion to within _KERNEL only.
 1.67 22-Dec-2016  cherry Use uvm_physseg.h:uvm_page_physload() instead of uvm_extern.h

For this, include uvm_physseg.h in the build and include tree, make a
cosmetic modification to the prototype for uvm_page_physload().
 1.66 13-Apr-2015  riastradh branches: 1.66.2;
Limit <sys/rndsource.h> include to kernel.
 1.65 13-Apr-2015  riastradh Convert remaining MI <sys/rnd.h> stragglers. Many MD ones left.
 1.64 10-Aug-2014  tls branches: 1.64.4;
Merge tls-earlyentropy branch into HEAD.
 1.63 02-Feb-2012  tls branches: 1.63.6; 1.63.20;
Entropy-pool implementation move and cleanup.

1) Move core entropy-pool code and source/sink/sample management code
to sys/kern from sys/dev.

2) Remove use of NRND as test for presence of entropy-pool code throughout
source tree.

3) Remove use of RND_ENABLED in device drivers as microoptimization to
avoid expensive operations on disabled entropy sources; make the
rnd_add calls do this directly so all callers benefit.

4) Fix bug in recent rnd_add_data()/rnd_add_uint32() changes that might
have led to slight entropy overestimation for some sources.

5) Add new source types for environmental sensors, power sensors, VM
system events, and skew between clocks, with a sample implementation
for each.

ok releng to go in before the branch due to the difficulty of later
pullup (widespread #ifdef removal and moved files). Tested with release
builds on amd64 and evbarm and live testing on amd64.
 1.62 17-May-2011  mrg branches: 1.62.4; 1.62.8;
move and rename the uvm history code out of uvm_stat to "kernhist".

rename "UVMHIST" option to enable the uvm histories.

TODO:
- make UVMHIST properly depend upon KERNHIST
- enable dynamic registration of histories. this is mostly just
allocating something in a bitmap, and is only for viewing multiple
histories in a merged form.


tested on amd64 and sparc64.
 1.61 02-Feb-2011  chuck update license clauses on my code to match the new-style BSD licenses.
based on diff that rmind@ sent me.

no functional change with this commit.
 1.60 02-Feb-2011  chuck update license clauses on my code to match the new-style BSD licenses.
based on diff that rmind@ sent me.

no functional change with this commit.
 1.59 09-Dec-2010  uebayasi branches: 1.59.2; 1.59.4;
Make UVM_PAGE_TRKOWN a real flag.
 1.58 25-Apr-2010  ad Reduce memory spent on bookkeeping for large values of MAXCPUS.
 1.57 21-Oct-2009  rmind branches: 1.57.2; 1.57.4;
Remove uarea swap-out functionality:

- Addresses the issue described in PR/38828.
- Some simplification in threading and sleepq subsystems.
- Eliminates pmap_collect() and, as a side note, allows pmap optimisations.
- Eliminates XS_CTL_DATA_ONSTACK in scsipi code.
- Avoids few scans on LWP list and thus potentially long holds of proc_lock.
- Cuts ~1.5k lines of code. Reduces amd64 kernel size by ~4k.
- Removes __SWAP_BROKEN cases.

Tested on x86, mips, acorn32 (thanks <mpumford>) and partly tested on
acorn26 (thanks to <bjh21>).

Discussed on <tech-kern>, reviewed by <ad>.
 1.56 28-Jun-2009  rmind Ephemeral mapping (emap) implementation. Concept is based on the idea that
activity of other threads will perform the TLB flush for the processes using
emap as a side effect. To track that, global and per-CPU generation numbers
are used. This idea was suggested by Andrew Doran; various improvements to
it by me. Notes:

- For now, zero-copy on pipe is not yet enabled.
- TCP socket code would likely need more work.
- Additional UVM loaning improvements are needed.

Proposed on <tech-kern>, silence there.
Quickly reviewed by <ad>.
 1.55 04-Jun-2008  ad branches: 1.55.12; 1.55.16;
Replace the global vm_page hash with a per vm_object rbtree.
Proposed on tech-kern@.
 1.54 04-Jun-2008  ad - vm_page: put listq, pageq into a union alongside a LIST_ENTRY, so we can
use both types of list.

- Make page coloring and idle zero state per-CPU.

- Maintain per-CPU page freelists. When freeing, put pages onto the local
CPU's lists and the global lists. When allocating, prefer to take pages
from the local CPU. If none are available take from the global list as
done now. Proposed on tech-kern@.
 1.53 02-Jan-2008  ad branches: 1.53.6; 1.53.8; 1.53.10; 1.53.12;
Merge vmlocking2 to head.
 1.52 21-Jul-2007  ad branches: 1.52.6; 1.52.12; 1.52.14; 1.52.18; 1.52.22;
Merge unobtrusive locking changes from the vmlocking branch.
 1.51 09-Jul-2007  ad branches: 1.51.2;
Merge some of the less invasive changes from the vmlocking branch:

- kthread, callout, devsw API changes
- select()/poll() improvements
- miscellaneous MT safety improvements
 1.50 15-Jun-2007  ad Add a sysctl to disable swapout of kernel stacks. Discussed on tech-kern@.
 1.49 21-Feb-2007  thorpej branches: 1.49.4; 1.49.6;
Pick up some additional files that were missed before due to conflicts
with newlock2 merge:

Replace the Mach-derived boolean_t type with the C99 bool type. A
future commit will replace use of TRUE and FALSE with true and false.
 1.48 21-Feb-2007  thorpej Replace the Mach-derived boolean_t type with the C99 bool type. A
future commit will replace use of TRUE and FALSE with true and false.
 1.47 19-Feb-2007  ad uvm_kick_scheduler(): do nothing until the swap subsystem is initialized.
 1.46 15-Feb-2007  ad branches: 1.46.2;
Add uvm_kick_scheduler() (MP safe) to replace wakeup(&proc0).
 1.45 21-Dec-2006  yamt merge yamt-splraiseipl branch.

- finish implementing splraiseipl (and makeiplcookie).
http://mail-index.NetBSD.org/tech-kern/2006/07/01/0000.html
- complete workqueue(9) and fix its ipl problem, which is reported
to cause audio skipping.
- fix netbt (at least compilation problems) for some ports.
- fix PR/33218.
 1.44 15-Sep-2006  yamt branches: 1.44.2;
merge yamt-pdpolicy branch.
- separate page replacement policy from the rest of kernel
- implement an alternative replacement policy
 1.43 11-Feb-2006  yamt branches: 1.43.2; 1.43.14;
remove the following options. no objections on tech-kern@.

UVM_PAGER_INLINE
UVM_AMAP_INLINE
UVM_PAGE_INLINE
UVM_MAP_INLINE
 1.42 29-Nov-2005  yamt branches: 1.42.2; 1.42.4; 1.42.6;
read-ahead statistics.
 1.41 30-Oct-2005  yamt branches: 1.41.2;
don't include uvm_*_i.h unless needed,
to reduce bogus header dependencies.
 1.40 11-May-2005  yamt branches: 1.40.2; 1.40.4;
allocate anons on-demand, rather than reserving static amount of
them on boot/swapon.
 1.39 01-Jan-2005  yamt for in-kernel maps,
- allocate kva for vm_map_entry from the map itself and
remove the static limit, MAX_KMAPENT.
- keep merged entries for later splitting to fix allocate-to-free problem.
PR/24039.
 1.38 23-Nov-2004  yamt introduce UVMHIST_LOANHIST and sprinkle UVMHIST_LOGs.
 1.37 10-Feb-2004  matt Back out the changes in
http://mail-index.netbsd.org/source-changes/2004/01/29/0027.html
since they don't really fix the problem.

Incorporate one fix: Mark uvm_map_entry's that were created with
UVM_FLAG_NOMERGE so that they will not be used as future merge
candidates.
 1.36 29-Jan-2004  yamt - split uvm_map() into two functions for the followings.
- for in-kernel maps, disable map entry merging so that
unmap operations won't block. (workaround for PR/24039)
- for in-kernel maps, allocate kva for vm_map_entry from
the map itself and eliminate MAX_KMAPENT and
uvm_map_entry_kmem_pool.
 1.35 01-Dec-2002  matt branches: 1.35.6;
Reorder things so that with multiple inclusion protection that optional
definitions are outside the protection checks.
 1.34 02-Nov-2002  perry gah. reversed a test.
 1.33 02-Nov-2002  perry /*CONTCOND*/, and protect UVMHIST_DECL with #ifdef UVMHIST
 1.32 15-Sep-2002  thorpej Protect "struct uvm" with _KERNEL.
 1.31 15-Sep-2001  chs branches: 1.31.6; 1.31.12;
a whole bunch of changes to improve performance and robustness under load:

- remove special treatment of pager_map mappings in pmaps. this is
required now, since I've removed the globals that expose the address range.
pager_map now uses pmap_kenter_pa() instead of pmap_enter(), so there's
no longer any need to special-case it.
- eliminate struct uvm_vnode by moving its fields into struct vnode.
- rewrite the pageout path. the pager is now responsible for handling the
high-level requests instead of only getting control after a bunch of work
has already been done on its behalf. this will allow us to UBCify LFS,
which needs tighter control over its pages than other filesystems do.
writing a page to disk no longer requires making it read-only, which
allows us to write wired pages without causing all kinds of havoc.
- use a new PG_PAGEOUT flag to indicate that a page should be freed
on behalf of the pagedaemon when it's unlocked. this flag is very similar
to PG_RELEASED, but unlike PG_RELEASED, PG_PAGEOUT can be cleared if the
pageout fails due to eg. an indirect-block buffer being locked.
this allows us to remove the "version" field from struct vm_page,
and together with shrinking "loan_count" from 32 bits to 16,
struct vm_page is now 4 bytes smaller.
- no longer use PG_RELEASED for swap-backed pages. if the page is busy
because it's being paged out, we can't release the swap slot to be
reallocated until that write is complete, but unlike with vnodes we
don't keep a count of in-progress writes so there's no good way to
know when the write is done. instead, when we need to free a busy
swap-backed page, just sleep until we can get it busy ourselves.
- implement a fast-path for extending writes which allows us to avoid
zeroing new pages. this substantially reduces cpu usage.
- encapsulate the data used by the genfs code in a struct genfs_node,
which must be the first element of the filesystem-specific vnode data
for filesystems which use genfs_{get,put}pages().
- eliminate many of the UVM pagerops, since they aren't needed anymore
now that the pager "put" operation is a higher-level operation.
- enhance the genfs code to allow NFS to use the genfs_{get,put}pages
instead of a modified copy.
- clean up struct vnode by removing all the fields that used to be used by
the vfs_cluster.c code (which we don't use anymore with UBC).
- remove kmem_object and mb_object since they were useless.
instead of allocating pages to these objects, we now just allocate
pages with no object. such pages are mapped in the kernel until they
are freed, so we can use the mapping to find the page to free it.
this allows us to remove splvm() protection in several places.

The sum of all these changes improves write throughput on my
decstation 5000/200 to within 1% of the rate of NetBSD 1.5
and reduces the elapsed time for "make release" of a NetBSD 1.5
source tree on my 128MB pc to 10% less than a 1.5 kernel took.
 1.30 27-Jun-2001  thorpej branches: 1.30.2; 1.30.4;
Macro'ize the code that checks the free and inactive thresholds and
wakes the pagedaemon.
 1.29 02-Jun-2001  chs replace vm_map{,_entry}_t with struct vm_map{,_entry} *.
 1.28 30-May-2001  mrg use _KERNEL_OPT
 1.27 26-May-2001  chs replace {simple_,}lock{_data,}_t with struct {simple,}lock {,*}.
 1.26 22-May-2001  ross Merge the swap-backed and object-backed inactive lists.
 1.25 29-Apr-2001  thorpej Implement page coloring, using a round-robin bucket selection
algorithm (Solaris calls this "Bin Hopping").

This implementation currently relies on MD code to define a
constant defining the number of buckets. This will change
reasonably soon (MD code will be able to dynamically size
the bucket array).
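
The round-robin bucket selection ("Bin Hopping") mentioned in revision 1.25 can
be sketched as follows. This is an illustrative user-space model, not the UVM
implementation: NCOLORS, struct colored_freelist, page_free and page_alloc are
hypothetical names, and the fixed NCOLORS stands in for the MD-defined constant
the commit refers to.

```c
#include <stddef.h>

/* Hypothetical bucket count; the commit says MD code defines the real one. */
#define NCOLORS 4

struct page {
	struct page *next;
	unsigned color;		/* cache color of this physical page */
};

struct colored_freelist {
	struct page *bucket[NCOLORS];	/* one free list per color */
	unsigned next_color;		/* color to try first next time */
};

/* Return a free page to the bucket matching its color. */
static void
page_free(struct colored_freelist *fl, struct page *pg)
{
	pg->next = fl->bucket[pg->color];
	fl->bucket[pg->color] = pg;
}

/*
 * Allocate a page, starting at the round-robin color and falling back to
 * the remaining colors if that bucket is empty.  Returns NULL when no
 * page is free in any bucket.
 */
static struct page *
page_alloc(struct colored_freelist *fl)
{
	for (unsigned i = 0; i < NCOLORS; i++) {
		unsigned c = (fl->next_color + i) % NCOLORS;
		struct page *pg = fl->bucket[c];

		if (pg != NULL) {
			fl->bucket[c] = pg->next;
			pg->next = NULL;
			/* Hop to the bin after the one actually used. */
			fl->next_color = (c + 1) % NCOLORS;
			return pg;
		}
	}
	return NULL;
}
```

Successive allocations cycle through the colors, so consecutively allocated
pages tend to land in different cache bins.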
 1.24 27-Nov-2000  chs branches: 1.24.2;
Initial integration of the Unified Buffer Cache project.
 1.23 26-Jun-2000  mrg remove/move more mach vm header files:

<vm/pglist.h> -> <uvm/uvm_pglist.h>
<vm/vm_inherit.h> -> <uvm/uvm_inherit.h>
<vm/vm_kern.h> -> into <uvm/uvm_extern.h>
<vm/vm_object.h> -> nothing
<vm/vm_pager.h> -> into <uvm/uvm_pager.h>

also includes a bunch of <vm/vm_page.h> include removals (due to redundancy
with <vm/vm.h>), and a scattering of other similar headers.
 1.22 08-Jun-2000  thorpej Change UVM_UNLOCK_AND_WAIT() to use ltsleep() (it is now atomic, as
advertised). Garbage-collect uvm_sleep().
 1.21 24-Apr-2000  thorpej branches: 1.21.2;
Changes necessary to implement pre-zero'ing of pages in the idle loop:
- Make page free lists have two actual queues: known-zero pages and
pages with unknown contents.
- Implement uvm_pageidlezero(). This function attempts to zero up to
the target number of pages until the target has been reached (currently
target is `all free pages') or until whichqs becomes non-zero (indicating
that a process is ready to run).
- Define a new hook for the pmap module for pre-zero'ing pages. This is
used to zero the pages using uncached access. This allows us to zero
as many pages as we want without polluting the cache.

In order to use this feature, each platform must add the appropriate
glue in their idle loop.
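
A minimal model of the idle-loop zeroing described above, assuming the two queues can be represented by page counts alone. `struct free_list`, `idle_zero()` and the `runq_bits` parameter are hypothetical stand-ins (the latter for `whichqs`); the real uvm_pageidlezero() works on actual page queues and zeroes pages through the pmap hook.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical model: a free list split into two queues, kept as counts only. */
struct free_list {
	unsigned nzero;		/* pages known to be zero-filled */
	unsigned nunknown;	/* pages with unknown contents */
};

/*
 * Move pages from the unknown queue to the known-zero queue until the list
 * holds at least `target` known-zero pages, no unknown pages remain, or
 * *runq_bits becomes non-zero (something is ready to run).
 * Returns the number of pages zeroed by this call.
 */
static unsigned
idle_zero(struct free_list *fl, unsigned target,
    const volatile unsigned *runq_bits)
{
	unsigned done = 0;

	assert(fl != NULL && runq_bits != NULL);
	while (fl->nzero < target && fl->nunknown > 0 && *runq_bits == 0) {
		/* A real kernel would zero one page here, ideally uncached. */
		fl->nunknown--;
		fl->nzero++;
		done++;
	}
	return done;
}
```

The run-queue check sits in the loop condition so that the idle loop gives up the CPU after at most one page's worth of work once a process becomes runnable.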
 1.20 10-Apr-2000  chs tidy.
 1.19 02-Apr-2000  thorpej Instead of checking vm_physmem[<physseg>].pgs to determine if
uvm_page_init() has completed, add a boolean uvm.page_init_done,
and test against that. Use this same boolean (rather than
pmap_initialized) in pmap_growkernel() to determine if we are
being called via uvm_page_init() to grow the kernel address space.

This fixes a problem on some i386 configurations where pmap_init()
itself was needing to have the kernel page table grown, and since
pmap_initialized was not yet set to TRUE, pmap_growkernel() was
choosing the wrong code path.

Fix tested by Havard Eidnes.
 1.18 13-Nov-1999  thorpej Always pass all arguments to uvm_sleep().
 1.17 22-Jul-1999  thorpej branches: 1.17.2; 1.17.4; 1.17.8;
Garbage collect thread_sleep()/thread_wakeup() left over from the old
Mach VM code. Also nuke iprintf(), which was no longer used anywhere.

Add proclist locking where appropriate.
 1.16 21-Jun-1999  thorpej Protect prototypes, certain macros, and inlines from userland.
 1.15 26-Mar-1999  chs branches: 1.15.2; 1.15.4;
add uvmexp.swpgonly and use it to detect out-of-swap conditions.
 1.14 25-Mar-1999  mrg remove now >1 year old pre-release message.
 1.13 11-Oct-1998  chuck branches: 1.13.2;
remove unused share map code from UVM:
dump UVM_ET_MAP/UVM_ET_ISMAP. if you need to detect a submap use
UVM_ET_SUBMAP/UVM_ET_ISSUBMAP.
 1.12 24-Sep-1998  thorpej NCPU > 1 -> MULTIPROCESSOR
 1.11 13-Aug-1998  eeh Merge paddr_t changes into the main branch.
 1.10 08-Jul-1998  thorpej branches: 1.10.2;
Add support for multiple memory free lists. There is at least one
default free list, and 0 - N additional free lists, in order of descending
priority.

A new page allocation function, uvm_pagealloc_strat(), has been added,
providing three page allocation strategies:

- normal: high -> low priority free list walk, taking the
page off the first free list that has one.

- only: attempt to allocate a page only from the specified free
list, failing if that free list has none available.

- fallback: if `only' fails, fall back on `normal'.

uvm_pagealloc(...) is provided for normal use (and is a synonym for
uvm_pagealloc_strat(..., UVM_PGA_STRAT_NORMAL, 0); the free list argument
is ignored for the `normal' case).

uvm_page_physload() now specifies which free list the pages will be
loaded onto. This means that some platforms which have multiple physical
memory segments may define additional vm_physsegs if they wish to break
individual physical segments into differing priorities.

Machine-dependent code must define _at least_ the following constants
in <machine/vmparam.h>:

VM_NFREELIST: the number of free lists the system will have

VM_FREELIST_DEFAULT: the default freelist (should always be 0,
but is defined in machdep code so that it's with all of the
other free list-related constants).

Additional free list names may be defined by machine-dependent code, but
they will only be used by machine-dependent code (e.g. for loading the
vm_physsegs).
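
The three allocation strategies above can be sketched as a small model. Everything here is an assumption made for illustration: `NFREELIST`, `enum strat` and `pick_freelist()` are hypothetical names, free lists are reduced to page counts, and index 0 is taken to be the highest-priority list.

```c
#include <assert.h>

#define NFREELIST 3	/* hypothetical; the real count is VM_NFREELIST */

enum strat { STRAT_NORMAL, STRAT_ONLY, STRAT_FALLBACK };

/*
 * free_pages[i] is the number of free pages on list i.
 * Takes one page according to the strategy and returns the index of the
 * list it came from, or -1 if the request cannot be satisfied.
 * `wanted` is ignored for STRAT_NORMAL.
 */
static int
pick_freelist(unsigned free_pages[NFREELIST], enum strat strat, int wanted)
{
	if (strat == STRAT_ONLY || strat == STRAT_FALLBACK) {
		assert(wanted >= 0 && wanted < NFREELIST);
		if (free_pages[wanted] > 0) {
			free_pages[wanted]--;
			return wanted;
		}
		if (strat == STRAT_ONLY)
			return -1;	/* `only' never looks elsewhere */
	}
	/* `normal', or `fallback' after `only' failed: walk every list. */
	for (int i = 0; i < NFREELIST; i++) {
		if (free_pages[i] > 0) {
			free_pages[i]--;
			return i;
		}
	}
	return -1;
}
```

In this model `fallback' is simply `only' followed by `normal', which matches the wording of the commit message.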
 1.9 04-Jul-1998  pk Shield `#include opt_*.h'.
 1.8 20-May-1998  thorpej defopt LOCKDEBUG
 1.7 18-May-1998  pk No dummy locks if LOCKDEBUG.
 1.6 01-Mar-1998  fvdl Merge with Lite2 + local changes
 1.5 10-Feb-1998  mrg - add defopt's for UVM, UVMHIST and PMAP_NEW.
- remove unnecessary UVMHIST_DECL's.
 1.4 10-Feb-1998  perry add/cleanup multiple inclusion protection.
 1.3 07-Feb-1998  mrg restore rcsids
 1.2 06-Feb-1998  thorpej RCS ID police.
 1.1 05-Feb-1998  mrg branches: 1.1.1;
Initial revision
 1.1.1.1 05-Feb-1998  mrg initial import of the new virtual memory system, UVM, into -current.

UVM was written by chuck cranor <chuck@maria.wustl.edu>, with some
minor portions derived from the old Mach code. i provided some help
getting swap and paging working, and other bug fixes/ideas. chuck
silvers <chuq@chuq.com> also provided some other fixes.

this is the UVM kernel code portion.


this will be KNF'd shortly. :-)
 1.10.2.1 30-Jul-1998  eeh Split vm_offset_t and vm_size_t into paddr_t, psize_t, vaddr_t, and vsize_t.
 1.13.2.3 09-Apr-1999  chs add globals for aiodone daemon.
 1.13.2.2 25-Feb-1999  chs declare ubchist here instead of in all the files that use it.
 1.13.2.1 09-Nov-1998  chs initial snapshot. lots left to do.
 1.15.4.6 02-Aug-1999  thorpej Update from trunk.
 1.15.4.5 31-Jul-1999  chs remove duplicate uvmhist decls.
 1.15.4.4 04-Jul-1999  chs uvm.aio_done is now a TAILQ of struct buf rather than struct uvm_aiodesc.
 1.15.4.3 02-Jul-1999  thorpej Fix merge botch.
 1.15.4.2 01-Jul-1999  thorpej Sync w/ -current.
 1.15.4.1 07-Jun-1999  chs merge everything from chs-ubc branch.
 1.15.2.1 26-Apr-2000  he Pull up revision 1.19 (requested by thorpej):
Use a more reliable method to determine if uvm_page_init() has
completed. This fixes a problem observed on some i386 configs
(typically with lots of memory) where the kernel page table needs
to grow during initialization.
 1.17.8.1 27-Dec-1999  wrstuden Pull up to last week's -current.
 1.17.4.1 15-Nov-1999  fvdl Sync with -current
 1.17.2.2 08-Dec-2000  bouyer Sync with HEAD.
 1.17.2.1 20-Nov-2000  bouyer Update thorpej_scsipi to -current as of a month ago
 1.21.2.1 22-Jun-2000  minoura Sync w/ netbsd-1-5-base.
 1.24.2.8 11-Dec-2002  thorpej Sync with HEAD.
 1.24.2.7 11-Nov-2002  nathanw Catch up to -current
 1.24.2.6 17-Sep-2002  nathanw Catch up to -current.
 1.24.2.5 16-Jul-2002  nathanw pagedaemon_proc really should be a proc, not a LWP.
 1.24.2.4 21-Sep-2001  nathanw Catch up to -current.
 1.24.2.3 24-Aug-2001  nathanw Catch up with -current.
 1.24.2.2 21-Jun-2001  nathanw Catch up to -current.
 1.24.2.1 05-Mar-2001  nathanw Initial commit of scheduler activations and lightweight process support.
 1.30.4.1 01-Oct-2001  fvdl Catch up with -current.
 1.30.2.2 10-Oct-2002  jdolecek sync kqueue with -current; this includes merge of gehenna-devsw branch,
merge of i386 MP branch, and part of autoconf rototil work
 1.30.2.1 10-Jan-2002  thorpej Sync kqueue branch with -current.
 1.31.12.1 15-Jun-2003  tron Pull up revision 1.32 (requested by cjep in ticket #1240):
Protect "struct uvm" with _KERNEL.
 1.31.6.6 12-Mar-2002  thorpej Make hashlock an adaptive mutex, and rename it to hash_mutex.
 1.31.6.5 12-Mar-2002  thorpej Make afreelock an adaptive mutex, and rename it to afree_mutex.
 1.31.6.4 12-Mar-2002  thorpej Make kentry_lock a spin mutex at IPL_VM, and rename it to kentry_mutex.
 1.31.6.3 12-Mar-2002  thorpej Make pageqlock an adaptive mutex, and rename it to pageq_mutex.
 1.31.6.2 12-Mar-2002  thorpej Convert the fpageqlock to a spin mutex at IPL_VM and rename it
to fpageq_mutex.
 1.31.6.1 11-Mar-2002  thorpej Convert swap_syscall_lock and uvm.swap_data_lock to adaptive mutexes,
and rename them appropriately.
 1.35.6.7 11-Dec-2005  christos Sync with head.
 1.35.6.6 10-Nov-2005  skrll Sync with HEAD. Here we go again...
 1.35.6.5 17-Jan-2005  skrll Sync with HEAD.
 1.35.6.4 29-Nov-2004  skrll Sync with HEAD.
 1.35.6.3 21-Sep-2004  skrll Fix the sync with head I botched.
 1.35.6.2 18-Sep-2004  skrll Sync with HEAD.
 1.35.6.1 03-Aug-2004  skrll Sync with HEAD
 1.40.4.1 02-Nov-2005  yamt sync with head.
 1.40.2.5 21-Jan-2008  yamt sync with head
 1.40.2.4 03-Sep-2007  yamt sync with head.
 1.40.2.3 26-Feb-2007  yamt sync with head.
 1.40.2.2 30-Dec-2006  yamt sync with head.
 1.40.2.1 21-Jun-2006  yamt sync with head.
 1.41.2.1 29-Nov-2005  yamt sync with head.
 1.42.6.1 22-Apr-2006  simonb Sync with head.
 1.42.4.1 09-Sep-2006  rpaulo sync with head
 1.42.2.1 18-Feb-2006  yamt sync with head.
 1.43.14.2 12-Jan-2007  ad Sync with head.
 1.43.14.1 18-Nov-2006  ad Sync with head.
 1.43.2.2 15-Sep-2006  yamt make UVM_KICK_PDAEMON() a real function and stop including
uvm_pdpolicy.h from uvm.h. this also fixes build of pmap(1).
 1.43.2.1 05-Mar-2006  yamt separate page replacement policy from the rest of kernel.
 1.44.2.1 22-Oct-2006  yamt use workqueue for aiodoned.
 1.46.2.1 27-Feb-2007  yamt - sync with head.
- move sched_changepri back to kern_synch.c as it doesn't know PPQ anymore.
 1.49.6.1 11-Jul-2007  mjf Sync with head.
 1.49.4.5 21-Aug-2007  yamt fix some races around pagedaemon and uvm_wait. ok'ed by Andrew Doran.
 1.49.4.4 15-Jul-2007  ad Sync with head.
 1.49.4.3 28-Apr-2007  ad Split uvm_hashlock into an array of 32 locks.
 1.49.4.2 09-Apr-2007  ad - Add two new arguments to kthread_create1: pri_t pri, bool mpsafe.
- Fork kthreads off proc0 as new LWPs, not new processes.
 1.49.4.1 13-Mar-2007  ad Pull in the initial set of changes for the vmlocking branch.
 1.51.2.1 15-Aug-2007  skrll Sync with HEAD.
 1.52.22.2 21-Jul-2007  ad Merge unobtrusive locking changes from the vmlocking branch.
 1.52.22.1 21-Jul-2007  ad file uvm.h was added on branch matt-mips64 on 2007-07-21 19:21:54 +0000
 1.52.18.1 02-Jan-2008  bouyer Sync with HEAD
 1.52.14.1 04-Dec-2007  ad Pull the vmlocking changes into a new branch.
 1.52.12.1 18-Feb-2008  mjf Sync with HEAD.
 1.52.6.1 09-Jan-2008  matt sync with HEAD
 1.53.12.1 23-Jun-2008  wrstuden Sync w/ -current. 34 merge conflicts to follow.
 1.53.10.4 11-Aug-2010  yamt sync with head.
 1.53.10.3 11-Mar-2010  yamt sync with head
 1.53.10.2 18-Jul-2009  yamt sync with head.
 1.53.10.1 04-May-2009  yamt sync with head.
 1.53.8.1 17-Jun-2008  yamt sync with head.
 1.53.6.1 05-Jun-2008  mjf Sync with HEAD.

Also fix build.
 1.55.16.4 29-Feb-2012  matt Improve UVM_PAGE_TRKOWN.
Add more asserts to uvm_page.
 1.55.16.3 09-Feb-2012  matt Major changes to uvm.
Support multiple collections (groups) of free pages and run the page
reclamation algorithm on each group independently.
 1.55.16.2 03-Jun-2011  matt Restore $NetBSD$
 1.55.16.1 03-Jun-2011  matt Rework page free lists to be sorted by color first rather than free_list.
Kept per color PGFL_* counter in each page free list.
Minor cleanups.
 1.55.12.1 23-Jul-2009  jym Sync with HEAD.
 1.57.4.3 31-May-2011  rmind sync with head
 1.57.4.2 05-Mar-2011  rmind sync with head
 1.57.4.1 30-May-2010  rmind sync with head
 1.57.2.1 30-Apr-2010  uebayasi Sync with HEAD.
 1.59.4.1 08-Feb-2011  bouyer Sync with HEAD
 1.59.2.1 06-Jun-2011  jruoho Sync with HEAD.
 1.62.8.1 18-Feb-2012  mrg merge to -current.
 1.62.4.5 17-Apr-2012  yamt sync with head
 1.62.4.4 26-Dec-2011  yamt - use O->A loan to serve read(2). based on a patch from Chuck Silvers
- associated O->A loan fixes.
 1.62.4.3 20-Nov-2011  yamt - fix page loaning XXX make O->A loaning further
- add some statistics
 1.62.4.2 12-Nov-2011  yamt redo the page clean/dirty/unknown accounting separately for file and
anonymous pages
 1.62.4.1 11-Nov-2011  yamt - track the number of clean/dirty/unknown pages in the system.
- g/c PG_MARKER
 1.63.20.1 07-Apr-2014  tls Be a little more clear and consistent about harvesting entropy from devices:

1) deprecate RND_FLAG_NO_ESTIMATE

2) define RND_FLAG_COLLECT_TIME, RND_FLAG_COLLECT_VALUE

3) define RND_FLAG_ESTIMATE_TIME, RND_FLAG_ESTIMATE_VALUE

4) define RND_FLAG_DEFAULT: RND_FLAG_COLLECT_TIME|
RND_FLAG_COLLECT_VALUE|RND_FLAG_ESTIMATE_TIME

5) Make entropy harvesting from environmental sensors a little more generic
and remove it from individual sensor drivers.

6) Remove individual open-coded delta-estimators for values from a few
places in the tree (uvm, environmental drivers).

7) 0 -> RND_FLAG_DEFAULT, actually gather entropy from various drivers
that had stubbed out code, other minor cleanups.
 1.63.6.2 03-Dec-2017  jdolecek update from HEAD
 1.63.6.1 20-Aug-2014  tls Rebase to HEAD as of a few days ago.
 1.64.4.2 05-Feb-2017  skrll Sync with HEAD
 1.64.4.1 06-Jun-2015  skrll Sync with HEAD
 1.66.2.1 07-Jan-2017  pgoyette Sync with HEAD. (Note that most of these changes are simply $NetBSD$
tag issues.)
 1.68.16.1 08-Apr-2020  martin Merge changes from current as of 20200406
 1.73.2.1 29-Feb-2020  ad Sync with head.
 1.129 10-Sep-2023  ad Align uvm_amap to COHERENCY_UNIT.
 1.128 19-Jun-2023  msaitoh s/value value/value/ in comment. No functional change.
 1.127 09-Apr-2023  riastradh uvm(9): KASSERT(A && B) -> KASSERT(A); KASSERT(B)
 1.126 13-Mar-2021  skrll Consistently use %#jx instead of 0x%jx or just %jx in UVMHIST_LOG formats
 1.125 21-Sep-2020  chs branches: 1.125.2;
the previous fix for PR 55366 in uvm_amap.c 1.124 was incomplete:
- amap_adjref_anons() must also ignore AMAP_REFALL when updating
the ppref, not just when deciding whether or not to initialize ppref.
- UVM_EXTRACT_QREF relies on AMAP_REFALL to work properly,
and since we can't use AMAP_REFALL then we can't use QREF either.
 1.124 20-Sep-2020  chs Effectively disable the AMAP_REFALL flag because it is unsafe.
This flag tells the amap code that it does not need to allocate ppref
as part of adding or removing a reference, but that is only correct
if the range of the reference being added or removed is the same
as the range of all other references to the amap, and the point of
this flag is exactly to try to optimize the case where the range is
different and thus this flag would not be correct to use.
Fixes PR 55366.
 1.123 18-Aug-2020  chs fix amap_extend() to handle amaps where we previously failed to allocate
the ppref memory.
 1.122 09-Jul-2020  skrll Consistently use UVMHIST(__func__)

Convert UVMHIST_{CALLED,LOG} into UVMHIST_CALLARGS
 1.121 08-Jul-2020  skrll Trailing whitespace
 1.120 17-May-2020  ad Mark amappl with PR_LARGECACHE.
 1.119 20-Mar-2020  ad Go back to freeing struct vm_anon one by one. There may have been an
advantage circa ~2008 but there isn't now.
 1.118 14-Mar-2020  ad Make page waits (WANTED vs BUSY) interlocked by pg->interlock. Gets RW
locks out of the equation for sleep/wakeup, and allows observing+waiting
for busy pages when holding only a read lock. Proposed on tech-kern.
 1.117 14-Mar-2020  ad - Hide the details of SPCF_SHOULDYIELD and related behind a couple of small
functions: preempt_point() and preempt_needed().

- preempt(): if the LWP has exceeded its timeslice in kernel, strip it of
any priority boost gained earlier from blocking.
 1.116 24-Feb-2020  rin 0x%#x --> %#x for non-external codes.
Also, stop mixing up 0x%x and %#x in single files as far as possible.
 1.115 23-Feb-2020  ad UVM locking changes, proposed on tech-kern:

- Change the lock on uvm_object, vm_amap and vm_anon to be a RW lock.
- Break v_interlock and vmobjlock apart. v_interlock remains a mutex.
- Do partial PV list locking in the x86 pmap. Others to follow later.
 1.114 02-Jan-2020  ad branches: 1.114.2;
Back out the amap allocation changes from earlier today - have seen a panic
with them. Retain the lock changes.
 1.113 01-Jan-2020  ad - Start trying to reduce the high cache miss rate observed around vm_amap.
On _LP64, pad struct vm_amap to 128 bytes and use the additional space to
hold the arrays for tiny amaps which are common. Carefully size the array
allocations to avoid false sharing, and for smaller amaps try to share
allocated cache lines.

- Eliminate most contention due to amap_list: maintain the list in the pool
cache constructor / destructor like we do for struct file. Cache the
mutexes we allocate here.

- Don't do PR_WAITOK mutex allocations when NOWAIT has been specified.
 1.112 01-Jan-2020  ad PR kern/54821: 9.99.32 assertion in uvm_pageactivate

Looks like I forgot to commit this file yesterday.
 1.111 13-Dec-2019  ad Break the global uvm_pageqlock into a per-page identity lock and a private
lock for use of the pagedaemon policy code. Discussed on tech-kern.

PR kern/54209: NetBSD 8 large memory performance extremely low
PR kern/54210: NetBSD-8 processes presumably not exiting
PR kern/54727: writing a large file causes unreasonable system behaviour
 1.110 01-Dec-2019  ad Activate pages in batch instead of acquiring uvm_pageqlock a zillion times.
 1.109 12-Aug-2018  maxv branches: 1.109.4;
Rename 'slotspace' -> 'slotarea' in UVM, to avoid (future) collision with
the x86 slotspace structure.
 1.108 28-Oct-2017  pgoyette branches: 1.108.2; 1.108.4;
Update the kernhist(9) kernel history code to address issues identified
in PR kern/52639, as well as some general cleaning-up...

(As proposed on tech-kern@ with additional changes and enhancements.)

Details of changes:

* All history arguments are now stored as uintmax_t values[1], both in
the kernel and in the structures used for exporting the history data
to userland via sysctl(9). This avoids problems on some architectures
where passing a 64-bit (or larger) value to printf(3) can cause it to
process the value as multiple arguments. (This can be particularly
problematic when printf()'s format string is not a literal, since in
that case the compiler cannot know how large each argument should be.)

* Update the data structures used for exporting kernel history data to
include a version number as well as the length of history arguments.

* All [2] existing users of kernhist(9) have had their format strings
updated. Each format specifier now includes an explicit length
modifier 'j' to refer to numeric values of the size of uintmax_t.

* All [2] existing users of kernhist(9) have had their format strings
updated to replace uses of "%p" with "%#jx", and the pointer
arguments are now cast to (uintptr_t) before being subsequently cast
to (uintmax_t). This is needed to avoid compiler warnings about
casting "pointer to integer of a different size."

* All [2] existing users of kernhist(9) have had instances of "%s" or
"%c" format strings replaced with numeric formats; several instances
of mis-match between format string and argument list have been fixed.

* vmstat(1) has been modified to handle the new size of arguments in the
history data as exported by sysctl(9).

* vmstat(1) now provides a warning message if the history requested with
the -u option does not exist (previously, this condition was silently
ignored, with only a single blank line being printed).

* vmstat(1) now checks the version and argument length included in the
data exported via sysctl(9) and exits if they do not match the values
with which vmstat was built.

* The kernhist(9) man-page has been updated to note the additional
requirements imposed on the format strings, along with several other
minor changes and enhancements.

[1] It would have been possible to use an explicit length (for example,
uint64_t) for the history arguments. But that would require another
"rototill" of all the users in the future when we add support for an
architecture that supports a larger size. Also, the printf(3) format
specifiers for explicitly-sized values, such as "%"PRIu64, are much
more verbose (and less aesthetically appealing, IMHO) than simply
using "%ju".

[2] I've tried very hard to find "all [the] existing users of kernhist(9)"
but it is possible that I've missed some of them. I would be glad to
update any stragglers that anyone identifies.
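
A small userland illustration of the format-string convention described above: every argument is widened to uintmax_t, pointers go through uintptr_t first, and the 'j' length modifier is used in the format. `format_hist()` is a hypothetical helper written for this example; it is not part of kernhist(9).

```c
#include <assert.h>
#include <stdint.h>
#include <stdio.h>
#include <string.h>

/*
 * Hypothetical helper: format one pointer and one count the way the commit
 * describes. The pointer is cast to uintptr_t and then to uintmax_t, which
 * avoids the "pointer to integer of a different size" warning, and both
 * conversions use the 'j' modifier so printf reads a uintmax_t each time.
 */
static int
format_hist(char *buf, size_t len, const void *ptr, unsigned count)
{
	assert(buf != NULL);
	return snprintf(buf, len, "ptr=%#jx count=%ju",
	    (uintmax_t)(uintptr_t)ptr, (uintmax_t)count);
}
```

Because every argument has the same width, the consumer of the history data never has to guess how many bytes each format specifier covers.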
 1.107 08-Apr-2012  chs branches: 1.107.2; 1.107.32;
initialize amap per-page reference counts before changing the amap's
overall reference count. this fixes the crashes seen for the last 9 months
with web browsers and plugins, which was also the cause of PR 46193.
 1.106 30-Mar-2012  chs adjust amap_cow_now() to make UVM_PAGE_TRKOWN happy.
 1.105 27-Jan-2012  para branches: 1.105.2;
extending vmem(9) to be able to allocate resources for its own needs.
simplifying uvm_map handling (no special kernel entries anymore no relocking)
make malloc(9) a thin wrapper around kmem(9)
(with private interface for interrupt safety reasons)

releng@ acknowledged
 1.104 11-Oct-2011  yamt branches: 1.104.2; 1.104.6;
assertion
 1.103 17-Aug-2011  rmind amap_cow_now: just free the fresh anon on error, no need to dispose it.
 1.102 06-Aug-2011  rmind - Rework uvm_anfree() into uvm_anon_freelst(), which always drops the lock.
- Free anons in uvm_anon_freelst() without lock held.
- Mechanical sync to unused loaning code.
 1.101 05-Jul-2011  yamt reduce the number of atomic ops in common cases. it's exceptional for
anons to remain longer than amap.
 1.100 27-Jun-2011  hannken amap_copy(): Keep the source amap locked until its lock has been copied.

Kernel assertion "anon->an_lock == amap->am_lock" no longer fails.

Ok: Mindaugas Rasiukevicius <rmind@netbsd.org>
 1.99 24-Jun-2011  rmind amap_copy: fix one more regression, thanks to enami@.
 1.98 24-Jun-2011  rmind Fix uvmplock regression - a lock against oneself case in amap_swap_off().
Happens since amap is NULL in uvmfault_anonget(), so uvmfault_unlockall()
keeps anon locked, when it should unlock it.
 1.97 24-Jun-2011  rmind amap_pp_adjref: fix regression, spotted by nonaka@.
 1.96 23-Jun-2011  rmind Clean-up, add asserts, slightly simplify.
 1.95 18-Jun-2011  rmind Clean up, sprinkle asserts, consify, use unsigned, use kmem_zalloc instead
of memset, reduce the scope of some variables, improve some comments.

No functional change intended.
 1.94 18-Jun-2011  rmind amap_add/amap_unadd: clean up slightly, use unsigned, add asserts.
 1.93 18-Jun-2011  rmind Add amap_adjref_anons() helper and simplify amap_ref()/amap_unref().
 1.92 16-Jun-2011  rmind amap_lookup{s}: add assert, clean-up slightly.
 1.91 12-Jun-2011  rmind Welcome to 5.99.53! Merge rmind-uvmplock branch:

- Reorganize locking in UVM and provide extra serialisation for pmap(9).
New lock order: [vmpage-owner-lock] -> pmap-lock.

- Simplify locking in some pmap(9) modules by removing P->V locking.

- Use lock object on vmobjlock (and thus vnode_t::v_interlock) to share
the locks amongst UVM objects where necessary (tmpfs, layerfs, unionfs).

- Rewrite and optimise x86 TLB shootdown code, make it simpler and cleaner.
Add TLBSTATS option for x86 to collect statistics about TLB shootdowns.

- Unify /dev/mem et al in MI code and provide required locking (removes
kernel-lock on some ports). Also, avoid cache-aliasing issues.

Thanks to Andrew Doran and Joerg Sonnenberger, as their initial patches
formed the core changes of this branch.
 1.90 23-Apr-2011  rmind branches: 1.90.2;
Replace "malloc" in comments, remove unnecessary header inclusions.
 1.89 02-Feb-2011  chuck update license clauses on my code to match the new-style BSD licenses.
based on diff that rmind@ sent me.

no functional change with this commit.
 1.88 21-Oct-2009  rmind branches: 1.88.4; 1.88.6; 1.88.8;
Remove uarea swap-out functionality:

- Addresses the issue described in PR/38828.
- Some simplification in threading and sleepq subsystems.
- Eliminates pmap_collect() and, as a side note, allows pmap optimisations.
- Eliminates XS_CTL_DATA_ONSTACK in scsipi code.
- Avoids few scans on LWP list and thus potentially long holds of proc_lock.
- Cuts ~1.5k lines of code. Reduces amd64 kernel size by ~4k.
- Removes __SWAP_BROKEN cases.

Tested on x86, mips, acorn32 (thanks <mpumford>) and partly tested on
acorn26 (thanks to <bjh21>).

Discussed on <tech-kern>, reviewed by <ad>.
 1.87 16-Aug-2009  yamt assertions
 1.86 28-Mar-2009  rmind Convert some panic() checks to KASSERT()s.
This code is stable and there is no reason to enforce checks.
 1.85 03-Dec-2008  ad branches: 1.85.4;
Make adjustment of uvm_extrapages atomic since it's done without a lock.
XXX This is still a hack.
 1.84 02-Jan-2008  ad branches: 1.84.6; 1.84.10; 1.84.16; 1.84.18; 1.84.20;
Merge vmlocking2 to head.
 1.83 08-Dec-2007  ad branches: 1.83.4;
Allocate amaps from a pool_cache.
 1.82 21-Jul-2007  ad branches: 1.82.4; 1.82.6; 1.82.12; 1.82.14; 1.82.16;
Merge unobtrusive locking changes from the vmlocking branch.
 1.81 09-Jul-2007  ad branches: 1.81.2;
Merge some of the less invasive changes from the vmlocking branch:

- kthread, callout, devsw API changes
- select()/poll() improvements
- miscellaneous MT safety improvements
 1.80 12-Mar-2007  ad branches: 1.80.2; 1.80.4;
Pass an ipl argument to pool_init/POOL_INIT to be used when initializing
the pool's lock.
 1.79 22-Feb-2007  thorpej branches: 1.79.4;
TRUE -> true, FALSE -> false
 1.78 21-Feb-2007  thorpej Replace the Mach-derived boolean_t type with the C99 bool type. A
future commit will replace use of TRUE and FALSE with true and false.
 1.77 09-Feb-2007  ad branches: 1.77.2;
Merge newlock2 to head.
 1.76 01-Nov-2006  yamt remove some __unused from function parameters.
 1.75 12-Oct-2006  christos - sprinkle __unused on function decls.
- fix a couple of unused bugs
- no more -Wno-unused for i386
 1.74 25-Jun-2006  yamt branches: 1.74.4; 1.74.6;
make amap use kmem_alloc, rather than malloc.
(ie. make it use kernel_map, rather than kmem_map.)
kmem_map is more restricted than kernel_map,
and there's no point for amap to use it.
 1.73 21-Apr-2006  yamt branches: 1.73.4;
amap_splitref: assert that origref->ar_amap is initialized
by caller beforehand.
 1.72 15-Feb-2006  yamt branches: 1.72.2; 1.72.4; 1.72.6;
- amap_copy: take a "flags" argument instead of booleans.
- add AMAP_COPY_NOMERGE flag, and use it for uvm_map_extract.
PR/32806 from Julio M. Merino Vidal.
 1.71 11-Feb-2006  yamt remove the following options. no objections on tech-kern@.

UVM_PAGER_INLINE
UVM_AMAP_INLINE
UVM_PAGE_INLINE
UVM_MAP_INLINE
 1.70 21-Jan-2006  yamt branches: 1.70.2; 1.70.4;
- uvm_fault: move a common code of 1B and 2B to a new function.
don't attempt to allocate anons with kernel_map locked. PR/32543.
- amap_copy: add an assertion.
 1.69 18-Jan-2006  chs in amap_alloc(), only put the amap on the list of amaps if we succeeded
in allocating it.
 1.68 24-Dec-2005  perry branches: 1.68.2;
Remove leading __ from __(const|inline|signed|volatile) -- it is obsolete.
 1.67 11-Dec-2005  christos merge ktrace-lwp.
 1.66 06-Nov-2005  chs in amap_cow_now(), handle the case where we have to sleep and some of the
already-copied pages are paged out. anons that have already been copied
will have refcount == 1, whereas anons that still need to be copied will
have refcount > 1. fixes PR 25392, PR 30257, PR 31924.
 1.65 13-Sep-2005  yamt wrap swap related code by #ifdef VMSWAP. always #define VMSWAP for now.
 1.64 31-Jul-2005  yamt revert "defflag VMSWAP" changes for now.
there seems to be far more people who don't want to edit
their kernel config files than i thought.
 1.63 30-Jul-2005  yamt defflag VMSWAP.
 1.62 27-Jun-2005  thorpej branches: 1.62.2;
Use ANSI function decls.
 1.61 17-May-2005  yamt (try to) merge map entries in fault handler.
 1.60 11-May-2005  yamt allocate anons on-demand, rather than reserving static amount of
them on boot/swapon.
 1.59 05-May-2005  yamt - amap_extend: don't extend amap beyond UVM_AMAP_LARGE.
- uvm_map_enter: if we fail to extend amap, just give up merging instead of
bailing out immediately.
 1.58 06-Apr-2005  yamt amap_wipeout: remove a comment which is no longer true.
despite what the comment said, i left the preempt() call
because i can't think of any bad effects.
 1.57 30-Jan-2005  chs branches: 1.57.4;
hack around a UVM problem that causes hangs when large processes fork.
see PR 26908 for details.
 1.56 01-Jan-2005  yamt branches: 1.56.2; 1.56.4;
for in-kernel maps,
- allocate kva for vm_map_entry from the map itself and
remove the static limit, MAX_KMAPENT.
- keep merged entries for later splitting to fix allocate-to-free problem.
PR/24039.
 1.55 12-May-2004  yamt add assertions.
 1.54 25-Apr-2004  simonb Initialise (most) pools from a link set instead of explicit calls
to pool_init. Untouched pools are ones that are either in arch-specific
code, or aren't initialised during initial system startup.

Convert struct session, ucred and lockf to pools.
 1.53 24-Mar-2004  junyoung branches: 1.53.2; 1.53.4;
- Nuke __P().
- Drop trailing spaces.
 1.52 01-Feb-2003  thorpej branches: 1.52.2;
Add extensible malloc types, adapted from FreeBSD. This turns
malloc types into a structure, a pointer to which is passed around,
instead of an int constant. Allow the limit to be adjusted when the
malloc type is defined, or with a function call, as suggested by
Jonathan Stone.
 1.51 27-Jan-2003  pk amap_copy: remove stray amap_unlock().
 1.50 18-Jan-2003  thorpej Merge the nathanw_sa branch.
 1.49 20-Dec-2002  atatat Properly set page references counts at the start of the newly
allocated ppref data to zero in the case of an amap that has empty
space at the front.

Don't set anything in the ppref array if "len" is zero.

Many thanks to Sami Kantoluoto for providing gdb access to a machine
that would reliably crash with problems related to the above, and to
Stephan Thesing for corroborating that the patch properly addressed
the problem.

Note that the ar_pageoff (and related variables) types must be changed
soon. The use of "int" here is not theoretically sufficient.
 1.48 30-Nov-2002  bouyer Change uvm_km_kmemalloc() to accept flag UVM_KMF_NOWAIT and pass it to
uvm_map(). Change uvm_map() to honor UVM_KMF_NOWAIT. For this, change
amap_extend() to take a flags parameter instead of just boolean for
direction, and introduce AMAP_EXTEND_FORWARDS and AMAP_EXTEND_NOWAIT flags
(AMAP_EXTEND_BACKWARDS is still defined as 0x0, to keep the code easier to
read).
Add a flag parameter to uvm_mapent_alloc().
This solves a problem where a pool_get(PR_NOWAIT) could trigger a pool_get(PR_WAITOK)
in uvm_mapent_alloc().
Thanks to Chuck Silvers, enami tsugutomo, Andrew Brown and Jason R Thorpe
for feedback.
 1.47 15-Nov-2002  atatat Properly free "newppref", instead of "amap->am_ppref" (oops), and
delay freeing the old am_ppref so that if we bail early due to
malloc() failures, valid ppref data hasn't been freed for no reason.

Based on comments from enami.
 1.46 14-Nov-2002  atatat Implement backwards extension of amaps. There are three cases to deal
with:

Case #1 -- adjust offset: The slot offset in the aref can be
decremented to cover the required size addition.

Case #2 -- move pages and adjust offset: The slot offset is not large
enough, but the amap contains enough inactive space *after* the mapped
pages to make up the difference, so active slots are slid to the "end"
of the amap, and the slot offset is, again, adjusted to cover the
required size addition. This optimizes for hitting case #1 again on
the next small extension.

Case #3 -- reallocate, move pages, and adjust offset: There is not
enough inactive space in the amap, so the arrays are reallocated, and
the active pages are copied again to the "end" of the amap, and the
slot offset is adjusted to cover the required size. This also
optimizes for hitting case #1 on the next backwards extension.

This provides the missing piece in the "forward extension of
vm_map_entries" logic, so the merge failure counters have been
removed.

Not many applications will make any use of this at this time (except
for jvms and perhaps gcc3), but a "top-down" memory allocator will use
it extensively.
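
The choice between the three cases above can be sketched as a simple classification over slot counts. This is a simplified model under stated assumptions, not the amap_extend() code: the names are hypothetical, and it ignores details such as amaps shared between several references.

```c
#include <assert.h>

enum ext_case {
	EXT_ADJUST_OFFSET = 1,	/* case #1: enough free slots in front */
	EXT_SLIDE = 2,		/* case #2: slide active slots to the end */
	EXT_REALLOC = 3		/* case #3: reallocate the arrays */
};

/*
 * slotoff: index of the first active slot (free slots in front of it)
 * slotlen: number of active slots
 * maxslot: number of slots allocated in the amap
 * slotadd: number of slots needed in front of the mapping
 */
static enum ext_case
classify_backward_extend(unsigned slotoff, unsigned slotlen,
    unsigned maxslot, unsigned slotadd)
{
	unsigned tail_free;

	assert(slotoff + slotlen <= maxslot);
	if (slotadd <= slotoff)
		return EXT_ADJUST_OFFSET;
	/* Inactive space after the mapped pages can make up the difference. */
	tail_free = maxslot - (slotoff + slotlen);
	if (slotadd <= slotoff + tail_free)
		return EXT_SLIDE;
	return EXT_REALLOC;
}
```

Cases #2 and #3 both leave the active slots at the end of the array, so all the remaining free space sits in front and the next small backwards extension falls into case #1.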
 1.45 15-Sep-2002  chs add a new km flag UVM_KMF_CANFAIL, which causes uvm_km_kmemalloc() to
return failure if swap is full and there are no free physical pages.
have malloc() use this flag if M_CANFAIL is passed to it.
use M_CANFAIL to allow amap_extend() to fail when memory is scarce.
this should prevent most of the remaining hangs in low-memory situations.
 1.44 29-Jun-2002  chs rearrange a few lines to appease an assertion.
 1.43 28-Mar-2002  nathanw branches: 1.43.2; 1.43.4;
In amap_pp_adjref(), avoid incorrectly merging the first two chunks in
a ppref array when the range being adjusted includes the beginning of
the array.
 1.42 08-Mar-2002  thorpej branches: 1.42.2;
Pool deals fairly well with physical memory shortage, but it doesn't
deal with shortages of the VM maps where the backing pages are mapped
(usually kmem_map). Try to deal with this:

* Group all information about the backend allocator for a pool in a
separate structure. The pool references this structure, rather than
the individual fields.
* Change the pool_init() API accordingly, and adjust all callers.
* Link all pools using the same backend allocator on a list.
* The backend allocator is responsible for waiting for physical memory
to become available, but will still fail if it cannot allocate KVA
space for the pages. If this happens, carefully drain all pools using
the same backend allocator, so that some KVA space can be freed.
* Change pool_reclaim() to indicate if it actually succeeded in freeing
some pages, and use that information to make draining easier and more
efficient.
* Get rid of PR_URGENT. There was only one use of it, and it could be
dealt with by the caller.

From art@openbsd.org.
 1.41 25-Feb-2002  chs in amap_pp_adjref(), avoid unnecessary fragmentation of the am_ppref array
by merging the first changed chunk with the last unchanged chunk if possible.
 1.40 05-Dec-2001  enami When initially allocating or extending arrays in struct uvm_amap,
adjust allocation size using malloc_roundup(). This eliminates many
unnecessary malloc/memcpy calls.
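The idea behind revision 1.40 can be illustrated roughly as follows. This is a sketch only: toy_roundup() is a hypothetical stand-in for malloc_roundup(), and the power-of-two buckets starting at 16 bytes are an assumption made for the example, not the allocator's actual bucket sizes.

```c
#include <assert.h>
#include <stddef.h>

/*
 * Hypothetical stand-in for malloc_roundup(): round a request up to
 * the size of the bucket that would serve it (assumed power of two).
 */
size_t
toy_roundup(size_t size)
{
	size_t bucket = 16;

	assert(size > 0 && size <= ((size_t)1 << 30));
	while (bucket < size)
		bucket <<= 1;
	return bucket;
}

/* Number of array slots that fit in the bucket serving this request. */
size_t
toy_slots_for(size_t wanted, size_t slotsize)
{
	return toy_roundup(wanted * slotsize) / slotsize;
}
```

Recording the rounded-up slot count as the array capacity means a later extension that still fits in the same bucket needs neither a new allocation nor a copy.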
 1.39 05-Dec-2001  enami No need to zero clear after amap->am_bckptr[amap->am_nslot], since we're
clearing corresponding elements in an array amap->am_anon[].
 1.38 01-Dec-2001  chuck fix bug in amap_wiperange() detected by enami tsugutomo.
loop control was wrong in one case.
 1.37 10-Nov-2001  lukem add RCSIDs, and in some cases, slightly cleanup #include order
 1.36 06-Nov-2001  simonb Change some unsigned int variables and parameters to plain ints so
that all usages of those agree on unsigned vs. signed.
 1.35 19-Sep-2001  chs branches: 1.35.2;
work around swap-space/extent performance problem which causes
long pauses when processes with lots of swapped-out pages exit.
 1.34 15-Sep-2001  chs a whole bunch of changes to improve performance and robustness under load:

- remove special treatment of pager_map mappings in pmaps. this is
required now, since I've removed the globals that expose the address range.
pager_map now uses pmap_kenter_pa() instead of pmap_enter(), so there's
no longer any need to special-case it.
- eliminate struct uvm_vnode by moving its fields into struct vnode.
- rewrite the pageout path. the pager is now responsible for handling the
high-level requests instead of only getting control after a bunch of work
has already been done on its behalf. this will allow us to UBCify LFS,
which needs tighter control over its pages than other filesystems do.
writing a page to disk no longer requires making it read-only, which
allows us to write wired pages without causing all kinds of havoc.
- use a new PG_PAGEOUT flag to indicate that a page should be freed
on behalf of the pagedaemon when it's unlocked. this flag is very similar
to PG_RELEASED, but unlike PG_RELEASED, PG_PAGEOUT can be cleared if the
pageout fails due to eg. an indirect-block buffer being locked.
this allows us to remove the "version" field from struct vm_page,
and together with shrinking "loan_count" from 32 bits to 16,
struct vm_page is now 4 bytes smaller.
- no longer use PG_RELEASED for swap-backed pages. if the page is busy
because it's being paged out, we can't release the swap slot to be
reallocated until that write is complete, but unlike with vnodes we
don't keep a count of in-progress writes so there's no good way to
know when the write is done. instead, when we need to free a busy
swap-backed page, just sleep until we can get it busy ourselves.
- implement a fast-path for extending writes which allows us to avoid
zeroing new pages. this substantially reduces cpu usage.
- encapsulate the data used by the genfs code in a struct genfs_node,
which must be the first element of the filesystem-specific vnode data
for filesystems which use genfs_{get,put}pages().
- eliminate many of the UVM pagerops, since they aren't needed anymore
now that the pager "put" operation is a higher-level operation.
- enhance the genfs code to allow NFS to use the genfs_{get,put}pages
instead of a modified copy.
- clean up struct vnode by removing all the fields that used to be used by
the vfs_cluster.c code (which we don't use anymore with UBC).
- remove kmem_object and mb_object since they were useless.
instead of allocating pages to these objects, we now just allocate
pages with no object. such pages are mapped in the kernel until they
are freed, so we can use the mapping to find the page to free it.
this allows us to remove splvm() protection in several places.

The sum of all these changes improves write throughput on my
decstation 5000/200 to within 1% of the rate of NetBSD 1.5
and reduces the elapsed time for "make release" of a NetBSD 1.5
source tree on my 128MB pc to 10% less than a 1.5 kernel took.
 1.33 22-Jul-2001  wiz branches: 1.33.2;
seperate -> separate
 1.32 02-Jun-2001  chs branches: 1.32.2;
replace vm_map{,_entry}_t with struct vm_map{,_entry} *.
 1.31 25-May-2001  chs remove trailing whitespace.
 1.30 18-Feb-2001  chs branches: 1.30.2;
clean up DIAGNOSTIC checks, use KASSERT().
 1.29 23-Jan-2001  thorpej Change uvm_analloc() to return a locked anon, update all callers,
and fix an anon locking protocol error in uvm_loanzero().
 1.28 23-Jan-2001  thorpej Sprinkle some assertions:
amap_free(): Assert that the amap is locked.
amap_share_protect(): Assert that the amap is locked.
amap_wipeout(): Assert that the amap is locked.
uvm_anfree(): Assert that the anon has a reference count of 0 and is
not locked.
uvm_anon_lockloanpg(): Assert that the anon is locked.
anon_pagein(): Assert that the anon is locked.
uvmfault_anonget(): Assert that the anon is locked.
uvm_pagealloc_strat(): Assert that the uobj or the anon is locked

And fix the problems these have uncovered:
amap_cow_now(): Lock the new anon after allocating it, and unref and
unlock it (rather than lock!) before freeing it in case
of an error condition. This should fix a problem reported
by Dan Carosone using cdrecord on an i386 MP kernel.
uvm_fault(): Case1B -- Lock the new anon after allocating it, and unlock
it later when we unlock the old anon.
Case2 -- Lock the new anon after allocating it, and unlock
it later by passing it to uvmfault_unlockall() (we set anon
to NULL if we're not doing a promote fault).
 1.27 25-Nov-2000  chs lots of cleanup:
use queue.h macros and KASSERT().
address amap offsets in pages instead of bytes.
make amap_ref() and amap_unref() take an amap, offset and length
instead of a vm_map_entry_t.
improve whitespace and comments.
 1.26 03-Aug-2000  thorpej MALLOC()/FREE() are not to be used for variable size allocations.
 1.25 02-Aug-2000  thorpej Fix a fairly obvious locking error in amap_cow_now() -- the amap was
left locked upon exit from the function (how did this one slip for
so long?)
 1.24 27-Jun-2000  mrg remove include of <vm/vm.h>
 1.23 26-Jun-2000  mrg remove/move more mach vm header files:

<vm/pglist.h> -> <uvm/uvm_pglist.h>
<vm/vm_inherit.h> -> <uvm/uvm_inherit.h>
<vm/vm_kern.h> -> into <uvm/uvm_extern.h>
<vm/vm_object.h> -> nothing
<vm/vm_pager.h> -> into <uvm/uvm_pager.h>

also includes a bunch of <vm/vm_page.h> include removals (due to redundancy
with <vm/vm.h>), and a scattering of other similar headers.
 1.22 12-Sep-1999  chs branches: 1.22.2; 1.22.12;
eliminate the PMAP_NEW option by making it required for all ports.
ports which previously had no support for PMAP_NEW now implement
the pmap_k* interfaces as wrappers around the non-k versions.
 1.21 06-Jul-1999  cgd fix allocation handling bugs in amap_alloc1(). if the first or second
sub-structure malloc() failed, it was quite likely that the function
would return success incorrectly. This is the direct cause of the bug
reported in PR#7897. (Thanks to chs for helping to track it down.)
 1.20 11-Apr-1999  chs add a `flags' argument to uvm_pagealloc_strat().
define a flag UVM_PGA_USERESERVE to allow non-kernel object
allocations to use pages from the reserve.
use the new flag for allocations in pmap modules.
 1.19 28-Jan-1999  chuck branches: 1.19.2;
comment cleanup, shift around the inline stuff a bit,
rename VM_AMAP_PPREF (to UVM_AMAP_PPREF).
 1.18 24-Jan-1999  chuck cleanup/reorg:
- break anon related functions out of uvm_amap.c and put them in their own
file (uvm_anon.c). this includes breaking up uvm_anon_init into an amap
and an anon init function
- ensure that only functions within the amap module access amap structure
fields (add macros to amap api as needed)
 1.17 04-Nov-1998  chs branches: 1.17.2;
be consistent with locking of amaps and anons when freeing them.
 1.16 18-Oct-1998  chs shift by PAGE_SHIFT instead of multiplying or dividing by PAGE_SIZE.
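A minimal sketch of the conversion style adopted in revision 1.16, assuming a 4 KB page. The TOY_ names and helper functions are hypothetical and are not the kernel's PAGE_SHIFT/PAGE_SIZE definitions.

```c
#include <assert.h>
#include <stddef.h>

#define TOY_PAGE_SHIFT	12		/* assumed: 4 KB pages */
#define TOY_PAGE_SIZE	((size_t)4096)

/* Bytes to whole pages: same result as bytes / TOY_PAGE_SIZE. */
size_t
toy_btop(size_t bytes)
{
	/* The shift is only equivalent if the two constants agree. */
	assert(TOY_PAGE_SIZE == (size_t)1 << TOY_PAGE_SHIFT);
	return bytes >> TOY_PAGE_SHIFT;
}

/* Pages to bytes: same result as pages * TOY_PAGE_SIZE. */
size_t
toy_ptob(size_t pages)
{
	return pages << TOY_PAGE_SHIFT;
}
```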
 1.15 08-Oct-1998  chuck fix ppref botch. establish ppref at split time before we add the duplicate
reference.
 1.14 31-Aug-1998  thorpej Allocate vm_anon arrays from kernel_map, not via MALLOC(). Helps relieve
much of UVM's kmem_map usage.
 1.13 29-Aug-1998  thorpej Use the pool allocator (and the "nointr" pool page allocator) for
vm_amap structures.
 1.12 13-Aug-1998  eeh Merge paddr_t changes into the main branch.
 1.11 09-Aug-1998  perry bzero->memset, bcopy->memcpy, bcmp->memcmp
 1.10 20-Jun-1998  mrg branches: 1.10.2;
add a "<-done!" log
 1.9 14-May-1998  chuck detect ending VA wrap-around in the chunking code of amap_copy.

fixes problem reported by Ken Nakata <kenn@synap.ne.jp> on the mac68k
where the stack amap chunking caused entry->end to wrap around to zero,
thus corrupting the map entry list and causing kmem_map to fill.
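The kind of check described in revision 1.9 can be sketched as below. This is an illustration, not the code from amap_copy(): toy_chunk_end() and its parameters are hypothetical, and clamping to a caller-supplied limit is an assumed recovery policy.

```c
#include <assert.h>
#include <stdint.h>

/*
 * Compute the end of a chunk starting at 'start'. Unsigned addition
 * wraps modulo 2^N, so a result smaller than 'start' means the end
 * address wrapped around; in that case clamp to 'limit'.
 */
uintptr_t
toy_chunk_end(uintptr_t start, uintptr_t chunksize, uintptr_t limit)
{
	uintptr_t end;

	assert(start <= limit);
	end = start + chunksize;
	if (end < start || end > limit)
		end = limit;
	return end;
}
```

Without the `end < start` test, a chunk near the top of the address space would yield an end address close to zero, which matches the symptom reported on the mac68k.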
 1.8 05-May-1998  kleink Remove inclusions of syscall (and syscall argument) related header files;
we don't need them here.
 1.7 09-Mar-1998  mrg KNF.
 1.6 10-Feb-1998  mrg - add defopt's for UVM, UVMHIST and PMAP_NEW.
- remove unnecessary UVMHIST_DECL's.
 1.5 08-Feb-1998  mrg KNF
 1.4 07-Feb-1998  mrg restore rcsids
 1.3 07-Feb-1998  chs fix typos in locking.
use M_UVMAMAP instead of M_TEMP for malloc type.
 1.2 06-Feb-1998  thorpej RCS ID police.
 1.1 05-Feb-1998  mrg branches: 1.1.1;
Initial revision
 1.1.1.1 05-Feb-1998  mrg initial import of the new virtual memory system, UVM, into -current.

UVM was written by chuck cranor <chuck@maria.wustl.edu>, with some
minor portions derived from the old Mach code. i provided some help
getting swap and paging working, and other bug fixes/ideas. chuck
silvers <chuq@chuq.com> also provided some other fixes.

this is the UVM kernel code portion.


this will be KNF'd shortly. :-)
 1.10.2.1 30-Jul-1998  eeh Split vm_offset_t and vm_size_t into paddr_t, psize_t, vaddr_t, and vsize_t.
 1.17.2.2 25-Feb-1999  chs thread_wakeup() -> wakeup().
 1.17.2.1 09-Nov-1998  chs initial snapshot. lots left to do.
 1.19.2.2 07-Jul-1999  perry pullup 1.20->1.21 (cgd)
 1.19.2.1 16-Apr-1999  chs branches: 1.19.2.1.2; 1.19.2.1.4;
pull up 1.19 -> 1.20:
add a `flags' argument to uvm_pagealloc_strat().
define a flag UVM_PGA_USERESERVE to allow non-kernel object
allocations to use pages from the reserve.
use the new flag for allocations in pmap modules.
 1.19.2.1.4.1 30-Nov-1999  itojun bring in latest KAME (as of 19991130, KAME/NetBSD141) into kame branch
just for reference purposes.
This commit includes 1.4 -> 1.4.1 sync for kame branch.

The branch does not compile at all (due to the lack of ALTQ and some other
source code). Please do not try to modify the branch, this is just for
reference purposes.

synchronization to latest KAME will take place on HEAD branch soon.
 1.19.2.1.2.2 02-Aug-1999  thorpej Update from trunk.
 1.19.2.1.2.1 07-Jun-1999  chs merge everything from chs-ubc branch.
 1.22.12.1 02-Aug-2000  thorpej Pull up rev. 1.25 (approved by jhawk):
Fix a fairly obvious locking error in amap_cow_now() -- the amap was
left locked upon exit from the function (how did this one slip for
so long?)
 1.22.2.4 12-Mar-2001  bouyer Sync with HEAD.
 1.22.2.3 11-Feb-2001  bouyer Sync with HEAD.
 1.22.2.2 08-Dec-2000  bouyer Sync with HEAD.
 1.22.2.1 20-Nov-2000  bouyer Update thorpej_scsipi to -current as of a month ago
 1.30.2.14 07-Jan-2003  thorpej In the SA universe, the switch-to-this-LWP decision is made at a
different level than where preempt() calls are made, which renders
the "newlwp" argument useless. Replace it with a "more work to do"
boolean argument. Returning to userspace preempt() calls pass 0.
"Voluntary" preemptions in e.g. uiomove() pass 1. This will be used
to indicate to the SA subsystem that the LWP is not yet finished in
the kernel.

Collapse the SA vs. non-SA cases of preempt() together, making the
conditional code block much smaller, and don't call sa_preempt() if
more work is to come.

NOTE: THIS IS NOT A COMPLETE FIX TO THE preempt()-in-uiomove() PROBLEM
THAT CURRENTLY EXISTS FOR SA PROCESSES.
 1.30.2.13 20-Dec-2002  thorpej Sync with HEAD.
 1.30.2.12 11-Dec-2002  thorpej Sync with HEAD.
 1.30.2.11 17-Sep-2002  nathanw Catch up to -current.
 1.30.2.10 01-Aug-2002  nathanw Catch up to -current.
 1.30.2.9 24-Jun-2002  nathanw Curproc->curlwp renaming.

Change uses of "curproc->l_proc" back to "curproc", which is more like the
original use. Bare uses of "curproc" are now "curlwp".

"curproc" is now #defined in proc.h as ((curlwp) ? (curlwp)->l_proc : NULL)
so that it is always safe to reference curproc (*de*referencing curproc
is another story, but that's always been true).
 1.30.2.8 01-Apr-2002  nathanw Catch up to -current.
(CVS: It's not just a program. It's an adventure!)
 1.30.2.7 28-Feb-2002  nathanw Catch up to -current.
 1.30.2.6 08-Jan-2002  nathanw Catch up to -current.
 1.30.2.5 14-Nov-2001  nathanw Catch up to -current.
 1.30.2.4 25-Sep-2001  nathanw curproc->p_cpu ==> curproc->l_cpu
 1.30.2.3 21-Sep-2001  nathanw Catch up to -current.
 1.30.2.2 24-Aug-2001  nathanw Catch up with -current.
 1.30.2.1 21-Jun-2001  nathanw Catch up to -current.
 1.32.2.6 10-Oct-2002  jdolecek sync kqueue with -current; this includes merge of gehenna-devsw branch,
merge of i386 MP branch, and part of autoconf rototil work
 1.32.2.5 06-Sep-2002  jdolecek sync kqueue branch with HEAD
 1.32.2.4 23-Jun-2002  jdolecek catch up with -current on kqueue branch
 1.32.2.3 16-Mar-2002  jdolecek Catch up with -current.
 1.32.2.2 10-Jan-2002  thorpej Sync kqueue branch with -current.
 1.32.2.1 03-Aug-2001  lukem update to -current
 1.33.2.1 01-Oct-2001  fvdl Catch up with -current.
 1.35.2.1 12-Nov-2001  thorpej Sync the thorpej-mips-cache branch with -current.
 1.42.2.1 12-Mar-2002  thorpej Make the amap lock an adaptive mutex.
 1.43.4.1 02-Jun-2003  tron Pull up revision 1.45 (requested by skrll):
add a new km flag UVM_KMF_CANFAIL, which causes uvm_km_kmemalloc() to
return failure if swap is full and there are no free physical pages.
have malloc() use this flag if M_CANFAIL is passed to it.
use M_CANFAIL to allow amap_extend() to fail when memory is scarce.
this should prevent most of the remaining hangs in low-memory situations.
 1.43.2.1 15-Jul-2002  gehenna catch up with -current.
 1.52.2.6 10-Nov-2005  skrll Sync with HEAD. Here we go again...
 1.52.2.5 04-Feb-2005  skrll Sync with HEAD.
 1.52.2.4 17-Jan-2005  skrll Sync with HEAD.
 1.52.2.3 21-Sep-2004  skrll Fix the sync with head I botched.
 1.52.2.2 18-Sep-2004  skrll Sync with HEAD.
 1.52.2.1 03-Aug-2004  skrll Sync with HEAD
 1.53.4.2 27-Jan-2006  tron Pull up following revision(s) (requested by chs in ticket #10230):
sys/uvm/uvm_amap.c: revision 1.66 via patch
in amap_cow_now(), handle the case where we have to sleep and some of the
already-copied pages are paged out. anons that have already been copied
will have refcount == 1, whereas anons that still need to be copied will
have refcount > 1. fixes PR 25392, PR 30257, PR 31924.
 1.53.4.1 16-Mar-2005  tron branches: 1.53.4.1.2;
Pull up revision 1.57 (requested by chs in ticket #1137):
hack around a UVM problem that causes hangs when large processes fork.
see PR 26908 for details.
 1.53.4.1.2.1 27-Jan-2006  tron Pull up following revision(s) (requested by chs in ticket #10230):
sys/uvm/uvm_amap.c: revision 1.66 via patch
in amap_cow_now(), handle the case where we have to sleep and some of the
already-copied pages are paged out. anons that have already been copied
will have refcount == 1, whereas anons that still need to be copied will
have refcount > 1. fixes PR 25392, PR 30257, PR 31924.
 1.53.2.2 27-Jan-2006  tron Pull up following revision(s) (requested by chs in ticket #10230):
sys/uvm/uvm_amap.c: revision 1.66 via patch
in amap_cow_now(), handle the case where we have to sleep and some of the
already-copied pages are paged out. anons that have already been copied
will have refcount == 1, whereas anons that still need to be copied will
have refcount > 1. fixes PR 25392, PR 30257, PR 31924.
 1.53.2.1 16-Mar-2005  tron Pull up revision 1.57 (requested by chs in ticket #1137):
hack around a UVM problem that causes hangs when large processes fork.
see PR 26908 for details.
 1.56.4.1 12-Feb-2005  yamt sync with head.
 1.56.2.1 29-Apr-2005  kent sync with -current
 1.57.4.1 07-Nov-2005  jmc Pullup via patch (requested in ticket #939 by chs)

In amap_cow_now(), handle the case where we have to sleep and some of the
already-copied pages are paged out. anons that have already been copied
will have refcount == 1, whereas anons that still need to be copied will
have refcount > 1. PR#25392, PR#30257, PR#31924
 1.62.2.5 21-Jan-2008  yamt sync with head
 1.62.2.4 03-Sep-2007  yamt sync with head.
 1.62.2.3 26-Feb-2007  yamt sync with head.
 1.62.2.2 30-Dec-2006  yamt sync with head.
 1.62.2.1 21-Jun-2006  yamt sync with head.
 1.68.2.2 18-Feb-2006  yamt sync with head.
 1.68.2.1 01-Feb-2006  yamt sync with head.
 1.70.4.1 22-Apr-2006  simonb Sync with head.
 1.70.2.1 09-Sep-2006  rpaulo sync with head
 1.72.6.1 24-May-2006  tron Merge 2006-05-24 NetBSD-current into the "peter-altq" branch.
 1.72.4.1 11-May-2006  elad sync with head
 1.72.2.2 26-Jun-2006  yamt sync with head.
 1.72.2.1 24-May-2006  yamt sync with head.
 1.73.4.1 13-Jul-2006  gdamore Merge from HEAD.
 1.74.6.2 10-Dec-2006  yamt sync with head.
 1.74.6.1 22-Oct-2006  yamt sync with head
 1.74.4.2 30-Jan-2007  ad Remove support for SA. Ok core@.
 1.74.4.1 18-Nov-2006  ad Sync with head.
 1.77.2.2 24-Mar-2007  yamt sync with head.
 1.77.2.1 27-Feb-2007  yamt - sync with head.
- move sched_changepri back to kern_synch.c as it doesn't know PPQ anymore.
 1.79.4.4 01-Sep-2007  ad Use pool_cache for allocating a few more types of objects.
 1.79.4.3 05-Apr-2007  ad - Put a per-LWP lock around swapin / swapout.
- Replace use of lockmgr().
- Minor locking fixes and assertions.
- uvm_map.h no longer pulls in proc.h, etc.
- Use kpause where appropriate.
 1.79.4.2 13-Mar-2007  ad Pull in the initial set of changes for the vmlocking branch.
 1.79.4.1 13-Mar-2007  ad Sync with head.
 1.80.4.1 09-Dec-2007  reinoud Pullup to HEAD
 1.80.2.1 11-Jul-2007  mjf Sync with head.
 1.81.2.1 15-Aug-2007  skrll Sync with HEAD.
 1.82.16.2 21-Jul-2007  ad Merge unobtrusive locking changes from the vmlocking branch.
 1.82.16.1 21-Jul-2007  ad file uvm_amap.c was added on branch matt-mips64 on 2007-07-21 19:21:54 +0000
 1.82.14.2 08-Dec-2007  ad Sync with head.
 1.82.14.1 04-Dec-2007  ad Pull the vmlocking changes into a new branch.
 1.82.12.1 18-Feb-2008  mjf Sync with HEAD.
 1.82.6.1 09-Jan-2008  matt sync with HEAD
 1.82.4.1 09-Dec-2007  jmcneill Sync with HEAD.
 1.83.4.1 02-Jan-2008  bouyer Sync with HEAD
 1.84.20.1 02-Feb-2009  snj branches: 1.84.20.1.4;
Apply patch (requested by ad in ticket #357):
Make adjustment of some critical variables atomic.
 1.84.20.1.4.2 29-Feb-2012  matt Improve UVM_PAGE_TRKOWN.
Add more asserts to uvm_page.
 1.84.20.1.4.1 09-Feb-2012  matt Major changes to uvm.
Support multiple collections (groups) of free pages and run the page
reclamation algorithm on each group independently.
 1.84.18.2 28-Apr-2009  skrll Sync with HEAD.
 1.84.18.1 19-Jan-2009  skrll Sync with HEAD.
 1.84.16.1 13-Dec-2008  haad Update haad-dm branch to haad-dm-base2.
 1.84.10.3 11-Mar-2010  yamt sync with head
 1.84.10.2 19-Aug-2009  yamt sync with head.
 1.84.10.1 04-May-2009  yamt sync with head.
 1.84.6.1 17-Jan-2009  mjf Sync with HEAD.
 1.85.4.1 13-May-2009  jym Sync with HEAD.

Commit is split, to avoid a "too many arguments" protocol error.
 1.88.8.1 08-Feb-2011  bouyer Sync with HEAD
 1.88.6.1 06-Jun-2011  jruoho Sync with HEAD.
 1.88.4.3 31-May-2011  rmind sync with head
 1.88.4.2 05-Mar-2011  rmind sync with head
 1.88.4.1 17-Mar-2010  rmind Reorganise UVM locking to protect P->V state and serialise pmap(9)
operations on the same page(s) by always locking their owner. Hence
lock order: "vmpage"-lock -> pmap-lock.

Patch, proposed on tech-kern@, from Andrew Doran.
 1.90.2.1 23-Jun-2011  cherry Catchup with rmind-uvmplock merge.
 1.104.6.3 29-Apr-2012  mrg sync to latest -current.
 1.104.6.2 05-Apr-2012  mrg sync to latest -current.
 1.104.6.1 18-Feb-2012  mrg merge to -current.
 1.104.2.2 17-Apr-2012  yamt sync with head
 1.104.2.1 26-Dec-2011  yamt - use O->A loan to serve read(2). based on a patch from Chuck Silvers
- associated O->A loan fixes.
 1.105.2.2 09-Apr-2012  riz Pull up following revision(s) (requested by chs in ticket #173):
sys/uvm/uvm_amap.c: revision 1.107
initialize amap per-page reference counts before changing the amap's
overall reference count. this fixes the crashes seen for the last 9 months
with web browsers and plugins, which was also the cause of PR 46193.
 1.105.2.1 03-Apr-2012  riz Pull up following revision(s) (requested by chs in ticket #150):
sys/uvm/uvm_amap.c: revision 1.106
adjust amap_cow_now() to make UVM_PAGE_TRKOWN happy.
 1.107.32.2 19-Aug-2020  martin Pull up following revision(s) (requested by chs in ticket #1598):

sys/uvm/uvm_amap.c: revision 1.123 (via patch)

fix amap_extend() to handle amaps where we previously failed to allocate
the ppref memory.
 1.107.32.1 02-Nov-2017  snj Pull up following revision(s) (requested by pgoyette in ticket #335):
share/man/man9/kernhist.9: 1.5-1.8
sys/arch/acorn26/acorn26/pmap.c: 1.39
sys/arch/arm/arm32/fault.c: 1.105 via patch
sys/arch/arm/arm32/pmap.c: 1.350, 1.359
sys/arch/arm/broadcom/bcm2835_bsc.c: 1.7
sys/arch/arm/omap/if_cpsw.c: 1.20
sys/arch/arm/omap/tiotg.c: 1.7
sys/arch/evbarm/conf/RPI2_INSTALL: 1.3
sys/dev/ic/sl811hs.c: 1.98
sys/dev/usb/ehci.c: 1.256
sys/dev/usb/if_axe.c: 1.83
sys/dev/usb/motg.c: 1.18
sys/dev/usb/ohci.c: 1.274
sys/dev/usb/ucom.c: 1.119
sys/dev/usb/uhci.c: 1.277
sys/dev/usb/uhub.c: 1.137
sys/dev/usb/umass.c: 1.160-1.162
sys/dev/usb/umass_quirks.c: 1.100
sys/dev/usb/umass_scsipi.c: 1.55
sys/dev/usb/usb.c: 1.168
sys/dev/usb/usb_mem.c: 1.70
sys/dev/usb/usb_subr.c: 1.221
sys/dev/usb/usbdi.c: 1.175
sys/dev/usb/usbdi_util.c: 1.67-1.70
sys/dev/usb/usbroothub.c: 1.3
sys/dev/usb/xhci.c: 1.75
sys/external/bsd/drm2/dist/drm/i915/i915_gem.c: 1.34
sys/kern/kern_history.c: 1.15
sys/kern/kern_xxx.c: 1.74
sys/kern/vfs_bio.c: 1.275-1.276
sys/miscfs/genfs/genfs_io.c: 1.71
sys/sys/kernhist.h: 1.21
sys/ufs/ffs/ffs_balloc.c: 1.63
sys/ufs/lfs/lfs_vfsops.c: 1.361
sys/ufs/lfs/ulfs_inode.c: 1.21
sys/ufs/lfs/ulfs_vnops.c: 1.52
sys/ufs/ufs/ufs_inode.c: 1.102
sys/ufs/ufs/ufs_vnops.c: 1.239
sys/uvm/pmap/pmap.c: 1.37-1.39
sys/uvm/pmap/pmap_tlb.c: 1.22
sys/uvm/uvm_amap.c: 1.108
sys/uvm/uvm_anon.c: 1.64
sys/uvm/uvm_aobj.c: 1.126
sys/uvm/uvm_bio.c: 1.91
sys/uvm/uvm_device.c: 1.66
sys/uvm/uvm_fault.c: 1.201
sys/uvm/uvm_km.c: 1.144
sys/uvm/uvm_loan.c: 1.85
sys/uvm/uvm_map.c: 1.353
sys/uvm/uvm_page.c: 1.194
sys/uvm/uvm_pager.c: 1.111
sys/uvm/uvm_pdaemon.c: 1.109
sys/uvm/uvm_swap.c: 1.175
sys/uvm/uvm_vnode.c: 1.103
usr.bin/vmstat/vmstat.c: 1.219
Reorder to test for null before null deref in debug code
--
Reorder to test for null before null deref in debug code
--
KNF
--
No need for '\n' in UVMHIST_LOG
--
normalise a BIOHIST log message
--
Update the kernhist(9) kernel history code to address issues identified
in PR kern/52639, as well as some general cleaning-up...
(As proposed on tech-kern@ with additional changes and enhancements.)
Details of changes:
* All history arguments are now stored as uintmax_t values[1], both in
the kernel and in the structures used for exporting the history data
to userland via sysctl(9). This avoids problems on some architectures
where passing a 64-bit (or larger) value to printf(3) can cause it to
process the value as multiple arguments. (This can be particularly
problematic when printf()'s format string is not a literal, since in
that case the compiler cannot know how large each argument should be.)
* Update the data structures used for exporting kernel history data to
include a version number as well as the length of history arguments.
* All [2] existing users of kernhist(9) have had their format strings
updated. Each format specifier now includes an explicit length
modifier 'j' to refer to numeric values of the size of uintmax_t.
* All [2] existing users of kernhist(9) have had their format strings
updated to replace uses of "%p" with "%#jx", and the pointer
arguments are now cast to (uintptr_t) before being subsequently cast
to (uintmax_t). This is needed to avoid compiler warnings about
casting "pointer to integer of a different size."
* All [2] existing users of kernhist(9) have had instances of "%s" or
"%c" format strings replaced with numeric formats; several instances
of mis-match between format string and argument list have been fixed.
* vmstat(1) has been modified to handle the new size of arguments in the
history data as exported by sysctl(9).
* vmstat(1) now provides a warning message if the history requested with
the -u option does not exist (previously, this condition was silently
ignored, with only a single blank line being printed).
* vmstat(1) now checks the version and argument length included in the
data exported via sysctl(9) and exits if they do not match the values
with which vmstat was built.
* The kernhist(9) man-page has been updated to note the additional
requirements imposed on the format strings, along with several other
minor changes and enhancements.
[1] It would have been possible to use an explicit length (for example,
uint64_t) for the history arguments. But that would require another
"rototill" of all the users in the future when we add support for an
architecture that supports a larger size. Also, the printf(3) format
specifiers for explicitly-sized values, such as "%"PRIu64, are much
more verbose (and less aesthetically appealing, IMHO) than simply
using "%ju".
[2] I've tried very hard to find "all [the] existing users of kernhist(9)"
but it is possible that I've missed some of them. I would be glad to
update any stragglers that anyone identifies.
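The format-string conventions listed above ("%ju" for numbers, "%#jx" for pointers cast through uintptr_t to uintmax_t) can be shown with plain snprintf(3). This is a userland sketch only: toy_format_args() is a hypothetical helper and does not use the kernhist(9) macros themselves.

```c
#include <assert.h>
#include <stdint.h>
#include <stdio.h>

/*
 * Format a pointer and a count the way the updated history calls do:
 * every argument is widened to uintmax_t, and pointers go through
 * uintptr_t first to avoid "integer of a different size" warnings.
 */
int
toy_format_args(char *buf, size_t len, const void *p, unsigned int count)
{
	assert(buf != NULL && len > 0);
	return snprintf(buf, len, "ptr=%#jx count=%ju",
	    (uintmax_t)(uintptr_t)p, (uintmax_t)count);
}
```

Because every argument has the same width, the formatter never has to guess how many machine words a value occupies, which is the failure mode the commit message describes for 64-bit values.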
--
For some reason this single kernel seems to have outgrown its declared
size as a result of the kernhist(9) changes. Bump the size.
XXX The amount of increase may be excessive - anyone with more detailed
XXX knowledge please feel free to further adjust the value appropriately.
--
Missed one cast of pointer --> uintptr_t in previous kernhist(9) commit
--
And yet another one. :(
--
Use correct mark-up for NetBSD version.
--
More improvements in grammar and readability.
--
Remove a stray '"' (obvious typo) and add a couple of casts that are
probably needed.
--
And replace an instance of "%p" conversion with "%#jx"
--
Whitespace fix. Give Bl tag table a width. Fix Xr.
 1.107.2.1 03-Dec-2017  jdolecek update from HEAD
 1.108.4.2 08-Apr-2020  martin Merge changes from current as of 20200406
 1.108.4.1 10-Jun-2019  christos Sync with HEAD
 1.108.2.1 06-Sep-2018  pgoyette Sync with HEAD

Resolve a couple of conflicts (result of the uimin/uimax changes)
 1.109.4.2 04-Oct-2020  martin Pull up following revision(s) (requested by chs in ticket #1095):

sys/uvm/uvm_amap.c: revision 1.124 (via patch)
sys/uvm/uvm_amap.c: revision 1.125 (via patch)
sys/uvm/uvm_io.c: revision 1.29 (via patch)

Effectively disable the AMAP_REFALL flag because it is unsafe.

This flag tells the amap code that it does not need to allocate ppref
as part of adding or removing a reference, but that is only correct
if the range of the reference being added or removed is the same
as the range of all other references to the amap, and the point of
this flag is exactly to try to optimize the case where the range is
different and thus this flag would not be correct to use.
Fixes PR 55366.

The previous fix for PR 55366 in uvm_amap.c 1.124 was incomplete:
- amap_adjref_anons() must also ignore AMAP_REFALL when updating
the ppref, not just when deciding whether or not to initialize ppref.
- UVM_EXTRACT_QREF relies on AMAP_REFALL to work properly,
and since we can't use AMAP_REFALL then we can't use QREF either.
 1.109.4.1 19-Aug-2020  martin Pull up following revision(s) (requested by chs in ticket #1057):

sys/uvm/uvm_amap.c: revision 1.123 (via patch)

fix amap_extend() to handle amaps where we previously failed to allocate
the ppref memory.
 1.114.2.1 29-Feb-2020  ad Sync with head.
 1.125.2.1 03-Apr-2021  thorpej Sync with HEAD.
 1.41 20-Mar-2020  ad Go back to freeing struct vm_anon one by one. There may have been an
advantage circa ~2008 but there isn't now.
 1.40 23-Feb-2020  ad UVM locking changes, proposed on tech-kern:

- Change the lock on uvm_object, vm_amap and vm_anon to be a RW lock.
- Break v_interlock and vmobjlock apart. v_interlock remains a mutex.
- Do partial PV list locking in the x86 pmap. Others to follow later.
 1.39 02-Jan-2020  ad branches: 1.39.2;
Back out the amap allocation changes from earlier today - have seen a panic
with them. Retain the lock changes.
 1.38 01-Jan-2020  ad - Start trying to reduce the high cache miss rate observed around vm_amap.
On _LP64, pad struct vm_amap to 128 bytes and use the additional space to
hold the arrays for tiny amaps which are common. Carefully size the array
allocations to avoid false sharing, and for smaller amaps try to share
allocated cache lines.

- Eliminate most contention due to amap_list: maintain the list in the pool
cache constructor / destructor like we do for struct file. Cache the
mutexes we allocate here.

- Don't do PR_WAITOK mutex allocations when NOWAIT has been specified.
 1.37 12-Jun-2011  rmind branches: 1.37.2; 1.37.54;
Welcome to 5.99.53! Merge rmind-uvmplock branch:

- Reorganize locking in UVM and provide extra serialisation for pmap(9).
New lock order: [vmpage-owner-lock] -> pmap-lock.

- Simplify locking in some pmap(9) modules by removing P->V locking.

- Use lock object on vmobjlock (and thus vnode_t::v_interlock) to share
the locks amongst UVM objects where necessary (tmpfs, layerfs, unionfs).

- Rewrite and optimise x86 TLB shootdown code, make it simpler and cleaner.
Add TLBSTATS option for x86 to collect statistics about TLB shootdowns.

- Unify /dev/mem et al in MI code and provide required locking (removes
kernel-lock on some ports). Also, avoid cache-aliasing issues.

Thanks to Andrew Doran and Joerg Sonnenberger, as their initial patches
formed the core changes of this branch.
 1.36 23-Apr-2011  rmind branches: 1.36.2;
Replace "malloc" in comments, remove unnecessary header inclusions.
 1.35 02-Feb-2011  chuck update license clauses on my code to match the new-style BSD licenses.
based on diff that rmind@ sent me.

no functional change with this commit.
 1.34 26-Oct-2008  bjs branches: 1.34.16; 1.34.22; 1.34.24;
"sparce" -> "sparse" + commas after "large", prior to "sparse"
 1.33 21-Jul-2007  ad branches: 1.33.26; 1.33.30; 1.33.36; 1.33.38;
Merge unobtrusive locking changes from the vmlocking branch.
 1.32 22-Feb-2007  matt branches: 1.32.4; 1.32.12;
Fix lossage from boolean_t -> bool and updated x86 bus_dma.
 1.31 21-Feb-2007  thorpej Replace the Mach-derived boolean_t type with the C99 bool type. A
future commit will replace use of TRUE and FALSE with true and false.
 1.30 25-Jun-2006  yamt branches: 1.30.10;
make amap use kmem_alloc, rather than malloc.
(ie. make it use kernel_map, rather than kmem_map.)
kmem_map is more restricted than kernel_map,
and there's no point for amap to use it.
 1.29 15-Feb-2006  yamt branches: 1.29.2; 1.29.10;
- amap_copy: take a "flags" argument instead of booleans.
- add AMAP_COPY_NOMERGE flag, and use it for uvm_map_extract.
PR/32806 from Julio M. Merino Vidal.
 1.28 11-Feb-2006  yamt remove the following options. no objections on tech-kern@.

UVM_PAGER_INLINE
UVM_AMAP_INLINE
UVM_PAGE_INLINE
UVM_MAP_INLINE
 1.27 24-Dec-2005  perry branches: 1.27.2; 1.27.4; 1.27.6;
Remove leading __ from __(const|inline|signed|volatile) -- it is obsolete.
 1.26 11-Dec-2005  christos merge ktrace-lwp.
 1.25 11-May-2005  yamt branches: 1.25.2;
allocate anons on-demand, rather than reserving static amount of
them on boot/swapon.
 1.24 25-Apr-2004  simonb Initialise (most) pools from a link set instead of explicit calls
to pool_init. Untouched pools are ones that are either in arch-specific
code, or aren't initialised during initial system startup.

Convert struct session, ucred and lockf to pools.
 1.23 24-Mar-2004  junyoung Nuke __P().
 1.22 01-Feb-2003  thorpej branches: 1.22.2;
Add extensible malloc types, adapted from FreeBSD. This turns
malloc types into a structure, a pointer to which is passed around,
instead of an int constant. Allow the limit to be adjusted when the
malloc type is defined, or with a function call, as suggested by
Jonathan Stone.
 1.21 20-Dec-2002  atatat Properly set page reference counts at the start of the newly
allocated ppref data to zero in the case of an amap that has empty
space at the front.

Don't set anything in the ppref array if "len" is zero.

Many thanks to Sami Kantoluoto for providing gdb access to a machine
that would reliably crash with problems related to the above, and to
Stephan Thesing for corroborating that the patch properly addressed
the problem.

Note that the ar_pageoff (and related variables) types must be changed
soon. The use of "int" here is not theoretically sufficient.
 1.20 30-Nov-2002  bouyer Change uvm_km_kmemalloc() to accept flag UVM_KMF_NOWAIT and pass it to
uvm_map(). Change uvm_map() to honor UVM_KMF_NOWAIT. For this, change
amap_extend() to take a flags parameter instead of just boolean for
direction, and introduce AMAP_EXTEND_FORWARDS and AMAP_EXTEND_NOWAIT flags
(AMAP_EXTEND_BACKWARDS is still defined as 0x0, to keep the code easier to
read).
Add a flag parameter to uvm_mapent_alloc().
This solves a problem where a pool_get(PR_NOWAIT) could trigger a pool_get(PR_WAITOK)
in uvm_mapent_alloc().
Thanks to Chuck Silvers, enami tsugutomo, Andrew Brown and Jason R Thorpe
for feedback.
 1.19 14-Nov-2002  atatat Implement backwards extension of amaps. There are three cases to deal
with:

Case #1 -- adjust offset: The slot offset in the aref can be
decremented to cover the required size addition.

Case #2 -- move pages and adjust offset: The slot offset is not large
enough, but the amap contains enough inactive space *after* the mapped
pages to make up the difference, so active slots are slid to the "end"
of the amap, and the slot offset is, again, adjusted to cover the
required size addition. This optimizes for hitting case #1 again on
the next small extension.

Case #3 -- reallocate, move pages, and adjust offset: There is not
enough inactive space in the amap, so the arrays are reallocated, and
the active pages are copied again to the "end" of the amap, and the
slot offset is adjusted to cover the required size. This also
optimizes for hitting case #1 on the next backwards extension.

This provides the missing piece in the "forward extension of
vm_map_entries" logic, so the merge failure counters have been
removed.

Not many applications will make any use of this at this time (except
for jvms and perhaps gcc3), but a "top-down" memory allocator will use
it extensively.
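
The three cases described above can be sketched as a small decision function. This is a simplified model under stated assumptions, not the code in uvm_amap.c: `struct amap_model`, its fields, the `extend_case` enum, and `classify_backward_extend()` are hypothetical names introduced for illustration, and plain slot counts stand in for the real amap arrays.

```c
#include <assert.h>

/* Hypothetical, simplified view of an amap as seen through one aref. */
struct amap_model {
	int maxslot;     /* slots allocated in the amap */
	int slotoff;     /* first active slot (the aref's slot offset) */
	int slotmapped;  /* number of active slots starting at slotoff */
};

/* The three cases named in the commit message. */
enum extend_case {
	EXTEND_ADJUST_OFFSET = 1,  /* case #1: decrement the slot offset */
	EXTEND_SLIDE_PAGES = 2,    /* case #2: slide active slots to the end */
	EXTEND_REALLOCATE = 3      /* case #3: reallocate, then copy to the end */
};

/* Decide which case applies when slotadd slots are needed in front. */
static enum extend_case
classify_backward_extend(const struct amap_model *am, int slotadd)
{
	int free_after;

	assert(am->slotoff >= 0 && am->slotmapped >= 0 && slotadd >= 0);
	assert(am->slotoff + am->slotmapped <= am->maxslot);

	/* Case #1: the unused space in front already covers the request. */
	if (slotadd <= am->slotoff)
		return EXTEND_ADJUST_OFFSET;

	/* Case #2: inactive space after the mapped pages makes up the rest. */
	free_after = am->maxslot - (am->slotoff + am->slotmapped);
	if (slotadd - am->slotoff <= free_after)
		return EXTEND_SLIDE_PAGES;

	/* Case #3: not enough inactive space anywhere in the amap. */
	return EXTEND_REALLOCATE;
}
```

In this model, sliding the active slots to the end in cases #2 and #3 leaves all remaining free space in front of them, which is why the next small backwards extension tends to land in case #1.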
 1.18 15-Sep-2002  chs add a new km flag UVM_KMF_CANFAIL, which causes uvm_km_kmemalloc() to
return failure if swap is full and there are no free physical pages.
have malloc() use this flag if M_CANFAIL is passed to it.
use M_CANFAIL to allow amap_extend() to fail when memory is scarce.
this should prevent most of the remaining hangs in low-memory situations.
 1.17 02-Jun-2001  chs branches: 1.17.2; 1.17.10; 1.17.16;
replace vm_map{,_entry}_t with struct vm_map{,_entry} *.
 1.16 26-May-2001  chs replace {simple_,}lock{_data,}_t with struct {simple,}lock {,*}.
 1.15 25-May-2001  chs remove trailing whitespace.
 1.14 18-Feb-2001  chs branches: 1.14.2;
clean up DIAGNOSTIC checks, use KASSERT().
 1.13 25-Nov-2000  chs lots of cleanup:
use queue.h macros and KASSERT().
address amap offsets in pages instead of bytes.
make amap_ref() and amap_unref() take an amap, offset and length
instead of a vm_map_entry_t.
improve whitespace and comments.
 1.12 07-Jul-1999  thorpej branches: 1.12.2;
Don't bother returning the "slot" number from amap_add():
* Nothing currently uses this return value.
* It's arguably an abstraction violation.

Fix amap_unadd()'s API to be consistent w/ amap_add()'s: rather than
take a vm_amap * and a slot number, take a vm_aref * and an offset.

It's now actually possible to use amap_unadd() to remove an anon from
an amap.
 1.11 21-Jun-1999  thorpej Protect prototypes, certain macros, and inlines from userland.
 1.10 28-Jan-1999  chuck branches: 1.10.4;
comment cleanup, shift around the inline stuff a bit,
rename VM_AMAP_PPREF (to UVM_AMAP_PPREF).
 1.9 24-Jan-1999  chuck cleanup/reorg:
- break anon related functions out of uvm_amap.c and put them in their own
file (uvm_anon.c). this includes breaking up uvm_anon_init into an amap
and an anon init function
- ensure that only functions within the amap module access amap structure
fields (add macros to amap api as needed)
 1.8 18-Oct-1998  chs branches: 1.8.2;
shift by PAGE_SHIFT instead of multiplying or dividing by PAGE_SIZE.
 1.7 13-Aug-1998  eeh Merge paddr_t changes into the main branch.
 1.6 09-Mar-1998  mrg branches: 1.6.2;
KNF.
 1.5 10-Feb-1998  mrg - add defopt's for UVM, UVMHIST and PMAP_NEW.
- remove unnecessary UVMHIST_DECL's.
 1.4 10-Feb-1998  perry add/cleanup multiple inclusion protection.
 1.3 07-Feb-1998  mrg restore rcsids
 1.2 06-Feb-1998  thorpej RCS ID police.
 1.1 05-Feb-1998  mrg branches: 1.1.1;
Initial revision
 1.1.1.1 05-Feb-1998  mrg initial import of the new virtual memory system, UVM, into -current.

UVM was written by chuck cranor <chuck@maria.wustl.edu>, with some
minor portions derived from the old Mach code. i provided some help
getting swap and paging working, and other bug fixes/ideas. chuck
silvers <chuq@chuq.com> also provided some other fixes.

this is the UVM kernel code portion.


this will be KNF'd shortly. :-)
 1.6.2.1 30-Jul-1998  eeh Split vm_offset_t and vm_size_t into paddr_t, psize_t, vaddr_t, and vsize_t.
 1.8.2.1 09-Nov-1998  chs initial snapshot. lots left to do.
 1.10.4.3 02-Aug-1999  thorpej Update from trunk.
 1.10.4.2 01-Jul-1999  thorpej Sync w/ -current.
 1.10.4.1 07-Jun-1999  chs merge everything from chs-ubc branch.
 1.12.2.2 12-Mar-2001  bouyer Sync with HEAD.
 1.12.2.1 08-Dec-2000  bouyer Sync with HEAD.
 1.14.2.4 20-Dec-2002  thorpej Sync with HEAD.
 1.14.2.3 11-Dec-2002  thorpej Sync with HEAD.
 1.14.2.2 17-Sep-2002  nathanw Catch up to -current.
 1.14.2.1 21-Jun-2001  nathanw Catch up to -current.
 1.17.16.1 02-Jun-2003  tron Pull up revision 1.18 (requested by skrll):
add a new km flag UVM_KMF_CANFAIL, which causes uvm_km_kmemalloc() to
return failure if swap is full and there are no free physical pages.
have malloc() use this flag if M_CANFAIL is passed to it.
use M_CANFAIL to allow amap_extend() to fail when memory is scarce.
this should prevent most of the remaining hangs in low-memory situations.
 1.17.10.1 12-Mar-2002  thorpej Make the amap lock an adaptive mutex.
 1.17.2.1 10-Oct-2002  jdolecek sync kqueue with -current; this includes merge of gehenna-devsw branch,
merge of i386 MP branch, and part of autoconf rototil work
 1.22.2.4 10-Nov-2005  skrll Sync with HEAD. Here we go again...
 1.22.2.3 21-Sep-2004  skrll Fix the sync with head I botched.
 1.22.2.2 18-Sep-2004  skrll Sync with HEAD.
 1.22.2.1 03-Aug-2004  skrll Sync with HEAD
 1.25.2.4 03-Sep-2007  yamt sync with head.
 1.25.2.3 26-Feb-2007  yamt sync with head.
 1.25.2.2 30-Dec-2006  yamt sync with head.
 1.25.2.1 21-Jun-2006  yamt sync with head.
 1.27.6.1 22-Apr-2006  simonb Sync with head.
 1.27.4.1 09-Sep-2006  rpaulo sync with head
 1.27.2.1 18-Feb-2006  yamt sync with head.
 1.29.10.1 13-Jul-2006  gdamore Merge from HEAD.
 1.29.2.1 26-Jun-2006  yamt sync with head.
 1.30.10.1 27-Feb-2007  yamt - sync with head.
- move sched_changepri back to kern_synch.c as it doesn't know PPQ anymore.
 1.32.12.1 15-Aug-2007  skrll Sync with HEAD.
 1.32.4.1 13-Mar-2007  ad Pull in the initial set of changes for the vmlocking branch.
 1.33.38.2 21-Jul-2007  ad Merge unobtrusive locking changes from the vmlocking branch.
 1.33.38.1 21-Jul-2007  ad file uvm_amap.h was added on branch matt-mips64 on 2007-07-21 19:21:54 +0000
 1.33.36.1 13-Dec-2008  haad Update haad-dm branch to haad-dm-base2.
 1.33.30.1 04-May-2009  yamt sync with head.
 1.33.26.1 17-Jan-2009  mjf Sync with HEAD.
 1.34.24.1 08-Feb-2011  bouyer Sync with HEAD
 1.34.22.1 06-Jun-2011  jruoho Sync with HEAD.
 1.34.16.3 31-May-2011  rmind sync with head
 1.34.16.2 05-Mar-2011  rmind sync with head
 1.34.16.1 17-Mar-2010  rmind Reorganise UVM locking to protect P->V state and serialise pmap(9)
operations on the same page(s) by always locking their owner. Hence
lock order: "vmpage"-lock -> pmap-lock.

Patch, proposed on tech-kern@, from Andrew Doran.
 1.36.2.1 23-Jun-2011  cherry Catchup with rmind-uvmplock merge.
 1.37.54.1 08-Apr-2020  martin Merge changes from current as of 20200406
 1.37.2.1 26-Dec-2011  yamt - use O->A loan to serve read(2). based on a patch from Chuck Silvers
- associated O->A loan fixes.
 1.39.2.1 29-Feb-2020  ad Sync with head.
 1.25 11-Feb-2006  yamt remove the following options. no objections on tech-kern@.

UVM_PAGER_INLINE
UVM_AMAP_INLINE
UVM_PAGE_INLINE
UVM_MAP_INLINE
 1.24 11-Dec-2005  christos branches: 1.24.2; 1.24.4; 1.24.6;
merge ktrace-lwp.
 1.23 27-Jun-2005  thorpej branches: 1.23.2;
Use ANSI function decls.
 1.22 11-May-2005  yamt allocate anons on-demand, rather than reserving static amount of
them on boot/swapon.
 1.21 28-Feb-2005  chs add some locking assertions.
 1.20 20-Dec-2002  atatat branches: 1.20.2; 1.20.10; 1.20.12;
Properly set page reference counts at the start of the newly
allocated ppref data to zero in the case of an amap that has empty
space at the front.

Don't set anything in the ppref array if "len" is zero.

Many thanks to Sami Kantoluoto for providing gdb access to a machine
that would reliably crash with problems related to the above, and to
Stephan Thesing for corroborating that the patch properly addressed
the problem.

Note that the ar_pageoff (and related variables) types must be changed
soon. The use of "int" here is not theoretically sufficient.
 1.19 01-Dec-2002  matt Reorder things so that with multiple inclusion protection that optional
definitions are outside the protection checks.
 1.18 22-Aug-2002  matt In amap_ref, only increment the amap's refcnt after we have established
the ppref array. Otherwise, the newly ref'ed pages will be doubly
counted and thus never freed because the pprefcnt can't fall to 0.
 1.17 25-May-2001  chs branches: 1.17.2; 1.17.14; 1.17.16;
remove trailing whitespace.
 1.16 06-May-2001  thorpej Remove a comment which is no longer true. From Artur Grabowski.
 1.15 25-Nov-2000  chs branches: 1.15.2;
lots of cleanup:
use queue.h macros and KASSERT().
address amap offsets in pages instead of bytes.
make amap_ref() and amap_unref() take an amap, offset and length
instead of a vm_map_entry_t.
improve whitespace and comments.
 1.14 12-Sep-1999  chs branches: 1.14.2; 1.14.12;
eliminate the PMAP_NEW option by making it required for all ports.
ports which previously had no support for PMAP_NEW now implement
the pmap_k* interfaces as wrappers around the non-k versions.
 1.13 07-Jul-1999  thorpej Don't bother returning the "slot" number from amap_add():
* Nothing currently uses this return value.
* It's arguably an abstraction violation.

Fix amap_unadd()'s API to be consistent w/ amap_add()'s: rather than
take a vm_amap * and a slot number, take a vm_aref * and an offset.

It's now actually possible to use amap_unadd() to remove an anon from
an amap.
 1.12 25-Mar-1999  mrg branches: 1.12.4;
remove now >1 year old pre-release message.
 1.11 28-Jan-1999  chuck comment cleanup, shift around the inline stuff a bit,
rename VM_AMAP_PPREF (to UVM_AMAP_PPREF).
 1.10 24-Jan-1999  chuck cleanup/reorg:
- break anon related functions out of uvm_amap.c and put them in their own
file (uvm_anon.c). this includes breaking up uvm_anon_init into an amap
and an anon init function
- ensure that only functions within the amap module access amap structure
fields (add macros to amap api as needed)
 1.9 18-Oct-1998  chs branches: 1.9.2;
shift by PAGE_SHIFT instead of multiplying or dividing by PAGE_SIZE.
 1.8 13-Aug-1998  eeh Merge paddr_t changes into the main branch.
 1.7 09-Aug-1998  perry bzero->memset, bcopy->memcpy, bcmp->memcmp
 1.6 10-Feb-1998  mrg branches: 1.6.2;
- add defopt's for UVM, UVMHIST and PMAP_NEW.
- remove unnecessary UVMHIST_DECL's.
 1.5 10-Feb-1998  perry add/cleanup multiple inclusion protection.
 1.4 08-Feb-1998  mrg KNF
 1.3 07-Feb-1998  mrg restore rcsids
 1.2 06-Feb-1998  thorpej RCS ID police.
 1.1 05-Feb-1998  mrg branches: 1.1.1;
Initial revision
 1.1.1.1 05-Feb-1998  mrg initial import of the new virtual memory system, UVM, into -current.

UVM was written by chuck cranor <chuck@maria.wustl.edu>, with some
minor portions derived from the old Mach code. i provided some help
getting swap and paging working, and other bug fixes/ideas. chuck
silvers <chuq@chuq.com> also provided some other fixes.

this is the UVM kernel code portion.


this will be KNF'd shortly. :-)
 1.6.2.1 30-Jul-1998  eeh Split vm_offset_t and vm_size_t into paddr_t, psize_t, vaddr_t, and vsize_t.
 1.9.2.1 09-Nov-1998  chs initial snapshot. lots left to do.
 1.12.4.2 02-Aug-1999  thorpej Update from trunk.
 1.12.4.1 07-Jun-1999  chs merge everything from chs-ubc branch.
 1.14.12.1 05-Sep-2002  itojun pullup mistake - should have patched uvm_amap_i.h, not uvm_map_i.h
 1.14.2.1 08-Dec-2000  bouyer Sync with HEAD.
 1.15.2.4 20-Dec-2002  thorpej Sync with HEAD.
 1.15.2.3 11-Dec-2002  thorpej Sync with HEAD.
 1.15.2.2 27-Aug-2002  nathanw Catch up to -current.
 1.15.2.1 21-Jun-2001  nathanw Catch up to -current.
 1.17.16.1 24-Aug-2002  lukem Pull up revision 1.18 (requested by matt in ticket #719):
In amap_ref, only increment the amap's refcnt after we have established
the ppref array. Otherwise, the newly ref'ed pages will be doubly
counted and thus never freed because the pprefcnt can't fall to 0.
 1.17.14.1 29-Aug-2002  gehenna catch up with -current.
 1.17.2.1 06-Sep-2002  jdolecek sync kqueue branch with HEAD
 1.20.12.1 19-Mar-2005  yamt sync with head. xen and whitespace. xen part is not finished.
 1.20.10.1 29-Apr-2005  kent sync with -current
 1.20.2.2 10-Nov-2005  skrll Sync with HEAD. Here we go again...
 1.20.2.1 04-Mar-2005  skrll Sync with HEAD.

Hi Perry!
 1.23.2.1 21-Jun-2006  yamt sync with head.
 1.24.6.1 22-Apr-2006  simonb Sync with head.
 1.24.4.1 09-Sep-2006  rpaulo sync with head
 1.24.2.1 18-Feb-2006  yamt sync with head.
 1.80 25-Oct-2020  chs Handle PG_PAGEOUT in uvm_anon_release() too.
 1.79 09-Jul-2020  skrll Consistently use UVMHIST(__func__)

Convert UVMHIST_{CALLED,LOG} into UVMHIST_CALLARGS
 1.78 08-Jul-2020  skrll Trailing whitespace
 1.77 22-Mar-2020  ad Process concurrent page faults on individual uvm_objects / vm_amaps in
parallel, where the relevant pages are already in-core. Proposed on
tech-kern.

Temporarily disabled on MP architectures with __HAVE_UNLOCKED_PMAP until
adjustments are made to their pmaps.
 1.76 20-Mar-2020  ad Go back to freeing struct vm_anon one by one. There may have been an
advantage circa ~2008 but there isn't now.
 1.75 14-Mar-2020  ad Make page waits (WANTED vs BUSY) interlocked by pg->interlock. Gets RW
locks out of the equation for sleep/wakeup, and allows observing+waiting
for busy pages when holding only a read lock. Proposed on tech-kern.
 1.74 24-Feb-2020  rin 0x%#x --> %#x for non-external codes.
Also, stop mixing up 0x%x and %#x in single files as far as possible.
 1.73 23-Feb-2020  ad Use rw_lock_op().
 1.72 23-Feb-2020  ad UVM locking changes, proposed on tech-kern:

- Change the lock on uvm_object, vm_amap and vm_anon to be a RW lock.
- Break v_interlock and vmobjlock apart. v_interlock remains a mutex.
- Do partial PV list locking in the x86 pmap. Others to follow later.
 1.71 15-Jan-2020  ad Merge from yamt-pagecache (after much testing):

- Reduce unnecessary page scan in putpages esp. when an object has a ton of
pages cached but only a few of them are dirty.

- Reduce the number of pmap operations by tracking page dirtiness more
precisely in uvm layer.
 1.70 31-Dec-2019  ad branches: 1.70.2;
- Add and use wrapper functions that take and acquire page interlocks, and pairs
of page interlocks. Require that the page interlock be held over calls to
uvm_pageactivate(), uvm_pagewire() and similar.

- Solve the concurrency problem with page replacement state. Rather than
updating the global state synchronously, set an intended state on
individual pages (active, inactive, enqueued, dequeued) while holding the
page interlock. After the interlock is released put the pages on a 128
entry per-CPU queue for their state changes to be made real in batch.
This results in a ~400 fold decrease in contention on my test system.
Proposed on tech-kern but modified to use the page interlock rather than
atomics to synchronise as it's much easier to maintain that way, and
cheaper.
 1.69 13-Dec-2019  ad Break the global uvm_pageqlock into a per-page identity lock and a private
lock for use of the pagedaemon policy code. Discussed on tech-kern.

PR kern/54209: NetBSD 8 large memory performance extremely low
PR kern/54210: NetBSD-8 processes presumably not exiting
PR kern/54727: writing a large file causes unreasonable system behaviour
 1.68 02-Dec-2019  chs fix the build for when UVMHIST is enabled.
 1.67 01-Dec-2019  uwe Add missing #include <sys/atomic.h>
 1.66 01-Dec-2019  ad Free pages in batch instead of taking uvm_pageqlock for each one.
 1.65 01-Dec-2019  ad - Adjust uvmexp.swpgonly with atomics, and make uvm_swap_data_lock static.
- A bit more __cacheline_aligned on mutexes.
 1.64 28-Oct-2017  pgoyette branches: 1.64.4; 1.64.8;
Update the kernhist(9) kernel history code to address issues identified
in PR kern/52639, as well as some general cleaning-up...

(As proposed on tech-kern@ with additional changes and enhancements.)

Details of changes:

* All history arguments are now stored as uintmax_t values[1], both in
the kernel and in the structures used for exporting the history data
to userland via sysctl(9). This avoids problems on some architectures
where passing a 64-bit (or larger) value to printf(3) can cause it to
process the value as multiple arguments. (This can be particularly
problematic when printf()'s format string is not a literal, since in
that case the compiler cannot know how large each argument should be.)

* Update the data structures used for exporting kernel history data to
include a version number as well as the length of history arguments.

* All [2] existing users of kernhist(9) have had their format strings
updated. Each format specifier now includes an explicit length
modifier 'j' to refer to numeric values of the size of uintmax_t.

* All [2] existing users of kernhist(9) have had their format strings
updated to replace uses of "%p" with "%#jx", and the pointer
arguments are now cast to (uintptr_t) before being subsequently cast
to (uintmax_t). This is needed to avoid compiler warnings about
casting "pointer to integer of a different size."

* All [2] existing users of kernhist(9) have had instances of "%s" or
"%c" format strings replaced with numeric formats; several instances
of mis-match between format string and argument list have been fixed.

* vmstat(1) has been modified to handle the new size of arguments in the
history data as exported by sysctl(9).

* vmstat(1) now provides a warning message if the history requested with
the -u option does not exist (previously, this condition was silently
ignored, with only a single blank line being printed).

* vmstat(1) now checks the version and argument length included in the
data exported via sysctl(9) and exits if they do not match the values
with which vmstat was built.

* The kernhist(9) man-page has been updated to note the additional
requirements imposed on the format strings, along with several other
minor changes and enhancements.

[1] It would have been possible to use an explicit length (for example,
uint64_t) for the history arguments. But that would require another
"rototill" of all the users in the future when we add support for an
architecture that supports a larger size. Also, the printf(3) format
specifiers for explicitly-sized values, such as "%"PRIu64, are much
more verbose (and less aesthetically appealing, IMHO) than simply
using "%ju".

[2] I've tried very hard to find "all [the] existing users of kernhist(9)"
but it is possible that I've missed some of them. I would be glad to
update any stragglers that anyone identifies.
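
The format-string rules above (pointer arguments cast through uintptr_t to uintmax_t and printed with "%#jx", numeric arguments printed with the 'j' length modifier) can be illustrated in plain userland C. This is only a sketch: `format_hist_ptr()` and `format_hist_count()` are hypothetical helpers, and the real kernhist(9) macros are not used here.

```c
#include <assert.h>
#include <stdint.h>
#include <stdio.h>

/*
 * Hypothetical helper: format a pointer as the updated history format
 * strings do, i.e. cast to uintptr_t, then to uintmax_t, and print
 * with "%#jx" instead of "%p".
 */
static int
format_hist_ptr(char *buf, size_t len, const void *p)
{
	assert(buf != NULL && len > 0);
	return snprintf(buf, len, "%#jx", (uintmax_t)(uintptr_t)p);
}

/*
 * Hypothetical helper: format a numeric argument widened to uintmax_t,
 * using the 'j' length modifier so printf reads one full-width argument.
 */
static int
format_hist_count(char *buf, size_t len, unsigned int n)
{
	assert(buf != NULL && len > 0);
	return snprintf(buf, len, "%ju", (uintmax_t)n);
}
```

Widening every argument to uintmax_t means the formatter always consumes arguments of one known size, which is the property the commit message relies on when the format string is not a literal.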
 1.63 25-Oct-2013  martin branches: 1.63.22;
Mark a diagnostic-only variable
 1.62 18-Aug-2011  yamt branches: 1.62.2; 1.62.12; 1.62.16;
uvm_anon_release:
- don't forget to call uvm_anon_dispose.
- simplify code a little.
 1.61 18-Aug-2011  yamt uvm_anon_freelst:
- clear an_link/an_ref when deferring anon disposal. otherwise others can
see bogus an_ref.
- fix the code to remove anon from the list.
 1.60 14-Aug-2011  rmind uvm_anon_freelst: do not free PG_RELEASED pages (change uvm_anon_dispose()
to indicate them with a return value).
 1.59 06-Aug-2011  rmind - Rework uvm_anfree() into uvm_anon_freelst(), which always drops the lock.
- Free anons in uvm_anon_freelst() without lock held.
- Mechanic sync to unused loaning code.
 1.58 05-Jul-2011  yamt reduce the number of atomic ops in common cases. it's exceptional for
anons to remain longer than amap.
 1.57 24-Jun-2011  rmind Fix uvmplock regression - a lock against oneself case in amap_swap_off().
Happens since amap is NULL in uvmfault_anonget(), so uvmfault_unlockall()
keeps anon locked, when it should unlock it.
 1.56 24-Jun-2011  yamt uvm_anon_release: fix a locking error after the rmind-uvmplock merge
 1.55 17-Jun-2011  rmind Improve comments on uvm_anon.c, tidy up slightly.
No functional changes.
 1.54 12-Jun-2011  rmind Welcome to 5.99.53! Merge rmind-uvmplock branch:

- Reorganize locking in UVM and provide extra serialisation for pmap(9).
New lock order: [vmpage-owner-lock] -> pmap-lock.

- Simplify locking in some pmap(9) modules by removing P->V locking.

- Use lock object on vmobjlock (and thus vnode_t::v_interlock) to share
the locks amongst UVM objects where necessary (tmpfs, layerfs, unionfs).

- Rewrite and optimise x86 TLB shootdown code, make it simpler and cleaner.
Add TLBSTATS option for x86 to collect statistics about TLB shootdowns.

- Unify /dev/mem et al in MI code and provide required locking (removes
kernel-lock on some ports). Also, avoid cache-aliasing issues.

Thanks to Andrew Doran and Joerg Sonnenberger, as their initial patches
formed the core changes of this branch.
 1.53 23-Apr-2011  rmind branches: 1.53.2;
Replace "malloc" in comments, remove unnecessary header inclusions.
 1.52 02-Feb-2011  chuck update license clauses on my code to match the new-style BSD licenses.
based on diff that rmind@ sent me.

no functional change with this commit.
 1.51 18-Jan-2008  yamt branches: 1.51.28; 1.51.32; 1.51.38; 1.51.40;
push pmap_clear_reference calls into pdpolicy code, where reference bits
actually matter.
 1.50 02-Jan-2008  ad Merge vmlocking2 to head.
 1.49 20-Dec-2007  ad Specify PR_LARGECACHE for anon_cache (which is insanely busy).
 1.48 13-Nov-2007  yamt branches: 1.48.2; 1.48.6;
g/c unused uvm_anon_pool.
 1.47 07-Nov-2007  ad Merge from vmlocking:

- pool_cache changes.
- Debugger/procfs locking fixes.
- Other minor changes.
 1.46 11-Oct-2007  ad branches: 1.46.2; 1.46.4;
Remove LOCK_ASSERT(!simple_lock_held(&foo));
 1.45 21-Jul-2007  ad branches: 1.45.4; 1.45.6; 1.45.8; 1.45.10;
Merge unobtrusive locking changes from the vmlocking branch.
 1.44 12-Mar-2007  ad branches: 1.44.8;
Pass an ipl argument to pool_init/POOL_INIT to be used when initializing
the pool's lock.
 1.43 22-Feb-2007  thorpej branches: 1.43.4;
TRUE -> true, FALSE -> false
 1.42 21-Feb-2007  thorpej Replace the Mach-derived boolean_t type with the C99 bool type. A
future commit will replace use of TRUE and FALSE with true and false.
 1.41 01-Nov-2006  yamt branches: 1.41.4;
remove some __unused from function parameters.
 1.40 12-Oct-2006  christos - sprinkle __unused on function decls.
- fix a couple of unused bugs
- no more -Wno-unused for i386
 1.39 15-Sep-2006  yamt branches: 1.39.2;
merge yamt-pdpolicy branch.
- separate page replacement policy from the rest of kernel
- implement an alternative replacement policy
 1.38 11-Dec-2005  christos branches: 1.38.8; 1.38.20;
merge ktrace-lwp.
 1.37 13-Sep-2005  yamt wrap swap related code by #ifdef VMSWAP. always #define VMSWAP for now.
 1.36 31-Jul-2005  yamt revert "defflag VMSWAP" changes for now.
there seem to be far more people who don't want to edit
their kernel config files than i thought.
 1.35 30-Jul-2005  yamt defflag VMSWAP.
 1.34 27-Jun-2005  thorpej branches: 1.34.2;
Use ANSI function decls.
 1.33 11-May-2005  yamt allocate anons on-demand, rather than reserving static amount of
them on boot/swapon.
 1.32 01-Apr-2005  yamt merge yamt-km branch.
- don't use managed mappings/backing objects for wired memory allocations.
save some resources like pv_entry. also fix (most of) PR/27030.
- simplify kernel memory management API.
- simplify pmap bootstrap of some ports.
- some related cleanups.
 1.31 01-Sep-2004  yamt branches: 1.31.4; 1.31.6;
uvm_pagefree: when orphaning an A->K loaned page,
- decrement uvmexp.anonpages as it's no longer an anon page.
- null out anon->u.an_page as the anon no longer owns the page.
uvm_anfree: add related assertions.
 1.30 01-Sep-2004  yamt uvm_anfree: remove a comment which is no longer true.
 1.29 05-May-2004  yamt fix an amap_wirerange deadlock problem by re-introducing
PG_RELEASED for anon pages. PR/23171 from Christian Limpach.
for details, see discussion filed in the PR database.

uvm_anon_release: a new function to free anon-owned PG_RELEASED page.
uvm_anfree: we can't wait for the page here because the caller might hold
amap lock. instead, just mark the page as PG_RELEASED.
whoever unbusies the page should check PG_RELEASED.
uvm_aio_aiodone: uvm_anon_release() instead of uvm_page_unbusy()
if appropriate.
uvmfault_anonget: check PG_RELEASED.
 1.28 24-Mar-2004  junyoung branches: 1.28.2;
Nuke __P().
 1.27 06-Jan-2004  chs fix lock initialization in uvm_anon_add(). from PR 23831.
 1.26 28-Aug-2003  pk When retiring a swap device with marked bad blocks on it we should update
the `# swap page in use' and `# swap page only' counters. However, at the
time of swap device removal we can no longer figure out how many of the
bad swap pages are actually also `swap only' pages.

So, on swap I/O errors arrange things to not include the bad swap pages in
the `swpgonly' counter as follows: uvm_swap_markbad() decrements `swpgonly'
by the number of bad pages, and the various VM object deallocation routines
do not decrement `swpgonly' for swap slots marked as SWSLOT_BAD.
 1.25 11-Aug-2003  pk uao_pagein_page() & anon_pagein():
* return failure if the page cannot be retrieved.
* wakeup any waiters when releasing a page after successful page in.
 1.24 11-Aug-2003  pk Only deactivate pages if their wired count is zero.
 1.23 11-Aug-2003  pk Make sure to call uvm_swap_free() and uvm_swap_markbad() with valid (i.e.
positive) slot numbers.
 1.22 21-Sep-2002  chs branches: 1.22.6;
add missing anon lock around call to uvm_anon_lockloanpg().
 1.21 10-Nov-2001  lukem branches: 1.21.4;
add RCSIDs, and in some cases, slightly cleanup #include order
 1.20 06-Nov-2001  chs several changes prompted by loaning problems:
- fix the loaned case in uvm_pagefree().
- redo uvmexp.swpgonly accounting to work with page loaning.
add an assertion before each place we adjust uvmexp.swpgonly.
- fix uvm_km_pgremove() to always free any swap space associated with
the range being removed.
- get rid of UVM_LOAN_WIRED flag. instead, we just make sure that
pages loaned to the kernel are never on the page queues.
this allows us to assert that pages are not loaned and wired
at the same time.
- add yet more assertions.
 1.19 21-Oct-2001  chs branches: 1.19.2;
add some missing spinlocks.
 1.18 15-Sep-2001  chs a whole bunch of changes to improve performance and robustness under load:

- remove special treatment of pager_map mappings in pmaps. this is
required now, since I've removed the globals that expose the address range.
pager_map now uses pmap_kenter_pa() instead of pmap_enter(), so there's
no longer any need to special-case it.
- eliminate struct uvm_vnode by moving its fields into struct vnode.
- rewrite the pageout path. the pager is now responsible for handling the
high-level requests instead of only getting control after a bunch of work
has already been done on its behalf. this will allow us to UBCify LFS,
which needs tighter control over its pages than other filesystems do.
writing a page to disk no longer requires making it read-only, which
allows us to write wired pages without causing all kinds of havoc.
- use a new PG_PAGEOUT flag to indicate that a page should be freed
on behalf of the pagedaemon when it's unlocked. this flag is very similar
to PG_RELEASED, but unlike PG_RELEASED, PG_PAGEOUT can be cleared if the
pageout fails due to eg. an indirect-block buffer being locked.
this allows us to remove the "version" field from struct vm_page,
and together with shrinking "loan_count" from 32 bits to 16,
struct vm_page is now 4 bytes smaller.
- no longer use PG_RELEASED for swap-backed pages. if the page is busy
because it's being paged out, we can't release the swap slot to be
reallocated until that write is complete, but unlike with vnodes we
don't keep a count of in-progress writes so there's no good way to
know when the write is done. instead, when we need to free a busy
swap-backed page, just sleep until we can get it busy ourselves.
- implement a fast-path for extending writes which allows us to avoid
zeroing new pages. this substantially reduces cpu usage.
- encapsulate the data used by the genfs code in a struct genfs_node,
which must be the first element of the filesystem-specific vnode data
for filesystems which use genfs_{get,put}pages().
- eliminate many of the UVM pagerops, since they aren't needed anymore
now that the pager "put" operation is a higher-level operation.
- enhance the genfs code to allow NFS to use the genfs_{get,put}pages
instead of a modified copy.
- clean up struct vnode by removing all the fields that used to be used by
the vfs_cluster.c code (which we don't use anymore with UBC).
- remove kmem_object and mb_object since they were useless.
instead of allocating pages to these objects, we now just allocate
pages with no object. such pages are mapped in the kernel until they
are freed, so we can use the mapping to find the page to free it.
this allows us to remove splvm() protection in several places.

The sum of all these changes improves write throughput on my
decstation 5000/200 to within 1% of the rate of NetBSD 1.5
and reduces the elapsed time for "make release" of a NetBSD 1.5
source tree on my 128MB pc to 10% less than a 1.5 kernel took.
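
The layout rule for struct genfs_node in the list above (it must be the first element of the filesystem-specific vnode data) can be illustrated with a minimal sketch. All names here (struct genfs_node_sketch, struct myfs_node, struct vnode_sketch, vnode_to_genfs) are hypothetical stand-ins chosen for illustration, not the kernel's definitions:

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for the data shared by the generic code. */
struct genfs_node_sketch {
    int g_flags;
};

/* Hypothetical filesystem-specific node: the shared part comes first. */
struct myfs_node {
    struct genfs_node_sketch mn_gnode;  /* must stay the first member */
    long mn_ino;
};

/* Hypothetical vnode: generic code only sees an opaque data pointer. */
struct vnode_sketch {
    void *v_data;
};

/* Compile-time check of the layout rule. */
_Static_assert(offsetof(struct myfs_node, mn_gnode) == 0,
    "shared node data must be the first member");

/*
 * Generic code can reach the shared part without knowing the
 * filesystem's own type, because a pointer to a struct also points
 * to its first member.
 */
struct genfs_node_sketch *
vnode_to_genfs(struct vnode_sketch *vp)
{
    assert(vp != NULL && vp->v_data != NULL);
    return (struct genfs_node_sketch *)vp->v_data;
}
```

If the shared structure were placed anywhere other than first, the cast in vnode_to_genfs() would silently read the wrong bytes, which is presumably why the commit states the rule explicitly.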
 1.17 25-May-2001  chs branches: 1.17.2; 1.17.4;
remove trailing whitespace.
 1.16 10-Mar-2001  chs eliminate the VM_PAGER_* error codes in favor of the traditional E* codes.
the mapping is:

VM_PAGER_OK       0
VM_PAGER_BAD      <unused>
VM_PAGER_FAIL     <unused>
VM_PAGER_PEND     0 (see below)
VM_PAGER_ERROR    EIO
VM_PAGER_AGAIN    EAGAIN
VM_PAGER_UNLOCK   EBUSY
VM_PAGER_REFAULT  ERESTART

for async i/o requests, it used to be possible for the request to
be converted to sync, and the pager would return VM_PAGER_OK or VM_PAGER_PEND
to indicate whether the caller should perform post-i/o cleanup.
this is no longer allowed; pagers must now return 0 to indicate that
the async i/o was successfully started, and the caller never needs to
worry about doing the post-i/o cleanup.
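
The mapping above can be written out as a small translation function. This is only a sketch: the enum old_pager_code, its values, the function name, and SKETCH_ERESTART are hypothetical, and returning EINVAL for the two unused codes is an arbitrary choice made here, not something the commit specifies:

```c
#include <assert.h>
#include <errno.h>

/* Hypothetical stand-ins for the retired codes; values are illustrative. */
enum old_pager_code {
    OLD_VM_PAGER_OK,
    OLD_VM_PAGER_BAD,
    OLD_VM_PAGER_FAIL,
    OLD_VM_PAGER_PEND,
    OLD_VM_PAGER_ERROR,
    OLD_VM_PAGER_AGAIN,
    OLD_VM_PAGER_UNLOCK,
    OLD_VM_PAGER_REFAULT
};

/* ERESTART is not visible in every userland; use a placeholder if absent. */
#ifdef ERESTART
#define SKETCH_ERESTART ERESTART
#else
#define SKETCH_ERESTART (-3)    /* placeholder value for this sketch only */
#endif

/* Translate an old pager code to the E* code listed in the table. */
int
old_pager_code_to_errno(enum old_pager_code code)
{
    assert(code >= OLD_VM_PAGER_OK && code <= OLD_VM_PAGER_REFAULT);

    switch (code) {
    case OLD_VM_PAGER_OK:
    case OLD_VM_PAGER_PEND:     /* async i/o started successfully */
        return 0;
    case OLD_VM_PAGER_ERROR:
        return EIO;
    case OLD_VM_PAGER_AGAIN:
        return EAGAIN;
    case OLD_VM_PAGER_UNLOCK:
        return EBUSY;
    case OLD_VM_PAGER_REFAULT:
        return SKETCH_ERESTART;
    case OLD_VM_PAGER_BAD:      /* unused in the old code */
    case OLD_VM_PAGER_FAIL:     /* unused in the old code */
        break;
    }
    return EINVAL;              /* arbitrary choice for the unused codes */
}
```

Note that both VM_PAGER_OK and VM_PAGER_PEND collapse to 0, which matches the statement that callers no longer distinguish the two cases.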
 1.15 18-Feb-2001  chs branches: 1.15.2;
clean up DIAGNOSTIC checks, use KASSERT().
 1.14 28-Jan-2001  thorpej Page scanner improvements, behavior is actually a bit more like
Mach VM's now. Specific changes:
- Pages now need not have all of their mappings removed before being
put on the inactive list. They only need to have the "referenced"
attribute cleared. This makes putting pages onto the inactive list
much more efficient. In order to eliminate redundant clearings of
"referenced", callers of uvm_pagedeactivate() must now do this
themselves.
- When checking the "modified" attribute for a page (for clearing
PG_CLEAN), make sure to only do it if PG_CLEAN is currently set on
the page (saves a potentially expensive pmap operation).
- When scanning the inactive list, if a page is referenced, reactivate
it (this part was actually added in uvm_pdaemon.c,v 1.27). This
works properly now that pages on the inactive list are allowed to
have mappings.
- When scanning the inactive list and considering a page for freeing,
remove all mappings, and then check the "modified" attribute if the
page is marked PG_CLEAN.
- When scanning the active list, if the page was referenced since its
last sweep by the scanner, don't deactivate it. (This part was
actually added in uvm_pdaemon.c,v 1.28.)

These changes greatly improve interactive performance during
moderate to high memory and I/O load.
 1.13 23-Jan-2001  thorpej Change uvm_analloc() to return a locked anon, update all callers,
and fix an anon locking protocol error in uvm_loanzero().
 1.12 23-Jan-2001  thorpej Sprinkle some assertions:
amap_free(): Assert that the amap is locked.
amap_share_protect(): Assert that the amap is locked.
amap_wipeout(): Assert that the amap is locked.
uvm_anfree(): Assert that the anon has a reference count of 0 and is
not locked.
uvm_anon_lockloanpg(): Assert that the anon is locked.
anon_pagein(): Assert that the anon is locked.
uvmfault_anonget(): Assert that the anon is locked.
uvm_pagealloc_strat(): Assert that the uobj or the anon is locked

And fix the problems these have uncovered:
amap_cow_now(): Lock the new anon after allocating it, and unref and
unlock it (rather than lock!) before freeing it in case
of an error condition. This should fix a problem reported
by Dan Carosone using cdrecord on an i386 MP kernel.
uvm_fault(): Case1B -- Lock the new anon after allocating it, and unlock
it later when we unlock the old anon.
Case2 -- Lock the new anon after allocating it, and unlock
it later by passing it to uvmfault_unlockall() (we set anon
to NULL if we're not doing a promote fault).
 1.11 27-Dec-2000  chs when we fail to allocate anons to represent new swap space,
just return an error rather than panicking.
 1.10 25-Nov-2000  chs lots of cleanup:
use queue.h macros and KASSERT().
address amap offsets in pages instead of bytes.
make amap_ref() and amap_unref() take an amap, offset and length
instead of a vm_map_entry_t.
improve whitespace and comments.
 1.9 06-Aug-2000  thorpej Do something sane with a DIAGNOSTIC condition in a non-DIAGNOSTIC
kernel.
 1.8 05-Aug-2000  thorpej Correct a comment about locking wrt. uvmfault_anonget().
 1.7 27-Jun-2000  mrg remove include of <vm/vm.h>
 1.6 26-Jun-2000  mrg remove/move more mach vm header files:

<vm/pglist.h> -> <uvm/uvm_pglist.h>
<vm/vm_inherit.h> -> <uvm/uvm_inherit.h>
<vm/vm_kern.h> -> into <uvm/uvm_extern.h>
<vm/vm_object.h> -> nothing
<vm/vm_pager.h> -> into <uvm/uvm_pager.h>

also includes a bunch of <vm/vm_page.h> include removals (due to redundancy
with <vm/vm.h>), and a scattering of other similar headers.
 1.5 11-Jan-2000  chs branches: 1.5.4;
add support for ``swapctl -d'' (removing swap space).
improve handling of i/o errors in swap space.

reviewed by: Chuck Cranor
 1.4 12-Sep-1999  chs branches: 1.4.2;
eliminate the PMAP_NEW option by making it required for all ports.
ports which previously had no support for PMAP_NEW now implement
the pmap_k* interfaces as wrappers around the non-k versions.
 1.3 14-Aug-1999  ross In uvm_anon_init() and uvm_anon_add(), initialize the ref count lock.
 1.2 26-Mar-1999  chs add uvmexp.swpgonly and use it to detect out-of-swap conditions.
 1.1 24-Jan-1999  chuck cleanup/reorg:
- break anon related functions out of uvm_amap.c and put them in their own
file (uvm_anon.c). this includes breaking up uvm_anon_init into an amap
init function and an anon init function
- ensure that only functions within the amap module access amap structure
fields (add macros to amap api as needed)
 1.4.2.5 12-Mar-2001  bouyer Sync with HEAD.
 1.4.2.4 11-Feb-2001  bouyer Sync with HEAD.
 1.4.2.3 05-Jan-2001  bouyer Sync with HEAD
 1.4.2.2 08-Dec-2000  bouyer Sync with HEAD.
 1.4.2.1 20-Nov-2000  bouyer Update thorpej_scsipi to -current as of a month ago
 1.5.4.3 14-Feb-2002  he Pull up revision 1.11 (requested by chs):
Make memory allocation failures during ``swapctl -a'' return an error
instead of causing a panic.
 1.5.4.2 06-Aug-2000  thorpej Pull up rev. 1.9:
Do something sane with a DIAGNOSTIC condition in a non-DIAGNOSTIC
kernel.
 1.5.4.1 06-Aug-2000  thorpej Pull up rev. 1.8:
Correct a comment about locking wrt. uvmfault_anonget().
 1.15.2.6 18-Oct-2002  nathanw Catch up to -current.
 1.15.2.5 14-Nov-2001  nathanw Catch up to -current.
 1.15.2.4 22-Oct-2001  nathanw Catch up to -current.
 1.15.2.3 21-Sep-2001  nathanw Catch up to -current.
 1.15.2.2 21-Jun-2001  nathanw Catch up to -current.
 1.15.2.1 09-Apr-2001  nathanw Catch up with -current.
 1.17.4.1 01-Oct-2001  fvdl Catch up with -current.
 1.17.2.2 10-Oct-2002  jdolecek sync kqueue with -current; this includes merge of gehenna-devsw branch,
merge of i386 MP branch, and part of autoconf rototil work
 1.17.2.1 10-Jan-2002  thorpej Sync kqueue branch with -current.
 1.19.2.1 12-Nov-2001  thorpej Sync the thorpej-mips-cache branch with -current.
 1.21.4.2 12-Mar-2002  thorpej Make afreelock an adaptive mutex, and rename it to afree_mutex.
 1.21.4.1 11-Mar-2002  thorpej Convert swap_syscall_lock and uvm.swap_data_lock to adaptive mutexes,
and rename them appropriately.
 1.22.6.6 10-Nov-2005  skrll Sync with HEAD. Here we go again...
 1.22.6.5 01-Apr-2005  skrll Sync with HEAD.
 1.22.6.4 21-Sep-2004  skrll Fix the sync with head I botched.
 1.22.6.3 18-Sep-2004  skrll Sync with HEAD.
 1.22.6.2 03-Sep-2004  skrll Sync with HEAD
 1.22.6.1 03-Aug-2004  skrll Sync with HEAD
 1.28.2.2 11-Sep-2004  he Pull up revisions 1.30-1.31 (requested by yamt in ticket #830):
Correct page accounting for anon pages: decrement
uvmexp.anonpages when orphaning an A->K loaned page, and
null out anon.u.an_page as the anon no longer owns the page
in that case. Add a few related assertions. Also correct
a comment.
 1.28.2.1 10-May-2004  tron Pull up revision 1.29 (requested by yamt in ticket #271):
fix an amap_wirerange deadlock problem by re-introducing
PG_RELEASED for anon pages. PR/23171 from Christian Limpach.
for details, see discussion filed in the PR database.
uvm_anon_release: a new function to free anon-owned PG_RELEASED page.
uvm_anfree: we can't wait for the page here because the caller might hold
amap lock. instead, just mark the page as PG_RELEASED.
whoever unbusies the page should check PG_RELEASED.
uvm_aio_aiodone: uvm_anon_release() instead of uvm_page_unbusy()
if appropriate.
uvmfault_anonget: check PG_RELEASED.
 1.31.6.1 25-Jan-2005  yamt - don't use uvm_object or managed mappings for wired allocations.
(eg. malloc(9))
- simplify uvm_km_* apis.
 1.31.4.1 29-Apr-2005  kent sync with -current
 1.34.2.7 21-Jan-2008  yamt sync with head
 1.34.2.6 15-Nov-2007  yamt sync with head.
 1.34.2.5 27-Oct-2007  yamt sync with head.
 1.34.2.4 03-Sep-2007  yamt sync with head.
 1.34.2.3 26-Feb-2007  yamt sync with head.
 1.34.2.2 30-Dec-2006  yamt sync with head.
 1.34.2.1 21-Jun-2006  yamt sync with head.
 1.38.20.1 18-Nov-2006  ad Sync with head.
 1.38.8.2 15-Sep-2006  yamt make UVM_KICK_PDAEMON() a real function and stop including
uvm_pdpolicy.h from uvm.h. this also fixes build of pmap(1).
 1.38.8.1 05-Mar-2006  yamt separate page replacement policy from the rest of kernel.
 1.39.2.2 10-Dec-2006  yamt sync with head.
 1.39.2.1 22-Oct-2006  yamt sync with head
 1.41.4.2 24-Mar-2007  yamt sync with head.
 1.41.4.1 27-Feb-2007  yamt - sync with head.
- move sched_changepri back to kern_synch.c as it doesn't know PPQ anymore.
 1.43.4.5 01-Nov-2007  ad Yielding to avoid livelock doesn't work well, so just sleep for 1 tick.
This too is inadequate and a better solution must be found. Discussed
with yamt@.
 1.43.4.4 01-Sep-2007  ad Update for pool_cache API changes.
 1.43.4.3 03-Jul-2007  yamt if wrong-order trylocking failed, avoid livelock by yielding cpu
before retrying. ok'ed by Andrew Doran.
 1.43.4.2 13-Mar-2007  ad Pull in the initial set of changes for the vmlocking branch.
 1.43.4.1 13-Mar-2007  ad Sync with head.
 1.44.8.1 15-Aug-2007  skrll Sync with HEAD.
 1.45.10.2 21-Jul-2007  ad Merge unobtrusive locking changes from the vmlocking branch.
 1.45.10.1 21-Jul-2007  ad file uvm_anon.c was added on branch matt-mips64 on 2007-07-21 19:21:54 +0000
 1.45.8.1 14-Oct-2007  yamt sync with head.
 1.45.6.4 23-Mar-2008  matt sync with HEAD
 1.45.6.3 09-Jan-2008  matt sync with HEAD
 1.45.6.2 08-Nov-2007  matt sync with -HEAD
 1.45.6.1 06-Nov-2007  matt sync with HEAD
 1.45.4.3 14-Nov-2007  joerg Sync with HEAD.
 1.45.4.2 11-Nov-2007  joerg Sync with HEAD.
 1.45.4.1 26-Oct-2007  joerg Sync with HEAD.

Follow the merge of pmap.c on i386 and amd64 and move
pmap_init_tmp_pgtbl into arch/x86/x86/pmap.c. Modify the ACPI wakeup
code to restore CR4 before jumping back into kernel space as the large
page option might cover that.
 1.46.4.3 18-Feb-2008  mjf Sync with HEAD.
 1.46.4.2 27-Dec-2007  mjf Sync with HEAD.
 1.46.4.1 19-Nov-2007  mjf Sync with HEAD.
 1.46.2.1 13-Nov-2007  bouyer Sync with HEAD
 1.48.6.2 19-Jan-2008  bouyer Sync with HEAD
 1.48.6.1 02-Jan-2008  bouyer Sync with HEAD
 1.48.2.2 26-Dec-2007  ad Sync with head.
 1.48.2.1 04-Dec-2007  ad Pull the vmlocking changes into a new branch.
 1.51.40.1 08-Feb-2011  bouyer Sync with HEAD
 1.51.38.1 06-Jun-2011  jruoho Sync with HEAD.
 1.51.32.4 31-May-2011  rmind sync with head
 1.51.32.3 05-Mar-2011  rmind sync with head
 1.51.32.2 17-Mar-2010  rmind Reorganise UVM locking to protect P->V state and serialise pmap(9)
operations on the same page(s) by always locking their owner. Hence
lock order: "vmpage"-lock -> pmap-lock.

Patch, proposed on tech-kern@, from Andrew Doran.
 1.51.32.1 16-Mar-2010  rmind Change struct uvm_object::vmobjlock to be dynamically allocated with
mutex_obj_alloc(). It allows us to share the locks among UVM objects.
 1.51.28.3 07-May-2012  matt Move call to uvm_anon_dropswap to within #ifdef VMSWAP
 1.51.28.2 16-Feb-2012  matt Track the victims selected by the pagedaemon and what happens to them.
Keep a hint for what page group has the most free pages for a given color.
 1.51.28.1 14-Feb-2012  matt Add more KASSERTs (more! more! more!).
When returning a page to the free pool, make sure to dequeue the pages
beforehand or free page queue corruption will happen.
 1.53.2.1 23-Jun-2011  cherry Catchup with rmind-uvmplock merge.
 1.62.16.1 18-May-2014  rmind sync with head
 1.62.12.2 03-Dec-2017  jdolecek update from HEAD
 1.62.12.1 20-Aug-2014  tls Rebase to HEAD as of a few days ago.
 1.62.2.5 22-May-2014  yamt g/c a write-only variable
 1.62.2.4 22-May-2014  yamt sync with head.

for a reference, the tree before this commit was tagged
as yamt-pagecache-tag8.

this commit was split into small chunks to avoid
a limitation of cvs. ("Protocol error: too many arguments")
 1.62.2.3 23-Jan-2012  yamt fix swapoff locking
 1.62.2.2 26-Dec-2011  yamt - use O->A loan to serve read(2). based on a patch from Chuck Silvers
- associated O->A loan fixes.
 1.62.2.1 02-Nov-2011  yamt page cache related changes

- maintain object pages in radix tree rather than rb tree.
- reduce unnecessary page scan in putpages. esp. when an object has a ton of
pages cached but only a few of them are dirty.
- reduce the number of pmap operations by tracking page dirtiness more
precisely in uvm layer.
- fix nfs commit range tracking.
- fix nfs write clustering. XXX hack
 1.63.22.1 02-Nov-2017  snj Pull up following revision(s) (requested by pgoyette in ticket #335):
share/man/man9/kernhist.9: 1.5-1.8
sys/arch/acorn26/acorn26/pmap.c: 1.39
sys/arch/arm/arm32/fault.c: 1.105 via patch
sys/arch/arm/arm32/pmap.c: 1.350, 1.359
sys/arch/arm/broadcom/bcm2835_bsc.c: 1.7
sys/arch/arm/omap/if_cpsw.c: 1.20
sys/arch/arm/omap/tiotg.c: 1.7
sys/arch/evbarm/conf/RPI2_INSTALL: 1.3
sys/dev/ic/sl811hs.c: 1.98
sys/dev/usb/ehci.c: 1.256
sys/dev/usb/if_axe.c: 1.83
sys/dev/usb/motg.c: 1.18
sys/dev/usb/ohci.c: 1.274
sys/dev/usb/ucom.c: 1.119
sys/dev/usb/uhci.c: 1.277
sys/dev/usb/uhub.c: 1.137
sys/dev/usb/umass.c: 1.160-1.162
sys/dev/usb/umass_quirks.c: 1.100
sys/dev/usb/umass_scsipi.c: 1.55
sys/dev/usb/usb.c: 1.168
sys/dev/usb/usb_mem.c: 1.70
sys/dev/usb/usb_subr.c: 1.221
sys/dev/usb/usbdi.c: 1.175
sys/dev/usb/usbdi_util.c: 1.67-1.70
sys/dev/usb/usbroothub.c: 1.3
sys/dev/usb/xhci.c: 1.75
sys/external/bsd/drm2/dist/drm/i915/i915_gem.c: 1.34
sys/kern/kern_history.c: 1.15
sys/kern/kern_xxx.c: 1.74
sys/kern/vfs_bio.c: 1.275-1.276
sys/miscfs/genfs/genfs_io.c: 1.71
sys/sys/kernhist.h: 1.21
sys/ufs/ffs/ffs_balloc.c: 1.63
sys/ufs/lfs/lfs_vfsops.c: 1.361
sys/ufs/lfs/ulfs_inode.c: 1.21
sys/ufs/lfs/ulfs_vnops.c: 1.52
sys/ufs/ufs/ufs_inode.c: 1.102
sys/ufs/ufs/ufs_vnops.c: 1.239
sys/uvm/pmap/pmap.c: 1.37-1.39
sys/uvm/pmap/pmap_tlb.c: 1.22
sys/uvm/uvm_amap.c: 1.108
sys/uvm/uvm_anon.c: 1.64
sys/uvm/uvm_aobj.c: 1.126
sys/uvm/uvm_bio.c: 1.91
sys/uvm/uvm_device.c: 1.66
sys/uvm/uvm_fault.c: 1.201
sys/uvm/uvm_km.c: 1.144
sys/uvm/uvm_loan.c: 1.85
sys/uvm/uvm_map.c: 1.353
sys/uvm/uvm_page.c: 1.194
sys/uvm/uvm_pager.c: 1.111
sys/uvm/uvm_pdaemon.c: 1.109
sys/uvm/uvm_swap.c: 1.175
sys/uvm/uvm_vnode.c: 1.103
usr.bin/vmstat/vmstat.c: 1.219
Reorder to test for null before null deref in debug code
--
Reorder to test for null before null deref in debug code
--
KNF
--
No need for '\n' in UVMHIST_LOG
--
normalise a BIOHIST log message
--
Update the kernhist(9) kernel history code to address issues identified
in PR kern/52639, as well as some general cleaning-up...
(As proposed on tech-kern@ with additional changes and enhancements.)
Details of changes:
* All history arguments are now stored as uintmax_t values[1], both in
the kernel and in the structures used for exporting the history data
to userland via sysctl(9). This avoids problems on some architectures
where passing a 64-bit (or larger) value to printf(3) can cause it to
process the value as multiple arguments. (This can be particularly
problematic when printf()'s format string is not a literal, since in
that case the compiler cannot know how large each argument should be.)
* Update the data structures used for exporting kernel history data to
include a version number as well as the length of history arguments.
* All [2] existing users of kernhist(9) have had their format strings
updated. Each format specifier now includes an explicit length
modifier 'j' to refer to numeric values of the size of uintmax_t.
* All [2] existing users of kernhist(9) have had their format strings
updated to replace uses of "%p" with "%#jx", and the pointer
arguments are now cast to (uintptr_t) before being subsequently cast
to (uintmax_t). This is needed to avoid compiler warnings about
casting "pointer to integer of a different size."
* All [2] existing users of kernhist(9) have had instances of "%s" or
"%c" format strings replaced with numeric formats; several instances
of mis-match between format string and argument list have been fixed.
* vmstat(1) has been modified to handle the new size of arguments in the
history data as exported by sysctl(9).
* vmstat(1) now provides a warning message if the history requested with
the -u option does not exist (previously, this condition was silently
ignored, with only a single blank line being printed).
* vmstat(1) now checks the version and argument length included in the
data exported via sysctl(9) and exits if they do not match the values
with which vmstat was built.
* The kernhist(9) man-page has been updated to note the additional
requirements imposed on the format strings, along with several other
minor changes and enhancements.
[1] It would have been possible to use an explicit length (for example,
uint64_t) for the history arguments. But that would require another
"rototill" of all the users in the future when we add support for an
architecture that supports a larger size. Also, the printf(3) format
specifiers for explicitly-sized values, such as "%"PRIu64, are much
more verbose (and less aesthetically appealing, IMHO) than simply
using "%ju".
[2] I've tried very hard to find "all [the] existing users of kernhist(9)"
but it is possible that I've missed some of them. I would be glad to
update any stragglers that anyone identifies.
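
The formatting convention described above (every argument widened to uintmax_t, pointers cast through uintptr_t first, and "%#jx" or "%ju" in the format string) can be sketched in plain C. The helper format_hist_args() is hypothetical and uses snprintf(3) as a stand-in for the history macros, which are not reproduced here:

```c
#include <assert.h>
#include <stdint.h>
#include <stdio.h>

/*
 * Hypothetical helper, not part of kernhist(9): formats a pointer and
 * a counter the way the commit asks history users to do it.
 */
int
format_hist_args(char *buf, size_t len, const void *ptr, unsigned int count)
{
    assert(buf != NULL && len > 0);

    /*
     * The pointer goes through uintptr_t before uintmax_t, so no
     * "pointer to integer of a different size" cast is involved.
     * The 'j' length modifier tells printf each argument is uintmax_t.
     */
    return snprintf(buf, len, "ptr=%#jx count=%ju",
        (uintmax_t)(uintptr_t)ptr, (uintmax_t)count);
}
```

Because every argument has the same width, a non-literal format string cannot cause printf to misread a wide value as several narrower arguments, which is the failure mode the commit message describes.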
--
For some reason this single kernel seems to have outgrown its declared
size as a result of the kernhist(9) changes. Bump the size.
XXX The amount of increase may be excessive - anyone with more detailed
XXX knowledge please feel free to further adjust the value appropriately.
--
Missed one cast of pointer --> uintptr_t in previous kernhist(9) commit
--
And yet another one. :(
--
Use correct mark-up for NetBSD version.
--
More improvements in grammar and readability.
--
Remove a stray '"' (obvious typo) and add a couple of casts that are
probably needed.
--
And replace an instance of "%p" conversion with "%#jx"
--
Whitespace fix. Give Bl tag table a width. Fix Xr.
 1.64.8.1 06-Jul-2021  martin Pull up following revision(s) - all via patch -
(requested by riastradh in ticket #1317):

sys/uvm/uvm_page.c: revision 1.248
sys/uvm/uvm_anon.c: revision 1.80
sys/rump/librump/rumpvfs/vm_vfs.c: revision 1.40
sys/rump/librump/rumpvfs/vm_vfs.c: revision 1.41
sys/rump/librump/rumpkern/vm.c: revision 1.191
sys/uvm/uvm_pager.c: revision 1.130
external/cddl/osnet/dist/uts/common/fs/zfs/zfs_vnops.c: revision 1.71
tests/rump/rumpkern/t_vm.c: revision 1.5
tests/rump/rumpkern/t_vm.c: revision 1.6
sys/rump/librump/rumpvfs/vm_vfs.c: revision 1.39

Move the handling of PG_PAGEOUT from uvm_aio_aiodone_pages() to
uvm_page_unbusy() so that all callers of uvm_page_unbusy() don't need to
handle this flag separately. Split out the pages part of uvm_aio_aiodone()
into uvm_aio_aiodone_pages() in rump just like in the real kernel.

In ZFS functions that can fail to copy data between the ARC and VM pages,
use uvm_aio_aiodone_pages() rather than uvm_page_unbusy() so that we can
handle these "I/O" errors. Fixes PR 55702.

fix an incorrect assertion in the previous commit.

Handle PG_PAGEOUT in uvm_anon_release() too.

Commit the ZFS file that I forgot in this previous commit:

Move the handling of PG_PAGEOUT from uvm_aio_aiodone_pages() to
uvm_page_unbusy() so that all callers of uvm_page_unbusy() don't need to
handle this flag separately. Split out the pages part of uvm_aio_aiodone()
into uvm_aio_aiodone_pages() in rump just like in the real kernel.

In ZFS functions that can fail to copy data between the ARC and VM pages,
use uvm_aio_aiodone_pages() rather than uvm_page_unbusy() so that we can
handle these "I/O" errors. Fixes PR 55702.
update the rump copy of uvm_page_unbusy() to match the real version,
in particular handle PG_PAGEOUT. fixes a few atf tests.
the busypage test is buggy, expect it to fail.

make rump's uvm_aio_aiodone_pages() look more like the kernel version.
fixes some more rumpy assertions.

for the busypage test, replace atf_tc_expect_fail() with atf_tc_skip()
because atf apparently has no way to expect a test program to crash.
fixes PR 55945.
 1.64.4.1 08-Apr-2020  martin Merge changes from current as of 20200406
 1.70.2.2 29-Feb-2020  ad Sync with head.
 1.70.2.1 17-Jan-2020  ad Sync with head.
 1.32 20-Mar-2020  ad Go back to freeing struct vm_anon one by one. There may have been an
advantage circa ~2008 but there isn't now.
 1.31 23-Feb-2020  ad UVM locking changes, proposed on tech-kern:

- Change the lock on uvm_object, vm_amap and vm_anon to be a RW lock.
- Break v_interlock and vmobjlock apart. v_interlock remains a mutex.
- Do partial PV list locking in the x86 pmap. Others to follow later.
 1.30 06-Aug-2011  rmind branches: 1.30.54; 1.30.60;
- Rework uvm_anfree() into uvm_anon_freelst(), which always drops the lock.
- Free anons in uvm_anon_freelst() without lock held.
- Mechanic sync to unused loaning code.
 1.29 24-Jun-2011  rmind Fix uvmplock regression - a lock against oneself case in amap_swap_off().
Happens since amap is NULL in uvmfault_anonget(), so uvmfault_unlockall()
keeps anon locked, when it should unlock it.
 1.28 12-Jun-2011  rmind Welcome to 5.99.53! Merge rmind-uvmplock branch:

- Reorganize locking in UVM and provide extra serialisation for pmap(9).
New lock order: [vmpage-owner-lock] -> pmap-lock.

- Simplify locking in some pmap(9) modules by removing P->V locking.

- Use lock object on vmobjlock (and thus vnode_t::v_interlock) to share
the locks amongst UVM objects where necessary (tmpfs, layerfs, unionfs).

- Rewrite and optimise x86 TLB shootdown code, make it simpler and cleaner.
Add TLBSTATS option for x86 to collect statistics about TLB shootdowns.

- Unify /dev/mem et al in MI code and provide required locking (removes
kernel-lock on some ports). Also, avoid cache-aliasing issues.

Thanks to Andrew Doran and Joerg Sonnenberger, as their initial patches
formed the core changes of this branch.
 1.27 02-Feb-2011  chuck branches: 1.27.2;
update license clauses on my code to match the new-style BSD licenses.
based on diff that rmind@ sent me.

no functional change with this commit.
 1.26 14-Jun-2009  yamt branches: 1.26.4; 1.26.6; 1.26.8;
change the order of members of vm_anon for better packing.
 1.25 02-Jan-2008  ad branches: 1.25.10; 1.25.24;
Merge vmlocking2 to head.
 1.24 21-Feb-2007  thorpej branches: 1.24.4; 1.24.18; 1.24.24; 1.24.26; 1.24.30;
Replace the Mach-derived boolean_t type with the C99 bool type. A
future commit will replace use of TRUE and FALSE with true and false.
 1.23 11-Dec-2005  christos branches: 1.23.26;
merge ktrace-lwp.
 1.22 17-Sep-2005  yamt make VMSWAP optional again.
 1.21 13-Sep-2005  yamt wrap swap related code by #ifdef VMSWAP. always #define VMSWAP for now.
 1.20 31-Jul-2005  yamt revert "defflag VMSWAP" changes for now.
there seems to be far more people who don't want to edit
their kernel config files than i thought.
 1.19 30-Jul-2005  yamt defflag VMSWAP.
 1.18 11-May-2005  yamt branches: 1.18.2;
allocate anons on-demand, rather than reserving static amount of
them on boot/swapon.
 1.17 05-May-2004  yamt fix an amap_wirerange deadlock problem by re-introducing
PG_RELEASED for anon pages. PR/23171 from Christian Limpach.
for details, see discussion filed in the PR database.

uvm_anon_release: a new function to free anon-owned PG_RELEASED page.
uvm_anfree: we can't wait for the page here because the caller might hold
amap lock. instead, just mark the page as PG_RELEASED.
whoever unbusies the page should check PG_RELEASED.
uvm_aio_aiodone: uvm_anon_release() instead of uvm_page_unbusy()
if appropriate.
uvmfault_anonget: check PG_RELEASED.
 1.16 24-Mar-2004  junyoung branches: 1.16.2;
Nuke __P().
 1.15 26-May-2001  chs branches: 1.15.22;
replace {simple_,}lock{_data,}_t with struct {simple,}lock {,*}.
 1.14 25-May-2001  chs remove trailing whitespace.
 1.13 27-Dec-2000  chs branches: 1.13.2;
when we fail to allocate anons to represent new swap space,
just return an error rather than panicking.
 1.12 11-Jan-2000  chs branches: 1.12.4;
add support for ``swapctl -d'' (removing swap space).
improve handling of i/o errors in swap space.

reviewed by: Chuck Cranor
 1.11 21-Jun-1999  thorpej branches: 1.11.2;
Protect prototypes, certain macros, and inlines from userland.
 1.10 26-Mar-1999  chs branches: 1.10.4;
add uvmexp.swpgonly and use it to detect out-of-swap conditions.
 1.9 24-Jan-1999  chuck cleanup/reorg:
- break anon related functions out of uvm_amap.c and put them in their own
file (uvm_anon.c). this includes breaking up uvm_anon_init into an amap
init function and an anon init function
- ensure that only functions within the amap module access amap structure
fields (add macros to amap api as needed)
 1.8 20-Nov-1998  chuck update outdated an_swslot comments
 1.7 09-Mar-1998  mrg KNF.
 1.6 01-Mar-1998  fvdl Merge with Lite2 + local changes
 1.5 10-Feb-1998  perry add/cleanup multiple inclusion protection.
 1.4 09-Feb-1998  mrg KNF.
 1.3 07-Feb-1998  mrg restore rcsids
 1.2 06-Feb-1998  thorpej RCS ID police.
 1.1 05-Feb-1998  mrg branches: 1.1.1;
Initial revision
 1.1.1.1 05-Feb-1998  mrg initial import of the new virtual memory system, UVM, into -current.

UVM was written by chuck cranor <chuck@maria.wustl.edu>, with some
minor portions derived from the old Mach code. i provided some help
getting swap and paging working, and other bug fixes/ideas. chuck
silvers <chuq@chuq.com> also provided some other fixes.

this is the UVM kernel code portion.


this will be KNF'd shortly. :-)
 1.10.4.1 01-Jul-1999  thorpej Sync w/ -current.
 1.11.2.2 05-Jan-2001  bouyer Sync with HEAD
 1.11.2.1 20-Nov-2000  bouyer Update thorpej_scsipi to -current as of a month ago
 1.12.4.1 14-Feb-2002  he Pull up revision 1.13 (requested by chs):
Make memory allocation failures during ``swapctl -a'' return an error
instead of causing a panic.
 1.13.2.1 21-Jun-2001  nathanw Catch up to -current.
 1.15.22.4 10-Nov-2005  skrll Sync with HEAD. Here we go again...
 1.15.22.3 21-Sep-2004  skrll Fix the sync with head I botched.
 1.15.22.2 18-Sep-2004  skrll Sync with HEAD.
 1.15.22.1 03-Aug-2004  skrll Sync with HEAD
 1.16.2.1 10-May-2004  tron Pull up revision 1.17 (requested by yamt in ticket #271):
fix an amap_wirerange deadlock problem by re-introducing
PG_RELEASED for anon pages. PR/23171 from Christian Limpach.
for details, see discussion filed in the PR database.
uvm_anon_release: a new function to free anon-owned PG_RELEASED page.
uvm_anfree: we can't wait for the page here because the caller might hold
amap lock. instead, just mark the page as PG_RELEASED.
whoever unbusies the page should check PG_RELEASED.
uvm_aio_aiodone: uvm_anon_release() instead of uvm_page_unbusy()
if appropriate.
uvmfault_anonget: check PG_RELEASED.
 1.18.2.3 21-Jan-2008  yamt sync with head
 1.18.2.2 26-Feb-2007  yamt sync with head.
 1.18.2.1 21-Jun-2006  yamt sync with head.
 1.23.26.1 27-Feb-2007  yamt - sync with head.
- move sched_changepri back to kern_synch.c as it doesn't know PPQ anymore.
 1.24.30.1 02-Jan-2008  bouyer Sync with HEAD
 1.24.26.1 04-Dec-2007  ad Pull the vmlocking changes into a new branch.
 1.24.24.1 18-Feb-2008  mjf Sync with HEAD.
 1.24.18.1 09-Jan-2008  matt sync with HEAD
 1.24.4.1 13-Mar-2007  ad Pull in the initial set of changes for the vmlocking branch.
 1.25.24.1 23-Jul-2009  jym Sync with HEAD.
 1.25.10.1 20-Jun-2009  yamt sync with head
 1.26.8.1 08-Feb-2011  bouyer Sync with HEAD
 1.26.6.1 06-Jun-2011  jruoho Sync with HEAD.
 1.26.4.2 05-Mar-2011  rmind sync with head
 1.26.4.1 17-Mar-2010  rmind Reorganise UVM locking to protect P->V state and serialise pmap(9)
operations on the same page(s) by always locking their owner. Hence
lock order: "vmpage"-lock -> pmap-lock.

Patch, proposed on tech-kern@, from Andrew Doran.
 1.27.2.1 23-Jun-2011  cherry Catchup with rmind-uvmplock merge.
 1.30.60.1 29-Feb-2020  ad Sync with head.
 1.30.54.1 08-Apr-2020  martin Merge changes from current as of 20200406
 1.157 24-Feb-2023  riastradh uvm: Eliminate __HAVE_ATOMIC_AS_MEMBAR conditionals.

Discussed on tech-kern:
https://mail-index.netbsd.org/tech-kern/2023/02/23/msg028729.html
 1.156 31-May-2022  andvar fix various typos in comments, documentation and messages.
 1.155 09-Apr-2022  riastradh sys: Use membar_release/acquire around reference drop.

This just goes through my recent reference count membar audit and
changes membar_exit to membar_release and membar_enter to
membar_acquire -- this should make everything cheaper on most CPUs
without hurting correctness, because membar_acquire is generally
cheaper than membar_enter.
 1.154 12-Mar-2022  riastradh sys: Membar audit around reference count releases.

If two threads are using an object that is freed when the reference
count goes to zero, we need to ensure that all memory operations
related to the object happen before freeing the object.

Using an atomic_dec_uint_nv(&refcnt) == 0 ensures that only one
thread takes responsibility for freeing, but it's not enough to
ensure that the other thread's memory operations happen before the
freeing.

Consider:

    Thread A                        Thread B
    obj->foo = 42;                  obj->baz = 73;
    mumble(&obj->bar);              grumble(&obj->quux);
    /* membar_exit(); */            /* membar_exit(); */
    atomic_dec -- not last          atomic_dec -- last
                                    /* membar_enter(); */
                                    KASSERT(invariant(obj->foo,
                                        obj->bar));
                                    free_stuff(obj);

The memory barriers ensure that

    obj->foo = 42;
    mumble(&obj->bar);

in thread A happens before

    KASSERT(invariant(obj->foo, obj->bar));
    free_stuff(obj);

in thread B. Without them, this ordering is not guaranteed.

So in general it is necessary to do

    membar_exit();
    if (atomic_dec_uint_nv(&obj->refcnt) != 0)
        return;
    membar_enter();

to release a reference, for the `last one out hit the lights' style
of reference counting. (This is in contrast to the style where one
thread blocks new references and then waits under a lock for existing
ones to drain with a condvar -- no membar needed thanks to mutex(9).)

I searched for atomic_dec to find all these. Obviously we ought to
have a better abstraction for this because there's so much copypasta.
This is a stop-gap measure to fix actual bugs until we have that. It
would be nice if an abstraction could gracefully handle the different
styles of reference counting in use -- some years ago I drafted an
API for this, but making it cover everything got a little out of hand
(particularly with struct vnode::v_usecount) and I ended up setting
it aside to work on psref/localcount instead for better scalability.

I got bored of adding #ifdef __HAVE_ATOMIC_AS_MEMBAR everywhere, so I
only put it on things that look performance-critical on 5sec review.
We should really adopt membar_enter_preatomic/membar_exit_postatomic
or something (except they are applicable only to atomic r/m/w, not to
atomic_load/store_*, making the naming annoying) and get rid of all
the ifdefs.
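
For readers outside the kernel, the release pattern described above can be approximated with C11 atomics. This is a sketch under stated assumptions: struct obj and obj_release() are hypothetical, the release ordering on the decrement plays the role of the barrier before the atomic_dec, and the acquire fence plays the role of the barrier after it; it is not the kernel's membar or atomic_ops API:

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>

/* Hypothetical reference-counted object. */
struct obj {
    atomic_uint refcnt;
    int foo;
};

/*
 * Drop one reference.  Returns true if the caller dropped the last
 * reference and is now responsible for freeing the object.
 */
bool
obj_release(struct obj *o)
{
    /*
     * Release ordering: everything this thread did to the object is
     * ordered before the decrement becomes visible.
     */
    unsigned int old = atomic_fetch_sub_explicit(&o->refcnt, 1,
        memory_order_release);

    assert(old > 0);    /* catch a reference count underflow */
    if (old != 1)
        return false;   /* not the last reference */

    /*
     * Acquire fence: the other threads' earlier operations on the
     * object are ordered before whatever the freeing thread does next.
     */
    atomic_thread_fence(memory_order_acquire);
    return true;
}
```

A caller would free the object only when obj_release() returns true, which corresponds to the `last one out hit the lights' style discussed in the commit message.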
 1.153 13-Mar-2021  skrll Consistently use %#jx instead of 0x%jx or just %jx in UVMHIST_LOG formats
 1.152 04-Nov-2020  chs In uvmpd_tryownerlock(), if the initial try-lock of the owner lock fails
then rather than do more try-locks and eventually sleep for a tick,
take a hold on the current owner's lock, drop the page interlock,
and acquire the lock that we took the hold on in a blocking fashion.
After we get the lock, check if the lock that we acquired is still
the lock for the owner of the page that we're interested in.
If the owner hasn't changed then we can proceed with this page,
otherwise we will skip this page and move on to a different page.
This dramatically reduces the amount of time that the pagedaemon
sleeps trying to get locks, since even 1 tick is an eternity to sleep
in this context and it was easy to trigger that case in practice,
and with this new method the pagedaemon only very rarely actually blocks
to acquire the lock that it wants since the object locks are adaptive,
and when the pagedaemon does block then the amount of time it spends
sleeping will generally be much less than 1 tick.
 1.151 19-Aug-2020  chs branches: 1.151.2;
in uao_get(), if we unlock the uobj to read a page from swap,
we must clear the cached page array because it is now stale.
also add a missing call to uvm_page_array_fini() if the I/O fails.
fixes PR 55493.
 1.150 19-Aug-2020  simonb Remove trailing \n from UVMHIST_LOG() format strings.
 1.149 09-Jul-2020  skrll Consistently use UVMHIST(__func__)

Convert UVMHIST_{CALLED,LOG} into UVMHIST_CALLARGS
 1.148 08-Jul-2020  skrll Trailing whitespace
 1.147 25-May-2020  ad uao_get(): in the PGO_SYNCIO case use uvm_page_array and simplify control
flow a little bit.
 1.146 25-May-2020  ad - Alter the convention for uvm_page_array slightly, so the basic search
parameters can't change part way through a search: move the "uobj" and
"flags" arguments over to uvm_page_array_init() and store those with the
array.

- With that, detect when it's not possible to find any more pages in the
tree with the given search parameters, and avoid repeated tree lookups if
the caller loops over uvm_page_array_fill_and_peek().
 1.145 25-May-2020  ad PR kern/55300: ubciomove triggers page not dirty assertion

If overwriting an existing page, mark it dirty since there may be no
managed mapping to track the modification.
 1.144 22-May-2020  ad uao_get(): handle PGO_OVERWRITE.
 1.143 20-May-2020  hannken Suppress GCC warnings and fix a UVMHIST_LOG() statement.

Kernels ALL/amd64 and ALL/i386 and port sparc64 build again.
 1.142 19-May-2020  ad PR kern/32166: pgo_get protocol is ambiguous
Also problems with tmpfs+nfs noted by hannken@.

Don't pass PGO_ALLPAGES to pgo_get, and ignore PGO_DONTCARE in the
!PGO_LOCKED case. In uao_get() have uvm_pagealloc() take care of page
zeroing and release busy pages on error.
 1.141 17-May-2020  ad Start trying to reduce cache misses on vm_page during fault processing.

- Make PGO_LOCKED getpages imply PGO_NOBUSY and remove the latter. Mark
pages busy only when there's actually I/O to do.

- When doing COW on a uvm_object, don't mess with neighbouring pages. In
all likelihood they're already entered.

- Don't mess with neighbouring VAs that have existing mappings as replacing
those mappings with same can be quite costly.

- Don't enqueue pages for neighbour faults unless not enqueued already, and
don't activate centre pages unless uvmpdpol says it's useful.

Also:

- Make PGO_LOCKED getpages on UAOs work more like vnodes: do gang lookup in
the radix tree, and don't allocate new pages.

- Fix many assertion failures around faults/loans with tmpfs.
 1.140 15-May-2020  ad PR kern/55268: tmpfs is slow

uao_get(): in the PGO_LOCKED case, we're okay to allocate a new page as long
as the caller holds a write lock. PGO_NOBUSY doesn't put a stop to that.
 1.139 22-Mar-2020  ad Process concurrent page faults on individual uvm_objects / vm_amaps in
parallel, where the relevant pages are already in-core. Proposed on
tech-kern.

Temporarily disabled on MP architectures with __HAVE_UNLOCKED_PMAP until
adjustments are made to their pmaps.
 1.138 17-Mar-2020  ad Tweak the March 14th change to make page waits interlocked by pg->interlock.
Remove unneeded changes and only deal with the PQ_WANTED flag, to exclude
possible bugs.
 1.137 14-Mar-2020  ad Make page waits (WANTED vs BUSY) interlocked by pg->interlock. Gets RW
locks out of the equation for sleep/wakeup, and allows observing+waiting
for busy pages when holding only a read lock. Proposed on tech-kern.
 1.136 24-Feb-2020  rin 0x%#x --> %#x for non-external codes.
Also, stop mixing up 0x%x and %#x in single files as far as possible.
 1.135 23-Feb-2020  ad UVM locking changes, proposed on tech-kern:

- Change the lock on uvm_object, vm_amap and vm_anon to be a RW lock.
- Break v_interlock and vmobjlock apart. v_interlock remains a mutex.
- Do partial PV list locking in the x86 pmap. Others to follow later.
 1.134 15-Jan-2020  ad Merge from yamt-pagecache (after much testing):

- Reduce unnecessary page scan in putpages esp. when an object has a ton of
pages cached but only a few of them are dirty.

- Reduce the number of pmap operations by tracking page dirtiness more
precisely in uvm layer.
 1.133 31-Dec-2019  ad branches: 1.133.2;
- Add and use wrapper functions that take and acquire page interlocks, and pairs
of page interlocks. Require that the page interlock be held over calls to
uvm_pageactivate(), uvm_pagewire() and similar.

- Solve the concurrency problem with page replacement state. Rather than
updating the global state synchronously, set an intended state on
individual pages (active, inactive, enqueued, dequeued) while holding the
page interlock. After the interlock is released put the pages on a 128
entry per-CPU queue for their state changes to be made real in batch.
This results in a ~400 fold decrease in contention on my test system.
Proposed on tech-kern but modified to use the page interlock rather than
atomics to synchronise as it's much easier to maintain that way, and
cheaper.
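
A minimal single-threaded sketch of the batching idea, under loud assumptions:
every name here (`struct page`, `struct batch`, `page_set_intent()`,
`batch_flush()`) is hypothetical, and the page interlock and the per-CPU
aspect are left out. It only shows an intended state being recorded on each
page and made real later, in batches of up to 128.

```c
#include <assert.h>
#include <stddef.h>

#define BATCH_MAX 128	/* queue size mentioned in the commit message */

enum pgstate { ST_DEQUEUED, ST_INACTIVE, ST_ACTIVE };

struct page {
	enum pgstate intended;	/* set under the page interlock in the real code */
	enum pgstate actual;	/* replacement state, only updated in batch */
	int queued;		/* nonzero while the page sits in a batch */
};

struct batch {
	struct page *pages[BATCH_MAX];
	size_t count;
	unsigned flushes;	/* number of non-empty flushes performed */
};

/* Apply every pending intent and empty the batch. */
static void
batch_flush(struct batch *b)
{
	assert(b->count <= BATCH_MAX);
	for (size_t i = 0; i < b->count; i++) {
		b->pages[i]->actual = b->pages[i]->intended;
		b->pages[i]->queued = 0;
	}
	if (b->count != 0)
		b->flushes++;
	b->count = 0;
}

/* Record the intended state; defer the real state change to the next flush. */
static void
page_set_intent(struct batch *b, struct page *pg, enum pgstate st)
{
	pg->intended = st;
	if (pg->queued)
		return;		/* the pending entry picks up the new intent */
	if (b->count == BATCH_MAX)
		batch_flush(b);
	pg->queued = 1;
	b->pages[b->count++] = pg;
}
```

Changing a page's intent twice before a flush costs one queue entry, which is
one plausible reason batching reduces contention on shared state.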
 1.132 15-Dec-2019  ad Merge from yamt-pagecache:

- do gang lookup of pages using radixtree.
- remove now unused uvm_object::uo_memq and vm_page::listq.queue.
 1.131 13-Dec-2019  ad Break the global uvm_pageqlock into a per-page identity lock and a private
lock for use of the pagedaemon policy code. Discussed on tech-kern.

PR kern/54209: NetBSD 8 large memory performance extremely low
PR kern/54210: NetBSD-8 processes presumably not exiting
PR kern/54727: writing a large file causes unreasonable system behaviour
 1.130 01-Dec-2019  ad Avoid calling pmap_page_protect() while under uvm_pageqlock.
 1.129 01-Dec-2019  ad - Adjust uvmexp.swpgonly with atomics, and make uvm_swap_data_lock static.
- A bit more __cacheline_aligned on mutexes.
 1.128 28-Jul-2019  msaitoh Avoid undefined behavior in uao_pagein_page(). Found by kUBSan. OK'd by
riastradh. I think this is a real bug on amd64 at least.
 1.127 28-May-2018  chs branches: 1.127.2;
allow tmpfs files to be larger than 4GB.
 1.126 28-Oct-2017  pgoyette branches: 1.126.2;
Update the kernhist(9) kernel history code to address issues identified
in PR kern/52639, as well as some general cleaning-up...

(As proposed on tech-kern@ with additional changes and enhancements.)

Details of changes:

* All history arguments are now stored as uintmax_t values[1], both in
the kernel and in the structures used for exporting the history data
to userland via sysctl(9). This avoids problems on some architectures
where passing a 64-bit (or larger) value to printf(3) can cause it to
process the value as multiple arguments. (This can be particularly
problematic when printf()'s format string is not a literal, since in
that case the compiler cannot know how large each argument should be.)

* Update the data structures used for exporting kernel history data to
include a version number as well as the length of history arguments.

* All [2] existing users of kernhist(9) have had their format strings
updated. Each format specifier now includes an explicit length
modifier 'j' to refer to numeric values of the size of uintmax_t.

* All [2] existing users of kernhist(9) have had their format strings
updated to replace uses of "%p" with "%#jx", and the pointer
arguments are now cast to (uintptr_t) before being subsequently cast
to (uintmax_t). This is needed to avoid compiler warnings about
casting "pointer to integer of a different size."

* All [2] existing users of kernhist(9) have had instances of "%s" or
"%c" format strings replaced with numeric formats; several instances
of mis-match between format string and argument list have been fixed.

* vmstat(1) has been modified to handle the new size of arguments in the
history data as exported by sysctl(9).

* vmstat(1) now provides a warning message if the history requested with
the -u option does not exist (previously, this condition was silently
ignored, with only a single blank line being printed).

* vmstat(1) now checks the version and argument length included in the
data exported via sysctl(9) and exits if they do not match the values
with which vmstat was built.

* The kernhist(9) man-page has been updated to note the additional
requirements imposed on the format strings, along with several other
minor changes and enhancements.

[1] It would have been possible to use an explicit length (for example,
uint64_t) for the history arguments. But that would require another
"rototill" of all the users in the future when we add support for an
architecture that supports a larger size. Also, the printf(3) format
specifiers for explicitly-sized values, such as "%"PRIu64, are much
more verbose (and less aesthetically appealing, IMHO) than simply
using "%ju".

[2] I've tried very hard to find "all [the] existing users of kernhist(9)"
but it is possible that I've missed some of them. I would be glad to
update any stragglers that anyone identifies.
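
As a userland illustration of the format-string rules above (this is not
kernhist(9) itself), the sketch below passes every argument as uintmax_t and
uses the 'j' length modifier. The helper name `fmt_hist_args()` is
hypothetical.

```c
#include <assert.h>
#include <stdint.h>
#include <stdio.h>

/*
 * Hypothetical helper: format one pointer and one counter the way the
 * updated format strings do, with every argument widened to uintmax_t.
 */
static int
fmt_hist_args(char *buf, size_t len, const void *ptr, unsigned long count)
{
	assert(buf != NULL);
	return snprintf(buf, len, "ptr=%#jx count=%ju",
	    (uintmax_t)(uintptr_t)ptr,	/* pointer -> uintptr_t -> uintmax_t */
	    (uintmax_t)count);
}
```

The intermediate (uintptr_t) cast is what avoids the "pointer to integer of a
different size" warning on platforms where pointers are narrower than
uintmax_t.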
 1.125 30-May-2017  chs branches: 1.125.2;
add assertions that would have caught the recent audio mmap bugs.
 1.124 28-Jul-2016  martin PR kern/51371: fix misleading indentation
 1.123 24-Aug-2015  pooka branches: 1.123.2;
to garnish, dust with _KERNEL_OPT
 1.122 25-May-2014  riastradh branches: 1.122.4;
Allow VM_NFREELIST in uao_set_pgfl, meaning any freelist is OK.
 1.121 22-May-2014  riastradh Add uao_set_pgfl to limit a uvm_aobj's pages to a specified freelist.

Brought up on tech-kern:

https://mail-index.netbsd.org/tech-kern/2014/05/20/msg017095.html
 1.120 25-Oct-2013  martin branches: 1.120.2;
Mark a diagnostic-only variable
 1.119 15-Sep-2012  matt branches: 1.119.2;
#include <sys/atomic.h>
 1.118 14-Sep-2012  rmind - Manage anonymous UVM object reference count with atomic ops.
- Fix an old bug of possible lock against oneself (uao_detach_locked() is
called from uao_swap_off() with uao_list_lock acquired). Also removes
the try-lock dance in uao_swap_off(), since the lock order changes.
 1.117 14-Sep-2012  rmind - Describe uvm_aobj and the lock order.
- Remove unnecessary uao_dropswap_range1() wrapper.
- KNF. Sprinkle some __cacheline_aligned.
 1.116 06-Sep-2011  matt branches: 1.116.2; 1.116.8; 1.116.12;
Allocate color appropriate pages.
 1.115 12-Jun-2011  rmind Welcome to 5.99.53! Merge rmind-uvmplock branch:

- Reorganize locking in UVM and provide extra serialisation for pmap(9).
New lock order: [vmpage-owner-lock] -> pmap-lock.

- Simplify locking in some pmap(9) modules by removing P->V locking.

- Use lock object on vmobjlock (and thus vnode_t::v_interlock) to share
the locks amongst UVM objects where necessary (tmpfs, layerfs, unionfs).

- Rewrite and optimise x86 TLB shootdown code, make it simpler and cleaner.
Add TLBSTATS option for x86 to collect statistics about TLB shootdowns.

- Unify /dev/mem et al in MI code and provide required locking (removes
kernel-lock on some ports). Also, avoid cache-aliasing issues.

Thanks to Andrew Doran and Joerg Sonnenberger, as their initial patches
formed the core changes of this branch.
 1.114 23-Apr-2011  rmind branches: 1.114.2;
Replace "malloc" in comments, remove unnecessary header inclusions.
 1.113 11-Feb-2011  rmind Replace uvm_aobj_cache with kmem(9).
 1.112 02-Feb-2011  chuck update license clauses on chuck^2 code to match the new-style BSD licenses.
based on diff that rmind@ sent me (and confirmed with chs@ via email).

no functional change with this commit.
 1.111 25-Jan-2011  enami Remove nop code; the code was moved to uao_dropswap_range1() when it was
introduced in rev. 1.75.
 1.110 29-Jul-2010  hannken branches: 1.110.2; 1.110.4;
Add vm page flag PG_MARKER and use it to tag dummy marker pages
in genfs_do_putpages() and uao_put().
Use 'v_uobj.uo_npages' to check for an empty memq.
Put some assertions where these marker pages may not appear.

Ok: YAMAMOTO Takashi <yamt@netbsd.org>
 1.109 28-May-2010  rmind uvm_fault_{upper,lower}_done: move drop-swap outside the page-queues lock.
Assert for object lock being held (or ref count 0) in uao_set_swslot().
 1.108 21-Oct-2009  rmind branches: 1.108.2; 1.108.4;
Remove uarea swap-out functionality:

- Addresses the issue described in PR/38828.
- Some simplification in threading and sleepq subsystems.
- Eliminates pmap_collect() and, as a side note, allows pmap optimisations.
- Eliminates XS_CTL_DATA_ONSTACK in scsipi code.
- Avoids a few scans on LWP list and thus potentially long holds of proc_lock.
- Cuts ~1.5k lines of code. Reduces amd64 kernel size by ~4k.
- Removes __SWAP_BROKEN cases.

Tested on x86, mips, acorn32 (thanks <mpumford>) and partly tested on
acorn26 (thanks to <bjh21>).

Discussed on <tech-kern>, reviewed by <ad>.
 1.107 13-Sep-2009  pooka Wipe out the last vestiges of POOL_INIT with one swift stroke. In
most cases, use a proper constructor. For proplib, give a local
equivalent of POOL_INIT for the kernel object implementation. This
way the code structure can be preserved, and a local link set is
not hazardous anyway (unless proplib is split to several modules,
but that'll be the day).

tested by booting a kernel in qemu and compile-testing i386/ALL
 1.106 18-Feb-2009  yamt make some functions static.
 1.105 16-Jan-2009  yamt branches: 1.105.2;
- g/c stale function prototypes.
- rename UVM_PAGE_HASH_PENALTY to UVM_PAGE_TREE_PENALTY.
 1.104 18-Oct-2008  rmind branches: 1.104.2; 1.104.10;
- Initialize pool subsystem and kmem(9) earlier, when UVM is up enough.
- Remove uao_hashinit() workaround used for anon-objects.
- Replace malloc with kmem.

OK by <yamt>.
 1.103 25-Jun-2008  ad branches: 1.103.2;
Use pool_cache.
 1.102 04-Jun-2008  ad branches: 1.102.2;
vm_page: put TAILQ_ENTRY into a union with LIST_ENTRY, so we can use both.
 1.101 03-Jun-2008  ad uao_reference, uao_detach: we don't do reference counting on kernel objects,
so don't lock them needlessly.
 1.100 05-May-2008  ad branches: 1.100.2;
- Convert hashinit() to use kmem_alloc(). The hash tables can be large
and it's better to not have them in kmem_map.
- Convert a couple of minor items along the way to kmem_alloc().
- Fix some memory leaks.
 1.99 27-Feb-2008  ad branches: 1.99.2; 1.99.4;
Minor corrections to comments.
 1.98 27-Feb-2008  yamt uao_put: fix a race with pageout.
 1.97 18-Jan-2008  yamt branches: 1.97.2; 1.97.6;
push pmap_clear_reference calls into pdpolicy code, where reference bits
actually matter.
 1.96 02-Jan-2008  ad Merge vmlocking2 to head.
 1.95 01-Dec-2007  yamt branches: 1.95.2; 1.95.6;
constify pagerops.
 1.94 01-Dec-2007  yamt use designated initializers for uvm_pagerops.
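
A small sketch of what a designated-initializer, const operations table looks
like in C. The structure `struct ex_pagerops`, its reduced member list, and
the functions are hypothetical stand-ins, not the real uvm_pagerops
definition.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical, cut-down operations table for illustration. */
struct ex_pagerops {
	void (*pgo_init)(void);
	int (*pgo_get)(int);
	int (*pgo_put)(int);
};

static int
ex_get(int n)
{
	return n + 1;
}

/* Members not named in the initializer are zero-initialized (NULL here). */
static const struct ex_pagerops ex_ops = {
	.pgo_get = ex_get,
};

/* Call an optional operation, checking that the table provides it. */
static int
ex_call_get(const struct ex_pagerops *ops, int n)
{
	assert(ops->pgo_get != NULL);
	return ops->pgo_get(n);
}
```

Naming the members keeps the table correct if fields are later added or
reordered, and `const` lets it live in read-only data.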
 1.93 05-Aug-2007  pooka branches: 1.93.2; 1.93.8; 1.93.10;
In uao_get(), drop object lock only after dropswap to avoid KASSERT panic.

Should fix tmpfs problem reported by riz on current-users. yamt ok.
 1.92 24-Jul-2007  ad branches: 1.92.4;
In order to pacify assertions, make uao_list_lock + uvm_swap_data_lock
spinlocks for the time being.
 1.91 21-Jul-2007  ad Temporarily work around an assertion from mutex_enter.
 1.90 21-Jul-2007  ad Merge unobtrusive locking changes from the vmlocking branch.
 1.89 09-Jul-2007  ad branches: 1.89.2;
Merge some of the less invasive changes from the vmlocking branch:

- kthread, callout, devsw API changes
- select()/poll() improvements
- miscellaneous MT safety improvements
 1.88 12-Mar-2007  ad branches: 1.88.2;
Pass an ipl argument to pool_init/POOL_INIT to be used when initializing
the pool's lock.
 1.87 22-Feb-2007  thorpej branches: 1.87.4;
TRUE -> true, FALSE -> false
 1.86 22-Feb-2007  matt Fix lossage from boolean_t -> bool and updated x86 bus_dma.
 1.85 21-Feb-2007  thorpej Replace the Mach-derived boolean_t type with the C99 bool type. A
future commit will replace use of TRUE and FALSE with true and false.
 1.84 24-Jan-2007  hubertf branches: 1.84.2;
Remove duplicate #includes, patch contributed in private mail
by Slava Semushin <slava.semushin@gmail.com>.

To verify that no nasty side effects of duplicate includes (or their
removal) have an effect here, I've compiled an i386/ALL kernel with
and without the patch, and the only difference in the resulting .o
files was in shifted line numbers in some assert() calls.
The comparison of the .o files was based on the output of "objdump -D".

Thanks to martin@ for the input on testing.
 1.83 15-Dec-2006  yamt put ->K loaned pages on the page queue, so that page loaning doesn't
disturb pagedaemon/pdpolicy.
 1.82 01-Nov-2006  yamt branches: 1.82.2; 1.82.4;
remove some __unused from function parameters.
 1.81 12-Oct-2006  christos - sprinkle __unused on function decls.
- fix a couple of unused bugs
- no more -Wno-unused for i386
 1.80 15-Sep-2006  yamt branches: 1.80.2;
merge yamt-pdpolicy branch.
- separate page replacement policy from the rest of kernel
- implement an alternative replacement policy
 1.79 01-Sep-2006  cherry branches: 1.79.2;
bumps kernel aobj to 64 bit.
See: http://mail-index.netbsd.org/tech-kern/2006/03/07/0007.html
 1.78 24-Dec-2005  yamt branches: 1.78.4; 1.78.8;
uao_get: don't mark pages dirty unless it's a write fault.
 1.77 05-Dec-2005  yamt uao_pagein_page: pass PGO_SYNCIO to uao_get.
uao_get doesn't always assume PGO_SYNCIO after yamt-readahead merge.

reported and a dump provided by Masanori Kanaoka.
 1.76 29-Nov-2005  yamt merge yamt-readahead branch.
 1.75 08-Nov-2005  yamt branches: 1.75.2;
add a function to drop all swap slots in a given range. for tmpfs.
XXX maybe it's better to implement true truncation.
 1.74 17-Sep-2005  yamt make VMSWAP optional again.
 1.73 14-Sep-2005  yamt uao_put: don't skip loaned or wired pages.
 1.72 13-Sep-2005  yamt wrap swap related code by #ifdef VMSWAP. always #define VMSWAP for now.
 1.71 13-Sep-2005  yamt uao_put: recognize endoff == 0 as "to the end of the object",
as VOP_PUTPAGES (thus vnode pager) does. for tmpfs.
 1.70 31-Jul-2005  yamt revert "defflag VMSWAP" changes for now.
there seems to be far more people who don't want to edit
their kernel config files than i thought.
 1.69 30-Jul-2005  yamt defflag VMSWAP.
 1.68 27-Jun-2005  thorpej branches: 1.68.2;
Sprinkle some static.
 1.67 27-Jun-2005  thorpej Use ANSI function decls.
 1.66 06-Jun-2005  yamt introduce a macro to initialize uvm_object and use it.
 1.65 29-May-2005  christos avoid shadow variables.
remove unneeded casts.
 1.64 25-Apr-2004  simonb Initialise (most) pools from a link set instead of explicit calls
to pool_init. Untouched pools are ones that are either in arch-specific
code, or aren't initialised during initial system startup.

Convert struct session, ucred and lockf to pools.
 1.63 05-Apr-2004  simonb Fix a tyop.
 1.62 24-Mar-2004  junyoung Nuke __P().
 1.61 18-Sep-2003  drochner Fix a reversed logic in swap deallocation which could lead to
uvm_swap_free() being called with a zero slot; this might have been
the reason for crashes with sysvshm and heavy swapping.
(PR kern/22752 by Tom Spindler)
Confirmed by Chuck Silvers.
 1.60 28-Aug-2003  pk When retiring a swap device with marked bad blocks on it we should update
the `# swap page in use' and `# swap page only' counters. However, at the
time of swap device removal we can no longer figure out how many of the
bad swap pages are actually also `swap only' pages.

So, on swap I/O errors arrange things to not include the bad swap pages in
the `swpgonly' counter as follows: uvm_swap_markbad() decrements `swpgonly'
by the number of bad pages, and the various VM object deallocation routines
do not decrement `swpgonly' for swap slots marked as SWSLOT_BAD.
 1.59 11-Aug-2003  pk uao_pagein_page() & anon_pagein():
* return failure if the page cannot be retrieved.
* wakeup any waiters when releasing a page after successful page in.
 1.58 11-Aug-2003  pk Only deactivate pages if their wired count is zero.
 1.57 11-Aug-2003  pk Make sure to call uvm_swap_free() and uvm_swap_markbad() with valid (i.e.
positive) slot numbers.
 1.56 12-Apr-2003  yamt branches: 1.56.2;
unbusy a page after putting it on the queue.
fix a panic with UVM_PAGE_TRKOWN when doing swapoff.
 1.55 09-Feb-2003  pk uao_put: release uvm object's lock only after we're done with its page list.
 1.54 01-Feb-2003  thorpej Add extensible malloc types, adapted from FreeBSD. This turns
malloc types into a structure, a pointer to which is passed around,
instead of an int constant. Allow the limit to be adjusted when the
malloc type is defined, or with a function call, as suggested by
Jonathan Stone.
 1.53 18-Jan-2003  thorpej Merge the nathanw_sa branch.
 1.52 24-Nov-2002  scw Quell uninitialised variable warnings.
 1.51 09-May-2002  enami In uao_put(), if we wait for the busy page owned by someone else,
we can't simply reuse the pointer to the page. Instead, we need to
acquire it again. So, rearrange the loop like genfs_putpages() does.
Reviewed by chuq.
 1.50 08-Mar-2002  thorpej branches: 1.50.2;
Pool deals fairly well with physical memory shortage, but it doesn't
deal with shortages of the VM maps where the backing pages are mapped
(usually kmem_map). Try to deal with this:

* Group all information about the backend allocator for a pool in a
separate structure. The pool references this structure, rather than
the individual fields.
* Change the pool_init() API accordingly, and adjust all callers.
* Link all pools using the same backend allocator on a list.
* The backend allocator is responsible for waiting for physical memory
to become available, but will still fail if it cannot allocate KVA
space for the pages. If this happens, carefully drain all pools using
the same backend allocator, so that some KVA space can be freed.
* Change pool_reclaim() to indicate if it actually succeeded in freeing
some pages, and use that information to make draining easier and more
efficient.
* Get rid of PR_URGENT. There was only one use of it, and it could be
dealt with by the caller.

From art@openbsd.org.
 1.49 10-Nov-2001  lukem add RCSIDs, and in some cases, slightly cleanup #include order
 1.48 07-Nov-2001  chs only acquire the lock for swpgonly if we actually need to adjust it.
 1.47 06-Nov-2001  chs several changes prompted by loaning problems:
- fix the loaned case in uvm_pagefree().
- redo uvmexp.swpgonly accounting to work with page loaning.
add an assertion before each place we adjust uvmexp.swpgonly.
- fix uvm_km_pgremove() to always free any swap space associated with
the range being removed.
- get rid of UVM_LOAN_WIRED flag. instead, we just make sure that
pages loaned to the kernel are never on the page queues.
this allows us to assert that pages are not loaned and wired
at the same time.
- add yet more assertions.
 1.46 15-Sep-2001  chs branches: 1.46.2;
a whole bunch of changes to improve performance and robustness under load:

- remove special treatment of pager_map mappings in pmaps. this is
required now, since I've removed the globals that expose the address range.
pager_map now uses pmap_kenter_pa() instead of pmap_enter(), so there's
no longer any need to special-case it.
- eliminate struct uvm_vnode by moving its fields into struct vnode.
- rewrite the pageout path. the pager is now responsible for handling the
high-level requests instead of only getting control after a bunch of work
has already been done on its behalf. this will allow us to UBCify LFS,
which needs tighter control over its pages than other filesystems do.
writing a page to disk no longer requires making it read-only, which
allows us to write wired pages without causing all kinds of havoc.
- use a new PG_PAGEOUT flag to indicate that a page should be freed
on behalf of the pagedaemon when it's unlocked. this flag is very similar
to PG_RELEASED, but unlike PG_RELEASED, PG_PAGEOUT can be cleared if the
pageout fails due to eg. an indirect-block buffer being locked.
this allows us to remove the "version" field from struct vm_page,
and together with shrinking "loan_count" from 32 bits to 16,
struct vm_page is now 4 bytes smaller.
- no longer use PG_RELEASED for swap-backed pages. if the page is busy
because it's being paged out, we can't release the swap slot to be
reallocated until that write is complete, but unlike with vnodes we
don't keep a count of in-progress writes so there's no good way to
know when the write is done. instead, when we need to free a busy
swap-backed page, just sleep until we can get it busy ourselves.
- implement a fast-path for extending writes which allows us to avoid
zeroing new pages. this substantially reduces cpu usage.
- encapsulate the data used by the genfs code in a struct genfs_node,
which must be the first element of the filesystem-specific vnode data
for filesystems which use genfs_{get,put}pages().
- eliminate many of the UVM pagerops, since they aren't needed anymore
now that the pager "put" operation is a higher-level operation.
- enhance the genfs code to allow NFS to use the genfs_{get,put}pages
instead of a modified copy.
- clean up struct vnode by removing all the fields that used to be used by
the vfs_cluster.c code (which we don't use anymore with UBC).
- remove kmem_object and mb_object since they were useless.
instead of allocating pages to these objects, we now just allocate
pages with no object. such pages are mapped in the kernel until they
are freed, so we can use the mapping to find the page to free it.
this allows us to remove splvm() protection in several places.

The sum of all these changes improves write throughput on my
decstation 5000/200 to within 1% of the rate of NetBSD 1.5
and reduces the elapsed time for "make release" of a NetBSD 1.5
source tree on my 128MB pc to 10% less than a 1.5 kernel took.
 1.45 23-Jun-2001  chs branches: 1.45.2; 1.45.4;
don't wait for memory in uao_set_swslot() since we're holding spinlocks,
instead return -1. adjust callers to handle this new error return.
fixes PR 13194.
 1.44 22-Jun-2001  chs don't use the list pointers after we take an object off its list.
 1.43 26-May-2001  chs replace vm_page_t with struct vm_page *.
 1.42 26-May-2001  chs replace {simple_,}lock{_data,}_t with struct {simple,}lock {,*}.
 1.41 25-May-2001  chs remove trailing whitespace.
 1.40 10-Mar-2001  chs eliminate the VM_PAGER_* error codes in favor of the traditional E* codes.
the mapping is:

VM_PAGER_OK       0
VM_PAGER_BAD      <unused>
VM_PAGER_FAIL     <unused>
VM_PAGER_PEND     0 (see below)
VM_PAGER_ERROR    EIO
VM_PAGER_AGAIN    EAGAIN
VM_PAGER_UNLOCK   EBUSY
VM_PAGER_REFAULT  ERESTART

for async i/o requests, it used to be possible for the request to
be converted to sync, and the pager would return VM_PAGER_OK or VM_PAGER_PEND
to indicate whether the caller should perform post-i/o cleanup.
this is no longer allowed; pagers must now return 0 to indicate that
the async i/o was successfully started, and the caller never needs to
worry about doing the post-i/o cleanup.
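
The mapping above can be written as a small translation function. This is a
sketch for illustration: the `OLD_VM_PAGER_*` enumerators and their numbering
are hypothetical (only the names come from the commit message), the two unused
codes are omitted, and the ERESTART fallback value is an assumption for
userland builds whose <errno.h> does not define it.

```c
#include <assert.h>
#include <errno.h>

#ifndef ERESTART
#define ERESTART (-3)	/* assumption: fallback where <errno.h> lacks it */
#endif

/* Hypothetical numbering; only the names come from the commit message. */
enum old_pager_code {
	OLD_VM_PAGER_OK,
	OLD_VM_PAGER_PEND,
	OLD_VM_PAGER_ERROR,
	OLD_VM_PAGER_AGAIN,
	OLD_VM_PAGER_UNLOCK,
	OLD_VM_PAGER_REFAULT
};

/* Translate an old pager return code to the E* code listed in the mapping. */
static int
pager_code_to_errno(enum old_pager_code code)
{
	switch (code) {
	case OLD_VM_PAGER_OK:
	case OLD_VM_PAGER_PEND:		/* async i/o started successfully */
		return 0;
	case OLD_VM_PAGER_ERROR:
		return EIO;
	case OLD_VM_PAGER_AGAIN:
		return EAGAIN;
	case OLD_VM_PAGER_UNLOCK:
		return EBUSY;
	case OLD_VM_PAGER_REFAULT:
		return ERESTART;
	}
	/* Not part of the original mapping: reject out-of-range values. */
	assert(0 && "unknown pager code");
	return EINVAL;
}
```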
 1.39 18-Feb-2001  chs branches: 1.39.2;
clean up DIAGNOSTIC checks, use KASSERT().
 1.38 28-Jan-2001  thorpej Page scanner improvements, behavior is actually a bit more like
Mach VM's now. Specific changes:
- Pages now need not have all of their mappings removed before being
put on the inactive list. They only need to have the "referenced"
attribute cleared. This makes putting pages onto the inactive list
much more efficient. In order to eliminate redundant clearings of
"referenced", callers of uvm_pagedeactivate() must now do this
themselves.
- When checking the "modified" attribute for a page (for clearing
PG_CLEAN), make sure to only do it if PG_CLEAN is currently set on
the page (saves a potentially expensive pmap operation).
- When scanning the inactive list, if a page is referenced, reactivate
it (this part was actually added in uvm_pdaemon.c,v 1.27). This
works properly now that pages on the inactive list are allowed to
have mappings.
- When scanning the inactive list and considering a page for freeing,
remove all mappings, and then check the "modified" attribute if the
page is marked PG_CLEAN.
- When scanning the active list, if the page was referenced since its
last sweep by the scanner, don't deactivate it. (This part was
actually added in uvm_pdaemon.c,v 1.28.)

These changes greatly improve interactive performance during
moderate to high memory and I/O load.
 1.37 25-Nov-2000  chs lots of cleanup:
use queue.h macros and KASSERT().
address amap offsets in pages instead of bytes.
make amap_ref() and amap_unref() take an amap, offset and length
instead of a vm_map_entry_t.
improve whitespace and comments.
 1.36 24-Nov-2000  chs g/c unused pager ops "asyncget" and "aiodone".
 1.35 08-Nov-2000  ad Update for hashinit() change.
 1.34 02-Aug-2000  thorpej MALLOC()/FREE() are not to be used for variable-sized allocations.
 1.33 27-Jun-2000  mrg remove include of <vm/vm.h>
 1.32 26-Jun-2000  mrg remove/move more mach vm header files:

<vm/pglist.h> -> <uvm/uvm_pglist.h>
<vm/vm_inherit.h> -> <uvm/uvm_inherit.h>
<vm/vm_kern.h> -> into <uvm/uvm_extern.h>
<vm/vm_object.h> -> nothing
<vm/vm_pager.h> -> into <uvm/uvm_pager.h>

also includes a bunch of <vm/vm_page.h> include removals (due to redundancy
with <vm/vm.h>), and a scattering of other similar headers.
 1.31 19-May-2000  thorpej NULL != 0
 1.30 10-Apr-2000  thorpej Use UVM_PGA_ZERO in a few (easy) places.
 1.29 03-Apr-2000  chs remove the "shareprot" pagerop. it's not needed anymore since
share maps are long gone.
 1.28 26-Mar-2000  kleink Merge parts of chs-ubc2 into the trunk:
Add a new type voff_t (defined as a synonym for off_t) to describe offsets
into uvm objects, and update the appropriate interfaces to use it, the
most visible effect being the ability to mmap() file offsets beyond
the range of a vaddr_t.

Originally by Chuck Silvers; blame me for problems caused by merging this
into non-UBC.
 1.27 11-Jan-2000  chs add support for ``swapctl -d'' (removing swap space).
improve handling of i/o errors in swap space.

reviewed by: Chuck Cranor
 1.26 12-Sep-1999  chs branches: 1.26.2;
eliminate the PMAP_NEW option by making it required for all ports.
ports which previously had no support for PMAP_NEW now implement
the pmap_k* interfaces as wrappers around the non-k versions.
 1.25 21-Aug-1999  thorpej When handling the MADV_FREE case, if the amap or aobj has more than
one reference, go through the deactivate path; the page may actually
be in use by another process.

Fixes kern/8239.
 1.24 22-Jul-1999  thorpej Garbage collect thread_sleep()/thread_wakeup() left over from the old
Mach VM code. Also nuke iprintf(), which was no longer used anywhere.

Add proclist locking where appropriate.
 1.23 22-Jul-1999  thorpej 0 -> FALSE in a few places.
 1.22 17-Jul-1999  thorpej Implement uao_flush(). This is pretty much identical to the "amap flush"
code in uvm_map_clean().
 1.21 07-Jul-1999  thorpej Update a comment in uao_flush().
 1.20 25-May-1999  thorpej Macro'ize the test for "object is a kernel object".
 1.19 11-Apr-1999  chs add a `flags' argument to uvm_pagealloc_strat().
define a flag UVM_PGA_USERESERVE to allow non-kernel object
allocations to use pages from the reserve.
use the new flag for allocations in pmap modules.
 1.18 26-Mar-1999  chs branches: 1.18.2;
add uvmexp.swpgonly and use it to detect out-of-swap conditions.
 1.17 25-Mar-1999  mrg remove now >1 year old pre-release message.
 1.16 24-Mar-1999  cgd after discussion with chuck, nuke pgo_attach from uvm_pagerops
 1.15 18-Oct-1998  chs branches: 1.15.2;
shift by PAGE_SHIFT instead of multiplying or dividing by PAGE_SIZE.
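
A short sketch of the conversion this commit describes. The `EX_PAGE_SHIFT`
value of 12 is an assumption for illustration (the real value is
per-architecture), and the helper names are hypothetical.

```c
#include <assert.h>
#include <stddef.h>

/* Assumed values for illustration; the real ones are per-architecture. */
#define EX_PAGE_SHIFT	12
#define EX_PAGE_SIZE	((size_t)1 << EX_PAGE_SHIFT)

/* Shifting only matches multiply/divide because the size is a power of two. */
static_assert((EX_PAGE_SIZE & (EX_PAGE_SIZE - 1)) == 0,
    "page size must be a power of two");

/* Equivalent to bytes / EX_PAGE_SIZE for unsigned values (rounds down). */
static size_t
bytes_to_pages(size_t bytes)
{
	return bytes >> EX_PAGE_SHIFT;
}

/* Equivalent to npages * EX_PAGE_SIZE. */
static size_t
pages_to_bytes(size_t npages)
{
	return npages << EX_PAGE_SHIFT;
}
```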
 1.14 18-Sep-1998  thorpej Add a comment documenting the last change.
 1.13 18-Sep-1998  thorpej Don't use the nointr pool page allocator for the uao_swhash_elt pool. We
need to ensure that these come from a non-pageable kernel map, otherwise
we can run into a deadlock condition (as noticed by Chuck Silvers).
 1.12 31-Aug-1998  thorpej Use the pool allocator w/ the "nointr" pool page allocator for uvm_aobj
and uao_swhash_elt structures. Also, fix a bug in uao_set_swslot() where
if setting the swslot to 0 (freeing swap resources), and no swslot was
currently allocated, a new entry would be allocated anyhow (revealed during
pool'ification).
 1.11 13-Aug-1998  drochner minor consistency nit: the page index into an anon object is always
assigned to from integer types, and it is compared to integers. So
let it be an integer instead of vsize_t.
 1.10 13-Aug-1998  eeh Merge paddr_t changes into the main branch.
 1.9 09-Aug-1998  perry bzero->memset, bcopy->memcpy, bcmp->memcmp
 1.8 01-Mar-1998  fvdl branches: 1.8.2;
Merge with Lite2 + local changes
 1.7 12-Feb-1998  chs add copyright.
 1.6 10-Feb-1998  mrg - add defopt's for UVM, UVMHIST and PMAP_NEW.
- remove unnecessary UVMHIST_DECL's.
 1.5 09-Feb-1998  mrg KNF.
 1.4 07-Feb-1998  mrg restore rcsids
 1.3 07-Feb-1998  chs enable hashtables for swapslot storage - deadlock is fixed.
fix initialization of swhash entries.
use malloc(M_NOWAIT) for creating kernel object.
avoid dereferencing a vm_page once the page has been freed.
 1.2 06-Feb-1998  thorpej RCS ID police.
 1.1 05-Feb-1998  mrg branches: 1.1.1;
Initial revision
 1.1.1.1 05-Feb-1998  mrg initial import of the new virtual memory system, UVM, into -current.

UVM was written by chuck cranor <chuck@maria.wustl.edu>, with some
minor portions derived from the old Mach code. i provided some help
getting swap and paging working, and other bug fixes/ideas. chuck
silvers <chuq@chuq.com> also provided some other fixes.

this is the UVM kernel code portion.


this will be KNF'd shortly. :-)
 1.8.2.1 30-Jul-1998  eeh Split vm_offset_t and vm_size_t into paddr_t, psize_t, vaddr_t, and vsize_t.
 1.15.2.2 25-Feb-1999  chs in uao_pagein_page() move pmap ops before clearing of page busy bit.
thread_wakeup() -> wakeup().
 1.15.2.1 09-Nov-1998  chs initial snapshot. lots left to do.
 1.18.2.1 16-Apr-1999  chs branches: 1.18.2.1.2;
pull up 1.18 -> 1.19:
add a `flags' argument to uvm_pagealloc_strat().
define a flag UVM_PGA_USERESERVE to allow non-kernel object
allocations to use pages from the reserve.
use the new flag for allocations in pmap modules.
 1.18.2.1.2.5 09-Aug-1999  chs create a new type "voff_t" for uvm_object offsets
and define it to be "off_t". also, remove pgo_asyncget().
 1.18.2.1.2.4 02-Aug-1999  thorpej Update from trunk.
 1.18.2.1.2.3 21-Jun-1999  thorpej Fix a merge error.
 1.18.2.1.2.2 21-Jun-1999  thorpej Sync w/ -current.
 1.18.2.1.2.1 07-Jun-1999  chs merge everything from chs-ubc branch.
 1.26.2.5 12-Mar-2001  bouyer Sync with HEAD.
 1.26.2.4 11-Feb-2001  bouyer Sync with HEAD.
 1.26.2.3 08-Dec-2000  bouyer Sync with HEAD.
 1.26.2.2 22-Nov-2000  bouyer Sync with HEAD.
 1.26.2.1 20-Nov-2000  bouyer Update thorpej_scsipi to -current as of a month ago
 1.39.2.9 11-Dec-2002  thorpej Sync with HEAD.
 1.39.2.8 24-Jun-2002  nathanw Curproc->curlwp renaming.

Change uses of "curproc->l_proc" back to "curproc", which is more like the
original use. Bare uses of "curproc" are now "curlwp".

"curproc" is now #defined in proc.h as ((curlwp) ? (curlwp)->l_proc : NULL)
so that it is always safe to reference curproc (*de*referencing curproc
is another story, but that's always been true).
 1.39.2.7 20-Jun-2002  nathanw Catch up to -current.
 1.39.2.6 01-Apr-2002  nathanw Catch up to -current.
(CVS: It's not just a program. It's an adventure!)
 1.39.2.5 14-Nov-2001  nathanw Catch up to -current.
 1.39.2.4 21-Sep-2001  nathanw Catch up to -current.
 1.39.2.3 24-Aug-2001  nathanw Catch up with -current.
 1.39.2.2 21-Jun-2001  nathanw Catch up to -current.
 1.39.2.1 09-Apr-2001  nathanw Catch up with -current.
 1.45.4.1 01-Oct-2001  fvdl Catch up with -current.
 1.45.2.3 23-Jun-2002  jdolecek catch up with -current on kqueue branch
 1.45.2.2 16-Mar-2002  jdolecek Catch up with -current.
 1.45.2.1 10-Jan-2002  thorpej Sync kqueue branch with -current.
 1.46.2.1 12-Nov-2001  thorpej Sync the thorpej-mips-cache branch with -current.
 1.50.2.1 11-Mar-2002  thorpej Convert swap_syscall_lock and uvm.swap_data_lock to adaptive mutexes,
and rename them appropriately.
 1.56.2.5 11-Dec-2005  christos Sync with head.
 1.56.2.4 10-Nov-2005  skrll Sync with HEAD. Here we go again...
 1.56.2.3 21-Sep-2004  skrll Fix the sync with head I botched.
 1.56.2.2 18-Sep-2004  skrll Sync with HEAD.
 1.56.2.1 03-Aug-2004  skrll Sync with HEAD
 1.68.2.7 17-Mar-2008  yamt sync with head.
 1.68.2.6 21-Jan-2008  yamt sync with head
 1.68.2.5 07-Dec-2007  yamt sync with head
 1.68.2.4 03-Sep-2007  yamt sync with head.
 1.68.2.3 26-Feb-2007  yamt sync with head.
 1.68.2.2 30-Dec-2006  yamt sync with head.
 1.68.2.1 21-Jun-2006  yamt sync with head.
 1.75.2.1 26-Nov-2005  yamt add minimum support of async get. ie. ignore them.
 1.78.8.2 03-Sep-2006  yamt sync with head.
 1.78.8.1 05-Mar-2006  yamt separate page replacement policy from the rest of kernel.
 1.78.4.1 09-Sep-2006  rpaulo sync with head
 1.79.2.3 01-Feb-2007  ad Sync with head.
 1.79.2.2 12-Jan-2007  ad Sync with head.
 1.79.2.1 18-Nov-2006  ad Sync with head.
 1.80.2.3 18-Dec-2006  yamt sync with head.
 1.80.2.2 10-Dec-2006  yamt sync with head.
 1.80.2.1 22-Oct-2006  yamt sync with head
 1.82.4.1 03-Sep-2007  wrstuden Sync w/ NetBSD-4-RC_1
 1.82.2.1 24-Aug-2007  liamjfoy Pull up following revision(s) (requested by pooka in ticket #825):
sys/uvm/uvm_aobj.c: revision 1.93
In uao_get(), drop object lock only after dropswap to avoid KASSERT panic.
Should fix tmpfs problem reported by riz on current-users. yamt ok.
 1.84.2.2 24-Mar-2007  yamt sync with head.
 1.84.2.1 27-Feb-2007  yamt - sync with head.
- move sched_changepri back to kern_synch.c as it doesn't know PPQ anymore.
 1.87.4.7 01-Nov-2007  ad Yielding to avoid livelock doesn't work well, so just sleep for 1 tick.
This too is inadequate and a better solution must be found. Discussed
with yamt@.
 1.87.4.6 21-Aug-2007  yamt destroy vmobjlock.
 1.87.4.5 20-Aug-2007  ad Sync with HEAD.
 1.87.4.4 03-Jul-2007  yamt if wrong-order trylocking failed, avoid livelock by yielding cpu
before retrying. ok'ed by Andrew Doran.
 1.87.4.3 05-Apr-2007  ad - Put a per-LWP lock around swapin / swapout.
- Replace use of lockmgr().
- Minor locking fixes and assertions.
- uvm_map.h no longer pulls in proc.h, etc.
- Use kpause where appropriate.
 1.87.4.2 13-Mar-2007  ad Pull in the initial set of changes for the vmlocking branch.
 1.87.4.1 13-Mar-2007  ad Sync with head.
 1.88.2.1 11-Jul-2007  mjf Sync with head.
 1.89.2.1 15-Aug-2007  skrll Sync with HEAD.
 1.92.4.2 03-Dec-2007  joerg Sync with HEAD.
 1.92.4.1 09-Aug-2007  jmcneill Sync with HEAD.
 1.93.10.2 05-Aug-2007  pooka In uao_get(), drop object lock only after dropswap to avoid KASSERT panic.

Should fix tmpfs problem reported by riz on current-users. yamt ok.
 1.93.10.1 05-Aug-2007  pooka file uvm_aobj.c was added on branch matt-mips64 on 2007-08-05 10:19:24 +0000
 1.93.8.2 18-Feb-2008  mjf Sync with HEAD.
 1.93.8.1 08-Dec-2007  mjf Sync with HEAD.
 1.93.2.2 23-Mar-2008  matt sync with HEAD
 1.93.2.1 09-Jan-2008  matt sync with HEAD
 1.95.6.2 19-Jan-2008  bouyer Sync with HEAD
 1.95.6.1 02-Jan-2008  bouyer Sync with HEAD
 1.95.2.1 04-Dec-2007  ad Pull the vmlocking changes into a new branch.
 1.97.6.5 17-Jan-2009  mjf Sync with HEAD.
 1.97.6.4 29-Jun-2008  mjf Sync with HEAD.
 1.97.6.3 05-Jun-2008  mjf Sync with HEAD.

Also fix build.
 1.97.6.2 02-Jun-2008  mjf Sync with HEAD.
 1.97.6.1 03-Apr-2008  mjf Sync with HEAD.
 1.97.2.1 24-Mar-2008  keiichi sync with head.
 1.99.4.5 11-Aug-2010  yamt sync with head.
 1.99.4.4 11-Mar-2010  yamt sync with head
 1.99.4.3 16-Sep-2009  yamt sync with head
 1.99.4.2 04-May-2009  yamt sync with head.
 1.99.4.1 16-May-2008  yamt sync with head.
 1.99.2.3 17-Jun-2008  yamt sync with head.
 1.99.2.2 04-Jun-2008  yamt sync with head
 1.99.2.1 18-May-2008  yamt sync with head.
 1.100.2.2 18-Sep-2008  wrstuden Sync with wrstuden-revivesa-base-2.
 1.100.2.1 23-Jun-2008  wrstuden Sync w/ -current. 34 merge conflicts to follow.
 1.102.2.1 27-Jun-2008  simonb Sync with head.
 1.103.2.1 19-Oct-2008  haad Sync with HEAD.
 1.104.10.3 29-Feb-2012  matt Improve UVM_PAGE_TRKOWN.
Add more asserts to uvm_page.
 1.104.10.2 03-Jun-2011  matt Restore $NetBSD$
 1.104.10.1 25-May-2011  matt Make uvm_map recognize UVM_FLAG_COLORMATCH which tells uvm_map that the
'align' argument specifies the starting color of the KVA range to be returned.

When calling uvm_km_alloc with UVM_KMF_VAONLY, also specify the starting
color of the kva range returned (UVM_KMF_COLORMATCH) and pass those to
uvm_map.

In uvm_pglistalloc, make sure the pages being returned have sequentially
advancing colors (so they can be mapped in a contiguous address range).
Add a few missing UVM_FLAG_COLORMATCH flags to uvm_pagealloc calls.

Make the socket and pipe loan color-safe.

Make the mips pmap enforce strict page color (color(VA) == color(PA)).
 1.104.2.2 03-Mar-2009  skrll Sync with HEAD.
 1.104.2.1 19-Jan-2009  skrll Sync with HEAD.
 1.105.2.1 13-May-2009  jym Sync with HEAD.

Commit is split, to avoid a "too many arguments" protocol error.
 1.108.4.5 31-May-2011  rmind sync with head
 1.108.4.4 19-May-2011  rmind Implement sharing of vnode_t::v_interlock amongst vnodes:
- Lock is shared amongst UVM objects using uvm_obj_setlock() or getnewvnode().
- Adjust vnode cache to handle unsharing, add VI_LOCKSHARE flag for that.
- Use sharing in tmpfs and layerfs for underlying object.
- Simplify locking in ubc_fault().
- Sprinkle some asserts.

Discussed with ad@.
 1.108.4.3 05-Mar-2011  rmind sync with head
 1.108.4.2 30-May-2010  rmind sync with head
 1.108.4.1 16-Mar-2010  rmind Change struct uvm_object::vmobjlock to be dynamically allocated with
mutex_obj_alloc(). It allows us to share the locks among UVM objects.
 1.108.2.1 17-Aug-2010  uebayasi Sync with HEAD.
 1.110.4.2 17-Feb-2011  bouyer Sync with HEAD
 1.110.4.1 08-Feb-2011  bouyer Sync with HEAD
 1.110.2.1 06-Jun-2011  jruoho Sync with HEAD.
 1.114.2.1 23-Jun-2011  cherry Catchup with rmind-uvmplock merge.
 1.116.12.3 03-Dec-2017  jdolecek update from HEAD
 1.116.12.2 20-Aug-2014  tls Rebase to HEAD as of a few days ago.
 1.116.12.1 20-Nov-2012  tls Resync to 2012-11-19 00:00:00 UTC
 1.116.8.1 22-Nov-2012  riz Pull up following revision(s) (requested by rmind in ticket #694):
sys/uvm/uvm_aobj.h: revision 1.22
sys/uvm/uvm_aobj.c: revision 1.117
sys/uvm/uvm_aobj.c: revision 1.118
sys/uvm/uvm_aobj.c: revision 1.119
sys/uvm/uvm_object.h: revision 1.33
- Describe uvm_aobj and the lock order.
- Remove unnecessary uao_dropswap_range1() wrapper.
- KNF. Sprinkle some __cacheline_aligned.
- Manage anonymous UVM object reference count with atomic ops.
- Fix an old bug of possible lock against oneself (uao_detach_locked() is
called from uao_swap_off() with uao_list_lock acquired). Also removes
the try-lock dance in uao_swap_off(), since the lock order changes.
 1.116.2.9 22-May-2014  yamt fix a merge botch
 1.116.2.8 22-May-2014  yamt sync with head.

for reference, the tree before this commit was tagged
as yamt-pagecache-tag8.

this commit was split into small chunks to avoid
a limitation of cvs. ("Protocol error: too many arguments")
 1.116.2.7 30-Oct-2012  yamt sync with head
 1.116.2.6 26-Nov-2011  yamt - uvm_page_array_fill: add some more parameters
- uvn_findpages: use gang-lookup
- genfs_putpages: re-enable backward clustering
- mechanical changes after the recent radixtree.h api changes
 1.116.2.5 18-Nov-2011  yamt - use mutex obj for pageable object
- add a function to wait for a mutex obj being available
- replace some "livelock" kpauses with it
 1.116.2.4 13-Nov-2011  yamt cache UVM_OBJ_IS_VNODE in pqflags
 1.116.2.3 06-Nov-2011  yamt remove pg->listq and uobj->memq
 1.116.2.2 06-Nov-2011  yamt adapt aobj
 1.116.2.1 02-Nov-2011  yamt page cache related changes

- maintain object pages in radix tree rather than rb tree.
- reduce unnecessary page scan in putpages. esp. when an object has a ton of
pages cached but only a few of them are dirty.
- reduce the number of pmap operations by tracking page dirtiness more
precisely in uvm layer.
- fix nfs commit range tracking.
- fix nfs write clustering. XXX hack
 1.119.2.1 18-May-2014  rmind sync with head
 1.120.2.1 10-Aug-2014  tls Rebase.
 1.122.4.3 28-Aug-2017  skrll Sync with HEAD
 1.122.4.2 05-Oct-2016  skrll Sync with HEAD
 1.122.4.1 22-Sep-2015  skrll Sync with HEAD
 1.123.2.1 06-Aug-2016  pgoyette Sync with HEAD
 1.125.2.2 21-Aug-2019  martin Pull up following revision(s) (requested by msaitoh in ticket #1342):

sys/uvm/uvm_aobj.c: revision 1.128

Avoid undefined behavior in uao_pagein_page(). Found by kUBSan. OK'd by
riastradh. I think this is a real bug on amd64 at least.
 1.125.2.1 02-Nov-2017  snj Pull up following revision(s) (requested by pgoyette in ticket #335):
share/man/man9/kernhist.9: 1.5-1.8
sys/arch/acorn26/acorn26/pmap.c: 1.39
sys/arch/arm/arm32/fault.c: 1.105 via patch
sys/arch/arm/arm32/pmap.c: 1.350, 1.359
sys/arch/arm/broadcom/bcm2835_bsc.c: 1.7
sys/arch/arm/omap/if_cpsw.c: 1.20
sys/arch/arm/omap/tiotg.c: 1.7
sys/arch/evbarm/conf/RPI2_INSTALL: 1.3
sys/dev/ic/sl811hs.c: 1.98
sys/dev/usb/ehci.c: 1.256
sys/dev/usb/if_axe.c: 1.83
sys/dev/usb/motg.c: 1.18
sys/dev/usb/ohci.c: 1.274
sys/dev/usb/ucom.c: 1.119
sys/dev/usb/uhci.c: 1.277
sys/dev/usb/uhub.c: 1.137
sys/dev/usb/umass.c: 1.160-1.162
sys/dev/usb/umass_quirks.c: 1.100
sys/dev/usb/umass_scsipi.c: 1.55
sys/dev/usb/usb.c: 1.168
sys/dev/usb/usb_mem.c: 1.70
sys/dev/usb/usb_subr.c: 1.221
sys/dev/usb/usbdi.c: 1.175
sys/dev/usb/usbdi_util.c: 1.67-1.70
sys/dev/usb/usbroothub.c: 1.3
sys/dev/usb/xhci.c: 1.75
sys/external/bsd/drm2/dist/drm/i915/i915_gem.c: 1.34
sys/kern/kern_history.c: 1.15
sys/kern/kern_xxx.c: 1.74
sys/kern/vfs_bio.c: 1.275-1.276
sys/miscfs/genfs/genfs_io.c: 1.71
sys/sys/kernhist.h: 1.21
sys/ufs/ffs/ffs_balloc.c: 1.63
sys/ufs/lfs/lfs_vfsops.c: 1.361
sys/ufs/lfs/ulfs_inode.c: 1.21
sys/ufs/lfs/ulfs_vnops.c: 1.52
sys/ufs/ufs/ufs_inode.c: 1.102
sys/ufs/ufs/ufs_vnops.c: 1.239
sys/uvm/pmap/pmap.c: 1.37-1.39
sys/uvm/pmap/pmap_tlb.c: 1.22
sys/uvm/uvm_amap.c: 1.108
sys/uvm/uvm_anon.c: 1.64
sys/uvm/uvm_aobj.c: 1.126
sys/uvm/uvm_bio.c: 1.91
sys/uvm/uvm_device.c: 1.66
sys/uvm/uvm_fault.c: 1.201
sys/uvm/uvm_km.c: 1.144
sys/uvm/uvm_loan.c: 1.85
sys/uvm/uvm_map.c: 1.353
sys/uvm/uvm_page.c: 1.194
sys/uvm/uvm_pager.c: 1.111
sys/uvm/uvm_pdaemon.c: 1.109
sys/uvm/uvm_swap.c: 1.175
sys/uvm/uvm_vnode.c: 1.103
usr.bin/vmstat/vmstat.c: 1.219
Reorder to test for null before null deref in debug code
--
Reorder to test for null before null deref in debug code
--
KNF
--
No need for '\n' in UVMHIST_LOG
--
normalise a BIOHIST log message
--
Update the kernhist(9) kernel history code to address issues identified
in PR kern/52639, as well as some general cleaning-up...
(As proposed on tech-kern@ with additional changes and enhancements.)
Details of changes:
* All history arguments are now stored as uintmax_t values[1], both in
the kernel and in the structures used for exporting the history data
to userland via sysctl(9). This avoids problems on some architectures
where passing a 64-bit (or larger) value to printf(3) can cause it to
process the value as multiple arguments. (This can be particularly
problematic when printf()'s format string is not a literal, since in
that case the compiler cannot know how large each argument should be.)
* Update the data structures used for exporting kernel history data to
include a version number as well as the length of history arguments.
* All [2] existing users of kernhist(9) have had their format strings
updated. Each format specifier now includes an explicit length
modifier 'j' to refer to numeric values of the size of uintmax_t.
* All [2] existing users of kernhist(9) have had their format strings
updated to replace uses of "%p" with "%#jx", and the pointer
arguments are now cast to (uintptr_t) before being subsequently cast
to (uintmax_t). This is needed to avoid compiler warnings about
casting "pointer to integer of a different size."
* All [2] existing users of kernhist(9) have had instances of "%s" or
"%c" format strings replaced with numeric formats; several instances
of mis-match between format string and argument list have been fixed.
* vmstat(1) has been modified to handle the new size of arguments in the
history data as exported by sysctl(9).
* vmstat(1) now provides a warning message if the history requested with
the -u option does not exist (previously, this condition was silently
ignored, with only a single blank line being printed).
* vmstat(1) now checks the version and argument length included in the
data exported via sysctl(9) and exits if they do not match the values
with which vmstat was built.
* The kernhist(9) man-page has been updated to note the additional
requirements imposed on the format strings, along with several other
minor changes and enhancements.
[1] It would have been possible to use an explicit length (for example,
uint64_t) for the history arguments. But that would require another
"rototill" of all the users in the future when we add support for an
architecture that supports a larger size. Also, the printf(3) format
specifiers for explicitly-sized values, such as "%"PRIu64, are much
more verbose (and less aesthetically appealing, IMHO) than simply
using "%ju".
[2] I've tried very hard to find "all [the] existing users of kernhist(9)"
but it is possible that I've missed some of them. I would be glad to
update any stragglers that anyone identifies.
--
For some reason this single kernel seems to have outgrown its declared
size as a result of the kernhist(9) changes. Bump the size.
XXX The amount of increase may be excessive - anyone with more detailed
XXX knowledge please feel free to further adjust the value appropriately.
--
Missed one cast of pointer --> uintptr_t in previous kernhist(9) commit
--
And yet another one. :(
--
Use correct mark-up for NetBSD version.
--
More improvements in grammar and readability.
--
Remove a stray '"' (obvious typo) and add a couple of casts that are
probably needed.
--
And replace an instance of "%p" conversion with "%#jx"
--
Whitespace fix. Give Bl tag table a width. Fix Xr.
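
The pointer-formatting rule described in the 1.125.2.1 pull-up above (cast to uintptr_t, then to uintmax_t, and print with "%#jx") can be sketched in user space as follows. This is only an illustration using snprintf(3); the helper name is hypothetical and it is not the kernhist(9) code itself.

```c
#include <inttypes.h>
#include <stdint.h>
#include <stdio.h>

/*
 * Hypothetical helper: format a pointer the way the updated kernhist(9)
 * callers are described as doing. The pointer is cast to uintptr_t first
 * (an integer of the same size), then widened to uintmax_t so that it
 * matches the 'j' length modifier.
 */
static int
ex_format_ptr_arg(char *buf, size_t len, const void *p)
{
	return snprintf(buf, len, "%#jx", (uintmax_t)(uintptr_t)p);
}
```

Casting directly from a pointer to uintmax_t is what triggers the "pointer to integer of a different size" warning on ports where the two widths differ, which is why the intermediate uintptr_t cast is used.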
 1.126.2.1 25-Jun-2018  pgoyette Sync with HEAD
 1.127.2.1 13-Apr-2020  martin Mostly merge changes from HEAD upto 20200411
 1.133.2.2 29-Feb-2020  ad Sync with head.
 1.133.2.1 17-Jan-2020  ad Sync with head.
 1.151.2.2 03-Apr-2021  thorpej Sync with HEAD.
 1.151.2.1 14-Dec-2020  thorpej Sync w/ HEAD.
 1.23 18-Oct-2013  christos use __USE() for empty macro
 1.22 14-Sep-2012  rmind branches: 1.22.2;
- Describe uvm_aobj and the lock order.
- Remove unnecessary uao_dropswap_range1() wrapper.
- KNF. Sprinkle some __cacheline_aligned.
 1.21 02-Feb-2011  chuck branches: 1.21.4; 1.21.10; 1.21.14;
update license clauses on chuck^2 code to match the new-style BSD licenses.
based on diff that rmind@ sent me (and confirmed with chs@ via email).

no functional change with this commit.
 1.20 01-Dec-2007  yamt branches: 1.20.40; 1.20.46; 1.20.48;
remove a duplicated decl. of aobj_pager.
 1.19 22-Feb-2007  matt branches: 1.19.16; 1.19.18; 1.19.24;
Fix lossage from boolean_t -> bool and updated x86 bus_dma.
 1.18 11-Dec-2005  christos branches: 1.18.26;
merge ktrace-lwp.
 1.17 08-Nov-2005  yamt add a function to drop all swap slots in a given range. for tmpfs.
XXX maybe it's better to implement true truncation.
 1.16 17-Sep-2005  yamt make VMSWAP optional again.
 1.15 13-Sep-2005  yamt wrap swap related code by #ifdef VMSWAP. always #define VMSWAP for now.
 1.14 31-Jul-2005  yamt revert "defflag VMSWAP" changes for now.
there seems to be far more people who don't want to edit
their kernel config files than i thought.
 1.13 30-Jul-2005  yamt defflag VMSWAP.
 1.12 24-Mar-2004  junyoung branches: 1.12.16;
Nuke __P().
 1.11 15-Sep-2001  chs branches: 1.11.18;
a whole bunch of changes to improve performance and robustness under load:

- remove special treatment of pager_map mappings in pmaps. this is
required now, since I've removed the globals that expose the address range.
pager_map now uses pmap_kenter_pa() instead of pmap_enter(), so there's
no longer any need to special-case it.
- eliminate struct uvm_vnode by moving its fields into struct vnode.
- rewrite the pageout path. the pager is now responsible for handling the
high-level requests instead of only getting control after a bunch of work
has already been done on its behalf. this will allow us to UBCify LFS,
which needs tighter control over its pages than other filesystems do.
writing a page to disk no longer requires making it read-only, which
allows us to write wired pages without causing all kinds of havoc.
- use a new PG_PAGEOUT flag to indicate that a page should be freed
on behalf of the pagedaemon when it's unlocked. this flag is very similar
to PG_RELEASED, but unlike PG_RELEASED, PG_PAGEOUT can be cleared if the
pageout fails due to eg. an indirect-block buffer being locked.
this allows us to remove the "version" field from struct vm_page,
and together with shrinking "loan_count" from 32 bits to 16,
struct vm_page is now 4 bytes smaller.
- no longer use PG_RELEASED for swap-backed pages. if the page is busy
because it's being paged out, we can't release the swap slot to be
reallocated until that write is complete, but unlike with vnodes we
don't keep a count of in-progress writes so there's no good way to
know when the write is done. instead, when we need to free a busy
swap-backed page, just sleep until we can get it busy ourselves.
- implement a fast-path for extending writes which allows us to avoid
zeroing new pages. this substantially reduces cpu usage.
- encapsulate the data used by the genfs code in a struct genfs_node,
which must be the first element of the filesystem-specific vnode data
for filesystems which use genfs_{get,put}pages().
- eliminate many of the UVM pagerops, since they aren't needed anymore
now that the pager "put" operation is a higher-level operation.
- enhance the genfs code to allow NFS to use the genfs_{get,put}pages
instead of a modified copy.
- clean up struct vnode by removing all the fields that used to be used by
the vfs_cluster.c code (which we don't use anymore with UBC).
- remove kmem_object and mb_object since they were useless.
instead of allocating pages to these objects, we now just allocate
pages with no object. such pages are mapped in the kernel until they
are freed, so we can use the mapping to find the page to free it.
this allows us to remove splvm() protection in several places.

The sum of all these changes improves write throughput on my
decstation 5000/200 to within 1% of the rate of NetBSD 1.5
and reduces the elapsed time for "make release" of a NetBSD 1.5
source tree on my 128MB pc to 10% less than a 1.5 kernel took.
 1.10 11-Jan-2000  chs branches: 1.10.6; 1.10.8; 1.10.10;
add support for ``swapctl -d'' (removing swap space).
improve handling of i/o errors in swap space.

reviewed by: Chuck Cranor
 1.9 21-Jun-1999  thorpej branches: 1.9.2;
Protect prototypes, certain macros, and inlines from userland.
 1.8 26-Mar-1999  chs branches: 1.8.4;
add uvmexp.swpgonly and use it to detect out-of-swap conditions.
 1.7 25-Mar-1999  mrg remove now >1 year old pre-release message.
 1.6 12-Feb-1998  chs branches: 1.6.4;
add copyright.
 1.5 10-Feb-1998  perry add/cleanup multiple inclusion protection.
 1.4 07-Feb-1998  mrg restore rcsids
 1.3 07-Feb-1998  chs declare aobj_pager, needed in uvm_km.c.
 1.2 06-Feb-1998  thorpej RCS ID police.
 1.1 05-Feb-1998  mrg branches: 1.1.1;
Initial revision
 1.1.1.1 05-Feb-1998  mrg initial import of the new virtual memory system, UVM, into -current.

UVM was written by chuck cranor <chuck@maria.wustl.edu>, with some
minor portions derived from the old Mach code. i provided some help
getting swap and paging working, and other bug fixes/ideas. chuck
silvers <chuq@chuq.com> also provided some other fixes.

this is the UVM kernel code portion.


this will be KNF'd shortly. :-)
 1.6.4.1 09-Nov-1998  chs initial snapshot. lots left to do.
 1.8.4.2 01-Jul-1999  thorpej Sync w/ -current.
 1.8.4.1 07-Jun-1999  chs merge everything from chs-ubc branch.
 1.9.2.1 20-Nov-2000  bouyer Update thorpej_scsipi to -current as of a month ago
 1.10.10.1 01-Oct-2001  fvdl Catch up with -current.
 1.10.8.1 10-Jan-2002  thorpej Sync kqueue branch with -current.
 1.10.6.1 21-Sep-2001  nathanw Catch up to -current.
 1.11.18.4 10-Nov-2005  skrll Sync with HEAD. Here we go again...
 1.11.18.3 21-Sep-2004  skrll Fix the sync with head I botched.
 1.11.18.2 18-Sep-2004  skrll Sync with HEAD.
 1.11.18.1 03-Aug-2004  skrll Sync with HEAD
 1.12.16.3 07-Dec-2007  yamt sync with head
 1.12.16.2 26-Feb-2007  yamt sync with head.
 1.12.16.1 21-Jun-2006  yamt sync with head.
 1.18.26.1 27-Feb-2007  yamt - sync with head.
- move sched_changepri back to kern_synch.c as it doesn't know PPQ anymore.
 1.19.24.1 08-Dec-2007  mjf Sync with HEAD.
 1.19.18.1 09-Jan-2008  matt sync with HEAD
 1.19.16.1 03-Dec-2007  joerg Sync with HEAD.
 1.20.48.1 08-Feb-2011  bouyer Sync with HEAD
 1.20.46.1 06-Jun-2011  jruoho Sync with HEAD.
 1.20.40.1 05-Mar-2011  rmind sync with head
 1.21.14.2 20-Aug-2014  tls Rebase to HEAD as of a few days ago.
 1.21.14.1 20-Nov-2012  tls Resync to 2012-11-19 00:00:00 UTC
 1.21.10.1 22-Nov-2012  riz Pull up following revision(s) (requested by rmind in ticket #694):
sys/uvm/uvm_aobj.h: revision 1.22
sys/uvm/uvm_aobj.c: revision 1.117
sys/uvm/uvm_aobj.c: revision 1.118
sys/uvm/uvm_aobj.c: revision 1.119
sys/uvm/uvm_object.h: revision 1.33
- Describe uvm_aobj and the lock order.
- Remove unnecessary uao_dropswap_range1() wrapper.
- KNF. Sprinkle some __cacheline_aligned.
- Manage anonymous UVM object reference count with atomic ops.
- Fix an old bug of possible lock against oneself (uao_detach_locked() is
called from uao_swap_off() with uao_list_lock acquired). Also removes
the try-lock dance in uao_swap_off(), since the lock order changes.
 1.21.4.2 22-May-2014  yamt sync with head.

for reference, the tree before this commit was tagged
as yamt-pagecache-tag8.

this commit was split into small chunks to avoid
a limitation of cvs. ("Protocol error: too many arguments")
 1.21.4.1 30-Oct-2012  yamt sync with head
 1.22.2.1 18-May-2014  rmind sync with head
 1.128 09-Apr-2023  riastradh uvm(9): KASSERT(A && B) -> KASSERT(A); KASSERT(B)
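
A user-space sketch of why the 1.128 change helps, with assert(3) standing in for KASSERT() and a made-up structure: when the conditions are asserted separately, a failure report names the one condition that was false rather than the whole conjunction.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical structure for illustration; not a UVM type. */
struct ex_page {
	int busy;
	int refs;
};

static int
ex_check_page(const struct ex_page *pg)
{
	/*
	 * Combined form would be: assert(pg != NULL && pg->refs > 0);
	 * Split form: each failure message identifies a single condition.
	 */
	assert(pg != NULL);
	assert(pg->refs > 0);
	return pg->busy;
}
```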
 1.127 12-Feb-2023  andvar s/strucure/structure/ and s/structues/structures/ in comments.
 1.126 01-Apr-2021  simonb Add a sysctl hashstat collector for ubchash.
 1.125 13-Mar-2021  skrll branches: 1.125.2;
Consistently use %#jx instead of 0x%jx or just %jx in UVMHIST_LOG formats
 1.124 10-Nov-2020  chs remove someone's leftover debug printfs.
 1.123 18-Oct-2020  rin branches: 1.123.2;
PR kern/55658

Revert rev 1.122:
http://cvsweb.netbsd.org/bsdweb.cgi/src/sys/uvm/uvm_bio.c#rev1.122

If this commit is applied to NFS client, changes to files in client
side are sometimes invisible in server side, which results in file
corruption.

Demonstrated by test code provided by Anthony Mallet:
https://mail-index.netbsd.org/current-users/2020/10/17/msg039708.html

Whether the test case above passes or not depends on architectures
and size of NFS I/O specified by -r and -w options of mount_nfs(8)
(the default size is 32KB for x86 and 8KB for other archs).

Whereas it fails on amd64 and i386 with the default size, it passes
on other archs (aarch64, arm, alpha, m68k, and powerpc at least) with
their default. On most ports, it fails with some I/O sizes.

However, the condition for failure is still unclear; whereas it fails
with 2KB I/O size on amiga (m68k, 8KB page), it passes with same I/O
size on alpha (8KB page). It may depend on some VM parameters or
details in pmap implementation, or some race conditions are involved.

Great thanks to Anthony Mallet for providing the test code, and sorry
everyone for breakage.
 1.122 05-Oct-2020  rin PR kern/55658

ubc_fault_page(): Ignore PG_RDONLY flag and always pmap_enter() the page
with the permissions of the original access_type.

It is the file system's responsibility to allocate blocks that are being
modified by write(), before calling into UBC to fill the pages for that
range. KASSERT() is added there to confirm that no clean page is mapped
writable.

Fix infinite loop in uvm_fault_internal(), observed on 16KB-page systems,
where it continues to try to make a partially-backed page writable.

No regression in ATF and KASSERT() does not fire on several architectures,
as far as I can see.

Fix suggested by chs. Thanks!
 1.121 09-Jul-2020  rin PR kern/55467
tmpfs calls pmap_kenter_pa(9) with virtual address with page offset

Bisection revealed that the failure starts with this commit:

sys/fs/tmpfs/tmpfs_vnops.c rev 1.142:
http://cvsweb.netbsd.org/bsdweb.cgi/src/sys/fs/tmpfs/tmpfs_vnops.c#rev1.142

by which tmpfs started using the UBC_FAULTBUSY flag for ubc_uiomove(9).
If this flag is specified, pmap_kenter_pa(9) is called with a virtual
address that includes a page offset, via ubc_alloc(9):

https://nxr.netbsd.org/xref/src/sys/uvm/uvm_bio.c#616

Most ports seem to silently ignore the page offset of the va argument to
pmap_kenter_pa(9). However, it correctly causes a KASSERT failure on
powerpc/booke. So, truncate the page offset there.

Now, tmpfs works just fine on evbppc-booke, and I've confirmed that
no new failures are detected by ATF.

Discussed with chs@. Thanks!
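
The fix described in 1.121 amounts to masking off the in-page offset before the address is handed to the mapping routine. A minimal sketch of that truncation, assuming 4 KB pages and using hypothetical names (the kernel has its own per-port macros for this):

```c
#include <stdint.h>

/* Assumed for illustration: 4 KB pages. */
#define EX_PG_SIZE  4096u
#define EX_PG_MASK  (EX_PG_SIZE - 1)

/* Hypothetical helper: round a virtual address down to its page boundary. */
static inline uintptr_t
ex_trunc_page(uintptr_t va)
{
	return va & ~(uintptr_t)EX_PG_MASK;
}
```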
 1.120 09-Jul-2020  skrll Consistently use UVMHIST(__func__)

Convert UVMHIST_{CALLED,LOG} into UVMHIST_CALLARGS
 1.119 08-Jul-2020  skrll Trailing whitespace
 1.118 25-Jun-2020  jdolecek make ubc_winshift / ubc_winsize constant, and based on whatever is bigger
of (1 << UBC_WINSHIFT, MAX_PAGE_SIZE)

given that default UBC_WINSHIFT is 13, this changes behaviour only
for mips and powerpc (BookE/OEA), which will now have twice as much
memory used for UBC windows; if this ever becomes a problem, it's
possible to reduce ubc_nwins in MD code similar to what is done on sparc

this eliminates variable-length arrays in ubc_fault(),
ubc_uiomove(), and ubc_zerorange() so that the stack usage can be
determined and checked in compile time
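
The idea in 1.118 can be sketched as below: once the window size is a compile-time constant, arrays sized from it are ordinary fixed-size arrays rather than variable-length ones. All EX_ names and the 16 KB / 4 KB page sizes are assumptions for the example, not the values or identifiers used in uvm_bio.c.

```c
/* Assumed values for illustration only; real ones come from per-port headers. */
#define EX_UBC_WINSHIFT   13
#define EX_MAX_PAGE_SIZE  16384
#define EX_MAX(a, b)      ((a) > (b) ? (a) : (b))

/* Compile-time constant: the larger of (1 << shift) and the maximum page size. */
enum { EX_UBC_WINSIZE = EX_MAX(1 << EX_UBC_WINSHIFT, EX_MAX_PAGE_SIZE) };

/* Hypothetical per-window page count, assuming the smallest page is 4 KB. */
#define EX_MIN_PAGE_SIZE  4096
#define EX_WIN_NPAGES     (EX_UBC_WINSIZE / EX_MIN_PAGE_SIZE)

/* With a constant bound this is a plain array, so its stack use is known. */
static unsigned
ex_window_slots(void)
{
	void *pgs[EX_WIN_NPAGES];

	return (unsigned)(sizeof(pgs) / sizeof(pgs[0]));
}
```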
 1.117 25-May-2020  ad ubc_uiomove_direct(): if UBC_FAULTBUSY, the left-over portion of the final
page needs to be zeroed.
 1.116 24-May-2020  ad - ubc_uiomove(): Always use direct access in the UBC_FAULTBUSY case, since
it works basically the same way as !direct minus temporary mappings, and
there are no concurrency issues.

- ubc_alloc_direct(): In the PGO_OVERWRITE case blocks are allocated
beforehand. Avoid waking or activating pages unless needed.
 1.115 23-May-2020  ad - In ubc_alloc() take initial offset into account in the UBC_FAULTBUSY case
or one too few pages can be mapped.

- In ubc_release() with UBC_FAULTBUSY, chances are that pages are newly
allocated and freshly enqueued, so avoid uvm_pageactivate() if possible

- Keep track of the pages mapped in ubc_alloc() in an array on the stack,
and use this to avoid calling pmap_extract() in ubc_release().
 1.114 19-May-2020  ad PR kern/32166: pgo_get protocol is ambiguous
Also problems with tmpfs+nfs noted by hannken@.

Don't pass PGO_ALLPAGES to pgo_get, and ignore PGO_DONTCARE in the
!PGO_LOCKED case. In uao_get() have uvm_pagealloc() take care of page
zeroing and release busy pages on error.
 1.113 26-Apr-2020  thorpej Disable ubc_direct by default again. There are still stability issues
(e.g. panic during 2020.04.25.00.07.27 amd64 releng test run).
 1.112 24-Apr-2020  ad ubc_alloc_direct(): for a write make sure pages are always marked dirty
because there's no managed mapping.
 1.111 23-Apr-2020  ad Enable ubc_direct by default, but only on systems with no more than 2 CPUs
for now.
 1.110 23-Apr-2020  ad PR kern/54759 (vm.ubc_direct deadlock when read()/write() into mapping of itself)

- Add new flag UBC_ISMAPPED which tells ubc_uiomove() the object is mmap()ed
somewhere. Use it to decide whether to do direct-mapped copy, rather than
poking around directly in the vnode in ubc_uiomove(), which is ugly and
doesn't work for tmpfs. It would be nicer to contain all this in UVM but
the filesystem provides the needed locking here (VV_MAPPED) and to
reinvent that would suck more.

- Rename UBC_UNMAP_FLAG() to UBC_VNODE_FLAGS(). Pass in UBC_ISMAPPED where
appropriate.
 1.109 23-Apr-2020  ad ubc_direct_release(): unbusy the pages directly since pg->interlock is
being taken.
 1.108 07-Apr-2020  ad branches: 1.108.2;
ubc_direct_release(): remove spurious call to uvm_pagemarkdirty().
 1.107 07-Apr-2020  ad PR kern/54759: vm.ubc_direct deadlock when read()/write() into mapping of itself

Prevent ubc_uiomove_direct() on mapped vnodes.
 1.106 17-Mar-2020  ad Tweak the March 14th change to make page waits interlocked by pg->interlock.
Remove unneeded changes and only deal with the PQ_WANTED flag, to exclude
possible bugs.
 1.105 14-Mar-2020  ad Make page waits (WANTED vs BUSY) interlocked by pg->interlock. Gets RW
locks out of the equation for sleep/wakeup, and allows observing+waiting
for busy pages when holding only a read lock. Proposed on tech-kern.
 1.104 23-Feb-2020  ad UVM locking changes, proposed on tech-kern:

- Change the lock on uvm_object, vm_amap and vm_anon to be a RW lock.
- Break v_interlock and vmobjlock apart. v_interlock remains a mutex.
- Do partial PV list locking in the x86 pmap. Others to follow later.
 1.103 15-Jan-2020  ad Merge from yamt-pagecache (after much testing):

- Reduce unnecessary page scan in putpages esp. when an object has a ton of
pages cached but only a few of them are dirty.

- Reduce the number of pmap operations by tracking page dirtiness more
precisely in uvm layer.
 1.102 31-Dec-2019  ad branches: 1.102.2;
- Add and use wrapper functions that take and acquire page interlocks, and pairs
of page interlocks. Require that the page interlock be held over calls to
uvm_pageactivate(), uvm_pagewire() and similar.

- Solve the concurrency problem with page replacement state. Rather than
updating the global state synchronously, set an intended state on
individual pages (active, inactive, enqueued, dequeued) while holding the
page interlock. After the interlock is released put the pages on a 128
entry per-CPU queue for their state changes to be made real in batch.
This results in a ~400-fold decrease in contention on my test system.
Proposed on tech-kern but modified to use the page interlock rather than
atomics to synchronise as it's much easier to maintain that way, and
cheaper.
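
The batching idea in the second item above can be sketched in plain userland C. This is only an illustration under stated assumptions: `struct intent_batch`, `batch_enqueue()` and `batch_flush()` are hypothetical names, not UVM interfaces, and a counter stands in for acquiring the global lock. Only the batch size of 128 comes from the commit message.

```c
#include <stddef.h>

#define BATCH_MAX 128	/* batch size quoted in the commit message */

/* Hypothetical per-CPU batch of pending page state changes. */
struct intent_batch {
	int	page_id[BATCH_MAX];	/* hypothetical page identifiers */
	int	intent[BATCH_MAX];	/* hypothetical intended-state codes */
	size_t	count;
};

/* Stand-ins for the global state; a kernel would use a real lock. */
static unsigned long global_lock_acquisitions;
static unsigned long changes_applied;

/* Apply every queued change under one (simulated) lock acquisition. */
static void
batch_flush(struct intent_batch *b)
{
	if (b->count == 0)
		return;
	global_lock_acquisitions++;	/* the global lock would be taken here */
	changes_applied += b->count;
	b->count = 0;
}

/* Queue one change, flushing first if the batch is already full. */
static void
batch_enqueue(struct intent_batch *b, int page_id, int intent)
{
	if (b->count == BATCH_MAX)
		batch_flush(b);
	b->page_id[b->count] = page_id;
	b->intent[b->count] = intent;
	b->count++;
}
```

In this sketch the simulated lock is taken once per batch rather than once per page, which is the property the commit message credits for the drop in contention.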
 1.101 13-Dec-2019  ad Break the global uvm_pageqlock into a per-page identity lock and a private
lock for use of the pagedaemon policy code. Discussed on tech-kern.

PR kern/54209: NetBSD 8 large memory performance extremely low
PR kern/54210: NetBSD-8 processes presumably not exiting
PR kern/54727: writing a large file causes unreasonable system behaviour
 1.100 07-Nov-2019  skrll Fix a UVMHIST_LOG format broken in 1.91
 1.99 09-Dec-2018  jdolecek for direct map case, avoid PGO_NOBLOCKALLOC when writing, it makes
genfs_getpages() return unallocated pages using the zero page and
PG_RDONLY; the old code relied on fault logic to get it allocated, which
the direct case can't rely on

instead just allocate the blocks right away; pass PGO_JOURNALLOCKED
so that code wouldn't try to take wapbl lock, this code path is called
with it already held

this should fix KASSERT() due to PG_RDONLY on write with wapbl

towards resolution of PR kern/53124
 1.98 20-Nov-2018  jdolecek need to use PGO_NOBLOCKALLOC also in ubc_alloc_direct() case, same
as non-direct code - otherwise the code tries to acquire the wapbl
lock again in genfs_getpages(), and panic due to locking against itself

towards PR kern/53124
 1.97 02-Jun-2018  chs branches: 1.97.2;
add missing boilerplate for UVMHIST.
 1.96 26-May-2018  jdolecek uvm_pageactivate() needs to be called _after_ code is done with the page, no reason
to bother pdaemon with PG_BUSY pages; also clear the PG_FAKE and PG_CLEAN after
we are done with the write

this does not make any difference on my machine, but maybe it might fix
the machine check panic on Martin's alpha

while here remove UBC_PARTIALOK handling from ubc_zeropage_direct(), just to be sure
it works exactly the same as the non-direct one
 1.95 19-May-2018  jdolecek change code to take advantage of direct map when available, avoiding the need
to map pages into kernel

this improves performance of UBC-based (read(2)/write(2)) I/O especially
for cached block I/O - sequential read on my NVMe goes from 1.7 GB/s to 1.9 GB/s
for non-cached, and from 2.2 GB/s to 5.6 GB/s for cached read

the new code is conditional now and off for now, so that it can be tested further;
can be turned on by adjusting ubc_direct variable to true

part of fix for PR kern/53124
 1.94 20-Apr-2018  jdolecek make ubc_alloc() and ubc_release() static, they should not be used
outside of ubc_uiomove()/ubc_zeropage(); for now mark as noinline
to keep them available as breakpoints
 1.93 26-Mar-2018  jdolecek mark ubc_winshift and ubc_winsize as __read_mostly, they are used often
so might benefit from cache placement
 1.92 09-Feb-2018  maxv branches: 1.92.2;
Use UVM_PROT_RW instead of UVM_PROT_ALL. This doesn't change anything,
since the protection code is not applied: the pages are manually kentered
as RW.

But fix it anyway, so that "pmap 0" does not say the map is executable.
 1.91 28-Oct-2017  pgoyette Update the kernhist(9) kernel history code to address issues identified
in PR kern/52639, as well as some general cleaning-up...

(As proposed on tech-kern@ with additional changes and enhancements.)

Details of changes:

* All history arguments are now stored as uintmax_t values[1], both in
the kernel and in the structures used for exporting the history data
to userland via sysctl(9). This avoids problems on some architectures
where passing a 64-bit (or larger) value to printf(3) can cause it to
process the value as multiple arguments. (This can be particularly
problematic when printf()'s format string is not a literal, since in
that case the compiler cannot know how large each argument should be.)

* Update the data structures used for exporting kernel history data to
include a version number as well as the length of history arguments.

* All [2] existing users of kernhist(9) have had their format strings
updated. Each format specifier now includes an explicit length
modifier 'j' to refer to numeric values of the size of uintmax_t.

* All [2] existing users of kernhist(9) have had their format strings
updated to replace uses of "%p" with "%#jx", and the pointer
arguments are now cast to (uintptr_t) before being subsequently cast
to (uintmax_t). This is needed to avoid compiler warnings about
casting "pointer to integer of a different size."

* All [2] existing users of kernhist(9) have had instances of "%s" or
"%c" format strings replaced with numeric formats; several instances
of mis-match between format string and argument list have been fixed.

* vmstat(1) has been modified to handle the new size of arguments in the
history data as exported by sysctl(9).

* vmstat(1) now provides a warning message if the history requested with
the -u option does not exist (previously, this condition was silently
ignored, with only a single blank line being printed).

* vmstat(1) now checks the version and argument length included in the
data exported via sysctl(9) and exits if they do not match the values
with which vmstat was built.

* The kernhist(9) man-page has been updated to note the additional
requirements imposed on the format strings, along with several other
minor changes and enhancements.

[1] It would have been possible to use an explicit length (for example,
uint64_t) for the history arguments. But that would require another
"rototill" of all the users in the future when we add support for an
architecture that supports a larger size. Also, the printf(3) format
specifiers for explicitly-sized values, such as "%"PRIu64, are much
more verbose (and less aesthetically appealing, IMHO) than simply
using "%ju".

[2] I've tried very hard to find "all [the] existing users of kernhist(9)"
but it is possible that I've missed some of them. I would be glad to
update any stragglers that anyone identifies.
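
The format-string rules described above can be illustrated in userland C. This is a sketch only: `format_hist_entry()` is a hypothetical helper, not part of kernhist(9), and snprintf(3) stands in for the history formatter. It shows a pointer cast through (uintptr_t) to (uintmax_t) and printed with "%#jx", and a 64-bit value printed with "%ju".

```c
#include <stdint.h>
#include <stdio.h>

/*
 * Hypothetical helper (not a kernhist(9) interface): format one pointer
 * and one 64-bit value using the 'j' length modifier throughout.
 */
static int
format_hist_entry(char *buf, size_t len, const void *ptr, uint64_t val)
{
	return snprintf(buf, len, "ptr=%#jx val=%ju",
	    (uintmax_t)(uintptr_t)ptr, (uintmax_t)val);
}
```

Because every argument is widened to uintmax_t, the formatter sees the same argument size on every architecture, whatever the width of the original pointer or integer.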
 1.90 01-Jun-2017  chs branches: 1.90.2;
remove checks for failure after memory allocation calls that cannot fail:

kmem_alloc() with KM_SLEEP
kmem_zalloc() with KM_SLEEP
percpu_alloc()
pserialize_create()
psref_class_create()

all of these paths include an assertion that the allocation has not failed,
so callers should not assert that again.
 1.89 21-Mar-2017  ozaki-r Fix typo
 1.88 20-Mar-2017  kre Ugh. This stuff is disgusting. We really need an arch dependent
PRIxOFF (and PRIdOFF) to print off_t's in a way that matches the
arch's definition of off_t.

In the meantime fall back on %jx and an (intmax_t) cast. Ugly.
(And the way it is written is even uglier...)
 1.87 20-Mar-2017  kre Third time lucky...

Why is there no PRI[xd]OFF ? How are off_t's intended to be printed?

If a PRIxOFF gets added in some appropriate place, the XXX lines in this
commit can go away.

(I understand not having PRI[xd]VOFF).
 1.86 20-Mar-2017  kre One more (should have noticed last time) and this time fix the
format the way it should have been fixed, not just what was easiest...
 1.85 20-Mar-2017  kre Perhaps fix printf format for KASSERTMSG (unbreak i386 build maybe).
This can be revisited by anyone who wants to do things better...
 1.84 19-Mar-2017  riastradh #if DIAGNOSTIC panic ---> KASSERT
 1.83 27-May-2015  rmind branches: 1.83.2; 1.83.4;
ubc_alloc: perform pmap_update() in the error path as we might have
removed the mapping.
 1.82 05-Sep-2014  matt branches: 1.82.2;
Don't nest structure definitions.
 1.81 07-Jul-2014  riastradh Initialize ubchist earlier.
 1.80 25-Oct-2013  martin branches: 1.80.2;
Mark a diagnostic-only variable
 1.79 27-Sep-2011  jym branches: 1.79.2; 1.79.12; 1.79.16;
Modify *ASSERTMSG() so they are now used as variadic macros. The main goal
is to provide routines that do as KASSERT(9) says: append a message
to the panic format string when the assertion triggers, with optional
arguments.

Fix call sites to reflect the new definition.

Discussed on tech-kern@. See
http://mail-index.netbsd.org/tech-kern/2011/09/07/msg011427.html
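
A variadic assertion macro of the kind described above might look roughly like the following userland sketch. All names here (`SKETCH_ASSERTMSG`, `sketch_assert_fail`, `sketch_last_msg`) are hypothetical, and the failure handler records the message instead of panicking so that the behaviour can be observed. The real macros are defined in the kernel headers.

```c
#include <stdarg.h>
#include <stdio.h>

static char sketch_last_msg[256];	/* last failure message, for inspection */
static int sketch_failures;		/* number of failed assertions */

/* Append the caller's formatted message to the failed expression text. */
static void
sketch_assert_fail(const char *expr, const char *fmt, ...)
{
	va_list ap;
	int n;

	n = snprintf(sketch_last_msg, sizeof(sketch_last_msg),
	    "assertion \"%s\" failed: ", expr);
	if (n >= 0 && (size_t)n < sizeof(sketch_last_msg)) {
		va_start(ap, fmt);
		vsnprintf(sketch_last_msg + n, sizeof(sketch_last_msg) - n,
		    fmt, ap);
		va_end(ap);
	}
	sketch_failures++;	/* a kernel would panic here instead */
}

/* The format string and its optional arguments travel in __VA_ARGS__. */
#define SKETCH_ASSERTMSG(e, ...) \
	((e) ? (void)0 : sketch_assert_fail(#e, __VA_ARGS__))
```

A call site then reads `SKETCH_ASSERTMSG(x == 4, "x=%d", x);`, with the message and its arguments passed directly rather than wrapped in an extra pair of parentheses.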
 1.78 29-Jun-2011  hannken Remove dead uvm_vnp_zerorange() after bump to 5.99.54.
 1.77 19-Jun-2011  rmind - Fix a silly bug: remove umap from uobj in ubc_release() UBC_UNMAP case.
- Use UBC_WANT_UNMAP() consistently.

ARM (PMAP_CACHE_VIVT case) works again.
 1.76 18-Jun-2011  rmind - Move pre-check from uvm_obj_destroy() to ubc_purge(), keep it abstracted.
- Add comments noting the race between ubc_alloc() and ubc_purge().
 1.75 17-Jun-2011  hannken When ubc_alloc() reuses a cached mapping window remove the object from
the lists AFTER clearing its mapping.

Removes a race where uvm_obj_destroy() sees an empty uo_ubc list and
destroys the object before ubc_alloc() gets the object's lock to clear
the mapping.
 1.74 16-Jun-2011  hannken Rename uvm_vnp_zerorange(struct vnode *, off_t, size_t) to
ubc_zerorange(struct uvm_object *, off_t, size_t, int) changing
the first argument to an uvm_object and adding a flags argument.

Modify tmpfs_reg_resize() to zero the backing store (aobj) instead
of the vnode. Ubc_purge() no longer panics when unmounting tmpfs.

Keep uvm_vnp_zerorange() until the next kernel version bump.
 1.73 12-Jun-2011  rmind Welcome to 5.99.53! Merge rmind-uvmplock branch:

- Reorganize locking in UVM and provide extra serialisation for pmap(9).
New lock order: [vmpage-owner-lock] -> pmap-lock.

- Simplify locking in some pmap(9) modules by removing P->V locking.

- Use lock object on vmobjlock (and thus vnode_t::v_interlock) to share
the locks amongst UVM objects where necessary (tmpfs, layerfs, unionfs).

- Rewrite and optimise x86 TLB shootdown code, make it simpler and cleaner.
Add TLBSTATS option for x86 to collect statistics about TLB shootdowns.

- Unify /dev/mem et al in MI code and provide required locking (removes
kernel-lock on some ports). Also, avoid cache-aliasing issues.

Thanks to Andrew Doran and Joerg Sonnenberger, as their initial patches
formed the core changes of this branch.
 1.72 19-May-2011  rmind branches: 1.72.2;
ubc_release: use voff_t for offsets, rather than int. Constify.
Reviewed by matt@.
 1.71 30-Nov-2010  hannken branches: 1.71.2;
Always take the object lock before changing vmpage flags. Fixes a deadlock
where a thread is waiting on "genput" but the page in question is neither
BUSY nor WANTED.

No objections from tech-kern@.
 1.70 22-Jun-2010  rmind Keep the lock around pmap_update() where required. While fixing this
in ubc_fault(), rework logic to "remember" the last object of page and
reduce locking overhead, since in common case pages belong to one and
the same UVM object (but not always, therefore add a comment).

Unlocks before pmap_update(), on removal of mappings, might cause TLB
coherency issues, since on architectures like x86 and mips64 invalidation
IPIs are deferred to pmap_update(). Hence, VA space might be globally
visible before IPIs are sent or while they are still in-flight.

OK ad@.
 1.69 29-May-2010  rmind ubc_fault: split-off code part handling a single page into ubc_fault_page().
 1.68 07-Nov-2009  cegger branches: 1.68.2; 1.68.4;
Add a flags argument to pmap_kenter_pa(9).
Patch shown on tech-kern@ http://mail-index.netbsd.org/tech-kern/2009/11/04/msg006434.html
No objections.
 1.67 04-Aug-2009  pooka uvm_vnp_zerorange() is logically and by implementation more a part of
ubc than uvm_vnode, so move it over.
 1.66 27-Nov-2008  pooka g/c #if 0'd ubc_flush()
 1.65 05-May-2008  ad branches: 1.65.6; 1.65.8; 1.65.10; 1.65.14;
- Convert hashinit() to use kmem_alloc(). The hash tables can be large
and it's better to not have them in kmem_map.
- Convert a couple of minor items along the way to kmem_alloc().
- Fix some memory leaks.
 1.64 02-Jan-2008  ad branches: 1.64.6; 1.64.8; 1.64.10;
Merge vmlocking2 to head.
 1.63 01-Dec-2007  yamt branches: 1.63.2; 1.63.6;
constify pagerops.
 1.62 27-Jul-2007  yamt branches: 1.62.4; 1.62.6; 1.62.12; 1.62.14;
ubc_uiomove: add an "advice" argument rather than using UVM_ADV_RANDOM blindly.
 1.61 27-Jul-2007  yamt remove a debug printf.
 1.60 21-Jul-2007  ad Merge unobtrusive locking changes from the vmlocking branch.
 1.59 22-Jun-2007  yamt branches: 1.59.2;
ubc_alloc: break loans on UBC_FAULTBUSY.

it's necessary after recent file overwrite changes.
(http://mail-index.NetBSD.org/source-changes/2007/06/05/0014.html)
it should fix the problem reported by Sarton O'Brien on
current-users@/port-xen@.
(http://mail-index.NetBSD.org/current-users/2007/06/22/0001.html)
 1.58 05-Jun-2007  yamt improve post-ubc file overwrite performance in common cases.
ie. when it's safe, actually overwrite blocks rather than doing
read-modify-write.

also fixes PR/33152 and PR/36303.
 1.57 07-May-2007  yamt add an evcnt and some assertions.
 1.56 22-Feb-2007  thorpej branches: 1.56.4; 1.56.6;
TRUE -> true, FALSE -> false
 1.55 21-Feb-2007  thorpej Replace the Mach-derived boolean_t type with the C99 bool type. A
future commit will replace use of TRUE and FALSE with true and false.
 1.54 01-Nov-2006  yamt branches: 1.54.4;
remove some __unused from function parameters.
 1.53 19-Oct-2006  yamt add an assertion.
 1.52 12-Oct-2006  yamt move some knowledge about vnode into uvm_vnode.c.
 1.51 12-Oct-2006  christos - sprinkle __unused on function decls.
- fix a couple of unused bugs
- no more -Wno-unused for i386
 1.50 30-Sep-2006  yamt add ubc window hit/miss evcnts.
 1.49 30-Sep-2006  yamt ubc_fault: check UVM_OBJ_NEEDS_WRITEFAULT.
fix an assertion failure in genfs_putpages when using msdosfs.
(http://mail-index.NetBSD.org/tech-kern/2006/09/27/0002.html)
reported and tested by Darrin B.Jewell.
 1.48 03-Sep-2006  christos branches: 1.48.2; 1.48.4;
use c99 initializer.
 1.47 18-Aug-2006  yamt ubc_fault: fix a deadlock in the case of uvm_loanbreak() failure.
 1.46 03-May-2006  yamt ubc_fault: use PMAP_CANFAIL. pointed by Jed Davis on tech-kern@.
 1.45 13-Apr-2006  yamt ubc_fault: don't forget to clear PG_WANTED.
reported by Michael Lorenz on tech-kern@.
 1.44 22-Feb-2006  drochner branches: 1.44.2; 1.44.4; 1.44.6;
kill the "fault_type" argument to pager's pgo_fault() methods
it is never used
(and using it would comprise an abstraction violation imho)
 1.43 31-Jan-2006  yamt branches: 1.43.2; 1.43.4;
handle "strange" filesystems like layered filesystems and tmpfs,
where pgo_get returns pages which don't belong to the uobj.
also fix an XXX in uvm_loananon and lock-unlock mismatch in uvm_loanuobj.

PR/28372, PR/32665 (Alan Barrett).
 1.42 29-Nov-2005  yamt branches: 1.42.2;
merge yamt-readahead branch.
 1.41 23-Jul-2005  yamt branches: 1.41.6;
update file timestamps for nfsd loaned-read and mmap.
PR/25279. discussed on tech-kern@.
 1.40 17-Jul-2005  yamt - introduce PGO_NOBLOCKALLOC and use it for ubc mapping
to prevent unnecessary block allocations in the case that
page size > block size.

- ufs_balloc_range: use VM_PROT_WRITE+PGO_NOBLOCKALLOC rather than
VM_PROT_READ.
 1.39 27-Jun-2005  thorpej branches: 1.39.2;
Use ANSI function decls.
 1.38 06-Jun-2005  yamt introduce a macro to initialize uvm_object and use it.
 1.37 26-Feb-2005  perry branches: 1.37.2;
nuke trailing whitespace
 1.36 17-Jan-2005  atatat branches: 1.36.2;
Convert the PMAP_PREFER() macro from two arguments (offset and hint)
to four (adding size and direction).

In order for topdown uvm to be an option on ports using PMAP_PREFER,
they will need to "prefer" lower addresses if topdown is being used.
Additionally, at least one port also needs to know the size.
 1.35 16-Jan-2005  yamt branches: 1.35.2;
remove no longer needed #include.
 1.34 15-Jan-2005  chs deal with alpha's architectural failing of not being able to operate on
memory quantities smaller than 32 bits.
 1.33 09-Jan-2005  chs adjust the UBC mapping code to support non-vnode uvm_objects.
this means we can no longer look at the vnode size to determine how many
pages to request in a fault, which is good since for NFS the size can change
out from under us on the server anyway. there's also a new flag UBC_UNMAP
for ubc_release(), so that the file system code can make the decision about
whether to cache mappings for files being used as executables.
 1.32 05-May-2004  yamt ubc_release: grab uobj's vmobjlock when calling uvm_page_unbusy().
 1.31 24-Mar-2004  junyoung branches: 1.31.2;
Nuke __P().
 1.30 05-Mar-2004  dbj add debugging assertion ensuring UBC_FAULTBUSY is only used with UBC_WRITE
 1.29 07-Jan-2004  yamt #if 0 out unused ubc_flush().
 1.28 03-May-2003  yamt branches: 1.28.2;
fix ubc pager to take care of loan_count.
 1.27 10-Mar-2003  thorpej For PMAP_CACHE_VIVT platforms, make UBC_RELEASE_UNMAP evaluate to TRUE,
and add a comment explaining why.

Reviewed by Chuq Silvers.
 1.26 27-Sep-2002  provos remove trailing \n in panic(). approved perry.
 1.25 27-Feb-2002  chs branches: 1.25.10;
honor the PG_RDONLY flag (so that NFS can clear the PG_NEEDCOMMIT flag
when page with it set is modified again). fixes PR 15733.
 1.24 15-Feb-2002  simonb Add a space after a comma in a few places (KNF).
 1.23 19-Jan-2002  chs add a new flag PMAP_CACHE_VIVT for the pmap to inform the MI code
that the cache is virtually-indexed and virtually-tagged (such as on the ARM),
and use this flag in the UBC code to be more friendly to those caches.
 1.22 19-Nov-2001  enami Zero clear an array of vm_page * before passing it to VOP_GETPAGES().
 1.21 10-Nov-2001  lukem add RCSIDs, and in some cases, slightly cleanup #include order
 1.20 16-Oct-2001  chs branches: 1.20.2;
it is with great chagrin that I must fix yet another 64-bit math bug.
 1.19 28-Sep-2001  chs don't depend on other headers to include sys/proc.h for us.
 1.18 15-Sep-2001  chs a whole bunch of changes to improve performance and robustness under load:

- remove special treatment of pager_map mappings in pmaps. this is
required now, since I've removed the globals that expose the address range.
pager_map now uses pmap_kenter_pa() instead of pmap_enter(), so there's
no longer any need to special-case it.
- eliminate struct uvm_vnode by moving its fields into struct vnode.
- rewrite the pageout path. the pager is now responsible for handling the
high-level requests instead of only getting control after a bunch of work
has already been done on its behalf. this will allow us to UBCify LFS,
which needs tighter control over its pages than other filesystems do.
writing a page to disk no longer requires making it read-only, which
allows us to write wired pages without causing all kinds of havoc.
- use a new PG_PAGEOUT flag to indicate that a page should be freed
on behalf of the pagedaemon when it's unlocked. this flag is very similar
to PG_RELEASED, but unlike PG_RELEASED, PG_PAGEOUT can be cleared if the
pageout fails due to eg. an indirect-block buffer being locked.
this allows us to remove the "version" field from struct vm_page,
and together with shrinking "loan_count" from 32 bits to 16,
struct vm_page is now 4 bytes smaller.
- no longer use PG_RELEASED for swap-backed pages. if the page is busy
because it's being paged out, we can't release the swap slot to be
reallocated until that write is complete, but unlike with vnodes we
don't keep a count of in-progress writes so there's no good way to
know when the write is done. instead, when we need to free a busy
swap-backed page, just sleep until we can get it busy ourselves.
- implement a fast-path for extending writes which allows us to avoid
zeroing new pages. this substantially reduces cpu usage.
- encapsulate the data used by the genfs code in a struct genfs_node,
which must be the first element of the filesystem-specific vnode data
for filesystems which use genfs_{get,put}pages().
- eliminate many of the UVM pagerops, since they aren't needed anymore
now that the pager "put" operation is a higher-level operation.
- enhance the genfs code to allow NFS to use the genfs_{get,put}pages
instead of a modified copy.
- clean up struct vnode by removing all the fields that used to be used by
the vfs_cluster.c code (which we don't use anymore with UBC).
- remove kmem_object and mb_object since they were useless.
instead of allocating pages to these objects, we now just allocate
pages with no object. such pages are mapped in the kernel until they
are freed, so we can use the mapping to find the page to free it.
this allows us to remove splvm() protection in several places.

The sum of all these changes improves write throughput on my
decstation 5000/200 to within 1% of the rate of NetBSD 1.5
and reduces the elapsed time for "make release" of a NetBSD 1.5
source tree on my 128MB pc to 10% less than a 1.5 kernel took.
 1.17 10-Sep-2001  chris Update pmap_update to now take the updated pmap as an argument.
This will allow improvements to the pmaps so that they can more easily defer expensive operations, eg tlb/cache flush, til the last possible moment.

Currently this is a no-op on most platforms, so they should see no difference.

Reviewed by Jason.
 1.16 18-Jul-2001  thorpej branches: 1.16.2;
bzero -> memset
 1.15 13-Jun-2001  simonb branches: 1.15.2;
Add a sanity check for ubc_winshift.
 1.14 26-May-2001  chs replace vm_page_t with struct vm_page *.
 1.13 25-May-2001  chs remove trailing whitespace.
 1.12 24-Apr-2001  thorpej Sprinkle pmap_update() calls after calls to:
- pmap_enter()
- pmap_remove()
- pmap_protect()
- pmap_kenter_pa()
- pmap_kremove()
as described in pmap(9).

These calls are relatively conservative. It may be possible to
optimize these a little more.
 1.11 19-Mar-2001  chs change uvm_winsize to uvm_winshift so that we can avoid division
by a non-constant value.
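
A minimal sketch of why storing a shift helps, assuming the window size is a power of two. The names below (`sketch_winshift` and the helper functions) and the value 13 are hypothetical and chosen only for illustration; they are not the definitions in uvm_bio.c.

```c
#include <stdint.h>

/* Run-time variable, so a division by the size could not be folded. */
static unsigned int sketch_winshift = 13;	/* assumed: 8 KiB windows */

/* Window size derived from the shift. */
static inline uint64_t
sketch_winsize(void)
{
	return (uint64_t)1 << sketch_winshift;
}

/* Index of the window containing a byte offset: a shift, not a divide. */
static inline uint64_t
sketch_win_index(uint64_t off)
{
	return off >> sketch_winshift;
}

/* Offset within that window: a mask, not a modulo. */
static inline uint64_t
sketch_win_offset(uint64_t off)
{
	return off & (sketch_winsize() - 1);
}
```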
 1.10 17-Mar-2001  chs return the real error from VOP_GETPAGES().
 1.9 15-Mar-2001  chs eliminate the KERN_* error codes in favor of the traditional E* codes.
the mapping is:

KERN_SUCCESS 0
KERN_INVALID_ADDRESS EFAULT
KERN_PROTECTION_FAILURE EACCES
KERN_NO_SPACE ENOMEM
KERN_INVALID_ARGUMENT EINVAL
KERN_FAILURE various, mostly turn into KASSERTs
KERN_RESOURCE_SHORTAGE ENOMEM
KERN_NOT_RECEIVER <unused>
KERN_NO_ACCESS <unused>
KERN_PAGES_LOCKED <unused>
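
The table above can be written as a small translation function. This is a sketch only: the enum is a hypothetical stand-in for the retired codes (its numeric values are arbitrary), and KERN_FAILURE and the unused codes are left out because the commit gives no single errno for them.

```c
#include <errno.h>

/* Hypothetical stand-ins for the retired codes; values are arbitrary. */
enum old_kern_code {
	OLD_KERN_SUCCESS,
	OLD_KERN_INVALID_ADDRESS,
	OLD_KERN_PROTECTION_FAILURE,
	OLD_KERN_NO_SPACE,
	OLD_KERN_INVALID_ARGUMENT,
	OLD_KERN_RESOURCE_SHORTAGE
};

/* Translate an old code to the E* value listed in the commit message. */
static int
old_kern_to_errno(enum old_kern_code code)
{
	switch (code) {
	case OLD_KERN_SUCCESS:			return 0;
	case OLD_KERN_INVALID_ADDRESS:		return EFAULT;
	case OLD_KERN_PROTECTION_FAILURE:	return EACCES;
	case OLD_KERN_NO_SPACE:			return ENOMEM;
	case OLD_KERN_INVALID_ARGUMENT:		return EINVAL;
	case OLD_KERN_RESOURCE_SHORTAGE:	return ENOMEM;
	}
	return -1;	/* not a code covered by this sketch */
}
```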
 1.8 10-Mar-2001  chs eliminate the VM_PAGER_* error codes in favor of the traditional E* codes.
the mapping is:

VM_PAGER_OK 0
VM_PAGER_BAD <unused>
VM_PAGER_FAIL <unused>
VM_PAGER_PEND 0 (see below)
VM_PAGER_ERROR EIO
VM_PAGER_AGAIN EAGAIN
VM_PAGER_UNLOCK EBUSY
VM_PAGER_REFAULT ERESTART

for async i/o requests, it used to be possible for the request to
be converted to sync, and the pager would return VM_PAGER_OK or VM_PAGER_PEND
to indicate whether the caller should perform post-i/o cleanup.
this is no longer allowed; pagers must now return 0 to indicate that
the async i/o was successfully started, and the caller never needs to
worry about doing the post-i/o cleanup.
 1.7 02-Feb-2001  enami branches: 1.7.2;
Explicitly panic if failed to allocate some memory during initialization.
 1.6 27-Dec-2000  chs fix some types so that files larger than 4GB work.
 1.5 27-Dec-2000  chs VOP_GETPAGES() returns an E* error code, not a VM_PAGER_* error code.
 1.4 21-Dec-2000  enami s/UBC_WINSIZE/ubc_winsize/g except the variable initialization.
 1.3 10-Dec-2000  chs we don't need VM_PROT_EXECUTE for UBC mappings.
 1.2 27-Nov-2000  chs Initial integration of the Unified Buffer Cache project.
 1.1 09-Nov-1998  chs branches: 1.1.2; 1.1.4; 1.1.6;
file uvm_bio.c was initially added on branch chs-ubc.
 1.1.6.6 27-Mar-2001  bouyer Sync with HEAD.
 1.1.6.5 12-Mar-2001  bouyer Sync with HEAD.
 1.1.6.4 11-Feb-2001  bouyer Sync with HEAD.
 1.1.6.3 05-Jan-2001  bouyer Sync with HEAD
 1.1.6.2 13-Dec-2000  bouyer Sync with HEAD (for UBC fixes).
 1.1.6.1 08-Dec-2000  bouyer Sync with HEAD.
 1.1.4.6 11-Aug-1999  chs add a few casts to play better with 64bit vnode offsets.
 1.1.4.5 09-Aug-1999  chs create a new type "voff_t" for uvm_object offsets
and define it to be "off_t". also, remove pgo_asyncget().
 1.1.4.4 31-Jul-1999  chs in ubc_fault(), call VOP_GETPAGES() directly instead of going thru pgo_get().
also, we no longer need to play games with the vm size of the file for nfs
(we actually need to do this for all filesystems, but not here).
 1.1.4.3 04-Jul-1999  chs support VACs better by having multiple inactive queues.
when creating a mapping, use the queue with the right alignment skew
for the requested offset. this also supports any size mapping windows.
when reading from a VTEXT vnode, flush the mapping in ubc_release().
 1.1.4.2 21-Jun-1999  thorpej Make this compile in the face of some UVM interface changes, and kill
void * arithmetic.
 1.1.4.1 21-Jun-1999  thorpej Chuq apparently forgot to place this on the ubc2 branch.
 1.1.2.5 30-May-1999  chs in ubc_flush(), also flush any pmap mappings in ranges we flush.
 1.1.2.4 30-Apr-1999  chs change ubc_alloc()'s length arg to be a pointer instead of the value.
the pointed-to value is the total desired length on input,
and is updated to the length that will fit in the returned window.
this allows callers of ubc_alloc() to be ignorant of the window size.
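
The in/out length convention described above can be sketched as follows. `sketch_window_clamp()` and `SKETCH_WINSIZE` are hypothetical, and the 8 KiB window is an assumption; this is not the ubc_alloc() implementation, only the calling pattern.

```c
#include <stddef.h>
#include <stdint.h>

#define SKETCH_WINSIZE ((uint64_t)8192)	/* assumed window size */

/*
 * Hypothetical helper: on input *lenp is the total length the caller
 * wants; on return it is the part that fits in the window containing
 * 'off'. Returns the offset within that window.
 */
static uint64_t
sketch_window_clamp(uint64_t off, size_t *lenp)
{
	uint64_t slot_off = off & (SKETCH_WINSIZE - 1);
	uint64_t room = SKETCH_WINSIZE - slot_off;

	if (*lenp > room)
		*lenp = (size_t)room;
	return slot_off;
}
```

A caller would loop, advancing its offset by the updated length each time until the whole request is consumed, without ever needing to know the window size.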
 1.1.2.3 09-Apr-1999  chs minor cleanup.
 1.1.2.2 25-Feb-1999  chs rename nubc to ubc_nwins. other cleanup.
 1.1.2.1 09-Nov-1998  chs initial snapshot. lots left to do.
 1.7.2.10 18-Oct-2002  nathanw Catch up to -current.
 1.7.2.9 28-Feb-2002  nathanw Catch up to -current.
 1.7.2.8 08-Jan-2002  nathanw Catch up to -current.
 1.7.2.7 14-Nov-2001  nathanw Catch up to -current.
 1.7.2.6 22-Oct-2001  nathanw Catch up to -current.
 1.7.2.5 08-Oct-2001  nathanw Catch up to -current.
 1.7.2.4 21-Sep-2001  nathanw Catch up to -current.
 1.7.2.3 24-Aug-2001  nathanw Catch up with -current.
 1.7.2.2 21-Jun-2001  nathanw Catch up to -current.
 1.7.2.1 09-Apr-2001  nathanw Catch up with -current.
 1.15.2.6 10-Oct-2002  jdolecek sync kqueue with -current; this includes merge of gehenna-devsw branch,
merge of i386 MP branch, and part of autoconf rototil work
 1.15.2.5 16-Mar-2002  jdolecek Catch up with -current.
 1.15.2.4 11-Feb-2002  jdolecek Sync w/ -current.
 1.15.2.3 10-Jan-2002  thorpej Sync kqueue branch with -current.
 1.15.2.2 13-Sep-2001  thorpej Update the kqueue branch to HEAD.
 1.15.2.1 03-Aug-2001  lukem update to -current
 1.16.2.1 01-Oct-2001  fvdl Catch up with -current.
 1.20.2.1 12-Nov-2001  thorpej Sync the thorpej-mips-cache branch with -current.
 1.25.10.1 02-Jun-2003  tron Pull up revision 1.27 (requested by thorpej in ticket #1207):
For PMAP_CACHE_VIVT platforms, make UBC_RELEASE_UNMAP evaluate to TRUE,
and add a comment explaining why.
Reviewed by Chuq Silvers.
 1.28.2.7 11-Dec-2005  christos Sync with head.
 1.28.2.6 10-Nov-2005  skrll Sync with HEAD. Here we go again...
 1.28.2.5 04-Mar-2005  skrll Sync with HEAD.

Hi Perry!
 1.28.2.4 17-Jan-2005  skrll Sync with HEAD.
 1.28.2.3 21-Sep-2004  skrll Fix the sync with head I botched.
 1.28.2.2 18-Sep-2004  skrll Sync with HEAD.
 1.28.2.1 03-Aug-2004  skrll Sync with HEAD
 1.31.2.1 10-May-2004  tron Pull up revision 1.32 (requested by yamt in ticket #271):
ubc_release: grab uobj's vmobjlock when calling uvm_page_unbusy().
 1.35.2.2 29-Apr-2005  kent sync with -current
 1.35.2.1 16-Jan-2005  kent file uvm_bio.c was added on branch kent-audio2 on 2005-04-29 11:29:40 +0000
 1.36.2.1 19-Mar-2005  yamt sync with head. xen and whitespace. xen part is not finished.
 1.37.2.2 28-Jul-2006  tron Apply patch (requested by jld in ticket #1323):
Avoid a panic in page fault handling that can occur under low-memory
conditions.
 1.37.2.1 24-Aug-2005  riz branches: 1.37.2.1.2;
Pull up following revision(s) (requested by yamt in ticket #688):
sys/miscfs/genfs/genfs_vnops.c: revision 1.98 via patch
sys/ufs/ffs/ffs_vfsops.c: revision 1.165
sys/ufs/lfs/lfs_extern.h: revision 1.69
sys/fs/filecorefs/filecore_vfsops.c: revision 1.20
sys/nfs/nfs_node.c: revision 1.80
sys/fs/smbfs/smbfs_node.c: revision 1.24
sys/fs/cd9660/cd9660_vfsops.c: revision 1.24
sys/fs/msdosfs/msdosfs_denode.c: revision 1.8
sys/miscfs/genfs/genfs_node.h: revision 1.6
sys/ufs/lfs/lfs_vfsops.c: revision 1.183
sys/ufs/ext2fs/ext2fs_vfsops.c: revision 1.86
sys/fs/adosfs/advfsops.c: revision 1.23
sys/fs/ntfs/ntfs_vfsops.c: revision 1.31
- constify genfs_ops.
- use member designators.

sys/miscfs/genfs/genfs_vnops.c: revision 1.99 via patch
genfs_getpages: don't forget to put the vnode onto the syncer's work queue
even in the case of PGO_LOCKED.

sys/uvm/uvm_bio.c: revision 1.40
sys/uvm/uvm_pager.h: revision 1.29
sys/miscfs/genfs/genfs_vnops.c: revision 1.100 via patch
sys/ufs/ufs/ufs_inode.c: revision 1.50
- introduce PGO_NOBLOCKALLOC and use it for ubc mapping
to prevent unnecessary block allocations in the case that
page size > block size.
- ufs_balloc_range: use VM_PROT_WRITE+PGO_NOBLOCKALLOC rather than
VM_PROT_READ.

sys/uvm/uvm_fault.c: revision 1.96
sys/miscfs/genfs/genfs_vnops.c: revision 1.101 via patch
sys/uvm/uvm_object.h: revision 1.19
sys/miscfs/genfs/genfs_node.h: revision 1.7
ensure that vnodes with dirty pages are always on syncer's queue.
- genfs_putpages: wait for i/o completion of PG_RELEASED/PG_PAGEOUT pages by
setting "wasclean" false when encountering them.
suggested by Stephan Uphoff in PR/24596 (1).
- genfs_putpages: write protect pages when cleaning out, if
we're going to take the vnode off the syncer's queue.
uvm_fault: don't write-map pages unless its vnode is already on
the syncer's queue.
fix PR/24596 (3) but in the different way from the suggested fix.
(to keep our current behaviour, ie. not to require explicit msync.
discussed on tech-kern@.)
- genfs_putpages: don't mistakenly take a vnode off the queue
by introducing a generation number in genfs_node.
genfs_getpages: increment the generation number.
suggested by Stephan Uphoff in PR/24596 (2).
- add some assertions.

sys/miscfs/genfs/genfs_vnops.c: revision 1.102 via patch
genfs_putpages: don't bother to clean the vnode unless VONWORKLST.

sys/ufs/ffs/ffs_vnops.c: revision 1.71
ffs_full_fsync: because VBLK/VCHR can be mmap'ed,
do VOP_PUTPAGES for them as well.

sys/uvm/uvm_fault.c: revision 1.97
uvm_fault: check a correct object in the case of layered filesystems.
fix PR/30811 from Jukka Salmi.

sys/uvm/uvm_object.h: revision 1.20
sys/ufs/ffs/ffs_vfsops.c: revision 1.167
sys/uvm/uvm_bio.c: revision 1.41
sys/ufs/ufs/ufs_vnops.c: revision 1.129
sys/uvm/uvm_mmap.c: revision 1.92
sys/uvm/uvm_fault.c: revision 1.98
sys/kern/vfs_subr.c: revision 1.252
sys/fs/msdosfs/denode.h: revision 1.5
sys/miscfs/genfs/genfs_vnops.c: revision 1.103 via patch
sys/fs/msdosfs/msdosfs_denode.c: revision 1.9
sys/sys/vnode.h: revision 1.141
sys/ufs/ufs/ufs_inode.c: revision 1.51
sys/ufs/ufs/ufs_extern.h: revision 1.45 via patch
sys/miscfs/genfs/genfs_node.h: revision 1.8
sys/ufs/lfs/lfs_vfsops.c: revision 1.184
sys/uvm/uvm_pager.h: revision 1.30
sys/ufs/ext2fs/ext2fs_vfsops.c: revision 1.87
update file timestamps for nfsd loaned-read and mmap.
PR/25279. discussed on tech-kern@.

sys/miscfs/genfs/genfs_vnops.c: revision 1.104 via patch
don't write-protect wired pages. pointed out by Chuck Silvers.
for now, leave a vnode on the syncer's queue, as suggested by him.

sys/ufs/ffs/ffs_vnops.c: revision 1.72
revert VCHR part of ffs_vnops.c 1.71.
as VCHR uses the device pager, no point to call VOP_PUTPAGES here.
pointed out by Chuck Silvers.
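
The generation-number idea mentioned above for genfs_putpages/genfs_getpages can be sketched roughly as follows. This is a minimal illustration, not the kernel code: struct node_state and all three function names are hypothetical, and locking is omitted entirely.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical stand-in for the per-vnode state; not the real genfs_node. */
struct node_state {
	unsigned gen;		/* bumped whenever pages may have been dirtied */
	bool on_worklist;	/* true while the node is on the syncer's queue */
};

/* getpages side: record that new dirty pages may exist. */
void
note_possible_dirtying(struct node_state *ns)
{
	ns->gen++;
	ns->on_worklist = true;
}

/* putpages side: remember the generation before the flush starts. */
unsigned
begin_clean(const struct node_state *ns)
{
	return ns->gen;
}

/*
 * putpages side: dequeue only if nothing was dirtied while the flush ran.
 * Returns true if the node was taken off the queue.
 */
bool
finish_clean(struct node_state *ns, unsigned gen_at_start)
{
	assert(ns->on_worklist);
	if (ns->gen != gen_at_start)
		return false;	/* raced with a writer: stay queued */
	ns->on_worklist = false;
	return true;
}
```

The point of the counter is that the cleaner does not need to rescan the pages: a changed generation is enough to know the node must stay on the queue.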
 1.37.2.1.2.1 28-Jul-2006  tron Apply patch (requested by jld in ticket #1323):
Avoid a panic in page fault handling that can occur under low-memory
conditions.
 1.39.2.6 21-Jan-2008  yamt sync with head
 1.39.2.5 07-Dec-2007  yamt sync with head
 1.39.2.4 03-Sep-2007  yamt sync with head.
 1.39.2.3 26-Feb-2007  yamt sync with head.
 1.39.2.2 30-Dec-2006  yamt sync with head.
 1.39.2.1 21-Jun-2006  yamt sync with head.
 1.41.6.1 19-Nov-2005  yamt - as read-ahead context is per-vnode now,
there are fewer reasons to make VOP_READ call uvm_ra_request explicitly.
move it to pager (uvn_get) so that it can handle accesses via mmap as well.
- pass advice to pager via ubc.
- tweak DPRINTF.

XXX can be disturbed by PGO_LOCKED.

XXX it's controversial where it should be done.
(uvm_fault, uvn_get or genfs_getpages.)
 1.42.2.2 01-Mar-2006  yamt sync with head.
 1.42.2.1 01-Feb-2006  yamt sync with head.
 1.43.4.2 01-Jun-2006  kardel Sync with head.
 1.43.4.1 22-Apr-2006  simonb Sync with head.
 1.43.2.1 09-Sep-2006  rpaulo sync with head
 1.44.6.1 24-May-2006  tron Merge 2006-05-24 NetBSD-current into the "peter-altq" branch.
 1.44.4.2 11-May-2006  elad sync with head
 1.44.4.1 19-Apr-2006  elad oops - *really* sync to head this time.
 1.44.2.3 14-Sep-2006  yamt sync with head.
 1.44.2.2 03-Sep-2006  yamt sync with head.
 1.44.2.1 24-May-2006  yamt sync with head.
 1.48.4.2 10-Dec-2006  yamt sync with head.
 1.48.4.1 22-Oct-2006  yamt sync with head
 1.48.2.1 18-Nov-2006  ad Sync with head.
 1.54.4.2 17-May-2007  yamt sync with head.
 1.54.4.1 27-Feb-2007  yamt - sync with head.
- move sched_changepri back to kern_synch.c as it doesn't know PPQ anymore.
 1.56.6.1 11-Jul-2007  mjf Sync with head.
 1.56.4.10 20-Aug-2007  ad Sync with HEAD.
 1.56.4.9 27-Jul-2007  yamt revert the previous as it was committed mistakenly. (wrong branch)
 1.56.4.8 27-Jul-2007  yamt remove a debug printf.
 1.56.4.7 18-Jul-2007  ad Fix a couple of deadlocks I introduced.
 1.56.4.6 15-Jul-2007  ad Sync with head.
 1.56.4.5 15-Jul-2007  ad Sync with head.
 1.56.4.4 09-Jun-2007  ad Sync with head.
 1.56.4.3 08-Jun-2007  ad Sync with head.
 1.56.4.2 05-Apr-2007  ad - Put a per-LWP lock around swapin / swapout.
- Replace use of lockmgr().
- Minor locking fixes and assertions.
- uvm_map.h no longer pulls in proc.h, etc.
- Use kpause where appropriate.
 1.56.4.1 13-Mar-2007  ad Pull in the initial set of changes for the vmlocking branch.
 1.59.2.1 15-Aug-2007  skrll Sync with HEAD.
 1.62.14.2 27-Jul-2007  yamt ubc_uiomove: add an "advice" argument rather than using UVM_ADV_RANDOM blindly.
 1.62.14.1 27-Jul-2007  yamt file uvm_bio.c was added on branch matt-mips64 on 2007-07-27 09:50:38 +0000
 1.62.12.2 18-Feb-2008  mjf Sync with HEAD.
 1.62.12.1 08-Dec-2007  mjf Sync with HEAD.
 1.62.6.1 09-Jan-2008  matt sync with HEAD
 1.62.4.1 03-Dec-2007  joerg Sync with HEAD.
 1.63.6.1 02-Jan-2008  bouyer Sync with HEAD
 1.63.2.1 04-Dec-2007  ad Pull the vmlocking changes into a new branch.
 1.64.10.5 11-Aug-2010  yamt sync with head.
 1.64.10.4 11-Mar-2010  yamt sync with head
 1.64.10.3 19-Aug-2009  yamt sync with head.
 1.64.10.2 04-May-2009  yamt sync with head.
 1.64.10.1 16-May-2008  yamt sync with head.
 1.64.8.1 18-May-2008  yamt sync with head.
 1.64.6.2 17-Jan-2009  mjf Sync with HEAD.
 1.64.6.1 02-Jun-2008  mjf Sync with HEAD.
 1.65.14.1 29-Feb-2012  matt Improve UVM_PAGE_TRKOWN.
Add more asserts to uvm_page.
 1.65.10.1 21-Nov-2010  riz Pull up following revision(s) (requested by rmind in ticket #1421):
sys/uvm/uvm_bio.c: revision 1.70
sys/uvm/uvm_map.c: revision 1.292
sys/uvm/uvm_pager.c: revision 1.98
sys/uvm/uvm_fault.c: revision 1.175
sys/uvm/uvm_bio.c: revision 1.69
ubc_fault: split-off code part handling a single page into ubc_fault_page().
Keep the lock around pmap_update() where required. While fixing this
in ubc_fault(), rework logic to "remember" the last object of page and
reduce locking overhead, since in common case pages belong to one and
the same UVM object (but not always, therefore add a comment).
Unlocks before pmap_update(), on removal of mappings, might cause TLB
coherency issues, since on architectures like x86 and mips64 invalidation
IPIs are deferred to pmap_update(). Hence, VA space might be globally
visible before IPIs are sent or while they are still in-flight.
OK ad@.
 1.65.8.1 19-Jan-2009  skrll Sync with HEAD.
 1.65.6.1 13-Dec-2008  haad Update haad-dm branch to haad-dm-base2.
 1.68.4.8 31-May-2011  rmind sync with head
 1.68.4.7 19-May-2011  rmind Implement sharing of vnode_t::v_interlock amongst vnodes:
- Lock is shared amongst UVM objects using uvm_obj_setlock() or getnewvnode().
- Adjust vnode cache to handle unsharing, add VI_LOCKSHARE flag for that.
- Use sharing in tmpfs and layerfs for underlying object.
- Simplify locking in ubc_fault().
- Sprinkle some asserts.

Discussed with ad@.
 1.68.4.6 03-Jul-2010  rmind sync with head
 1.68.4.5 30-May-2010  rmind sync with head
 1.68.4.4 26-Apr-2010  rmind Add ubc_purge() and purge/deassociate any related UBC entries during
object (usually, vnode) destruction. Since locking (and thus object)
is required to enter/remove mappings - object is not allowed anymore
to disappear with any UBC entries left.

From original patch by ad@ with some modifications.
 1.68.4.3 25-Apr-2010  rmind ubc_alloc: when replacing a cache entry, lock the old object from which we
are deassociating and removing the old mapping.
 1.68.4.2 17-Mar-2010  rmind Reorganise UVM locking to protect P->V state and serialise pmap(9)
operations on the same page(s) by always locking their owner. Hence
lock order: "vmpage"-lock -> pmap-lock.

Patch, proposed on tech-kern@, from Andrew Doran.
 1.68.4.1 16-Mar-2010  rmind Change struct uvm_object::vmobjlock to be dynamically allocated with
mutex_obj_alloc(). It allows us to share the locks among UVM objects.
 1.68.2.14 21-Nov-2010  uebayasi Rename PGO_ZERO as PGO_HOLE, and s/uvm_page_zeropage/uvm_page_holepage/.
 1.68.2.13 21-Nov-2010  uebayasi Resurrect PGO_ZERO support.

When vnode pager encounters hole pages in XIP'ed vnodes, it fills
page slots with PGO_ZERO and returns them back to the caller (fault
handler). Fault handlers are responsible to check page slots and
redirect PGO_ZERO to the single "zero page" allocated by calling
uvm_page_zeropage_alloc(9).

The zero page is wired, read-only (PG_RDONLY) page. It's shared
by multiple vnodes, it has no single owner.

XIP'ed vnodes are supposed to be "stable" during I/O (unlocked).
Because XIP'ed mounts are always read-only. There's no chance to
change mappings of XIP'ed vnodes and their XIP'ed pages. Thus the
cached uobj is reused after pgo_get() for PGO_ZERO.

(Do we need a new concept of "read-only UVM object"?)
 1.68.2.12 04-Nov-2010  uebayasi Split physical device segment pages from "managed" to "managed
device". Cache that information as a flag PG_DEVICE so that callers
don't need to walk physsegs every time.

Remove PQ_FIXED, which means that page daemon doesn't need to know
device segment pages at all. But still fault handlers need to know
them.

I think this is what I can do best now.
 1.68.2.11 17-Aug-2010  uebayasi Sync with HEAD.
 1.68.2.10 22-Jul-2010  uebayasi s/PG_XIP/PQ_FIXED/, meaning that the fault handler sees XIP pages as
"fixed", and doesn't pass them to paging activity.

("XIP" is a vnode specific knowledge. It was wrong that the fault
handler had to know such a special thing.)
 1.68.2.9 15-Jul-2010  uebayasi Rename PG_DIRECT to PG_XIP. PG_XIP is marked to XIP vnode pages.
 1.68.2.8 13-Jul-2010  uebayasi Reduce more diffs from the original.
 1.68.2.7 12-Jul-2010  uebayasi Reduce more diff by backing out XIP page specific code. Allow XIP pages
to be loaned.
 1.68.2.6 09-Jul-2010  uebayasi Mark XIP pages as PG_CLEAN and/or PG_BUSY when appropriate. Protect
vnode lock when vm_page::flags is manipulated.
 1.68.2.5 08-Jul-2010  uebayasi Mark XIP pages as PG_RDONLY.
 1.68.2.4 07-Jul-2010  uebayasi Clean up; merge options DIRECT_PAGE into options XIP.
 1.68.2.3 31-May-2010  uebayasi Re-define the definition of "device page"; device pages are pages of
device memory. Pages which don't have vm_page (== can't be used for
generic use), but whose PV are tracked, are called "direct pages" from
now.
 1.68.2.2 23-Feb-2010  uebayasi ubc_alloc: Don't forget taking the parent's vmobjlock in device page cases.
 1.68.2.1 12-Feb-2010  uebayasi Teach device page handling.
 1.71.2.1 06-Jun-2011  jruoho Sync with HEAD.
 1.72.2.1 23-Jun-2011  cherry Catchup with rmind-uvmplock merge.
 1.79.16.1 18-May-2014  rmind sync with head
 1.79.12.2 03-Dec-2017  jdolecek update from HEAD
 1.79.12.1 20-Aug-2014  tls Rebase to HEAD as of a few days ago.
 1.79.2.3 22-May-2014  yamt sync with head.

for a reference, the tree before this commit was tagged
as yamt-pagecache-tag8.

this commit was split into small chunks to avoid
a limitation of cvs. ("Protocol error: too many arguments")
 1.79.2.2 17-Feb-2012  yamt byebye PG_HOLE as it turned out to be unnecessary.
 1.79.2.1 02-Nov-2011  yamt page cache related changes

- maintain object pages in radix tree rather than rb tree.
- reduce unnecessary page scan in putpages. esp. when an object has a ton of
pages cached but only a few of them are dirty.
- reduce the number of pmap operations by tracking page dirtiness more
precisely in uvm layer.
- fix nfs commit range tracking.
- fix nfs write clustering. XXX hack
 1.80.2.1 10-Aug-2014  tls Rebase.
 1.82.2.2 28-Aug-2017  skrll Sync with HEAD
 1.82.2.1 06-Jun-2015  skrll Sync with HEAD
 1.83.4.1 21-Apr-2017  bouyer Sync with HEAD
 1.83.2.2 26-Apr-2017  pgoyette Sync with HEAD
 1.83.2.1 20-Mar-2017  pgoyette Sync with HEAD
 1.90.2.1 02-Nov-2017  snj Pull up following revision(s) (requested by pgoyette in ticket #335):
share/man/man9/kernhist.9: 1.5-1.8
sys/arch/acorn26/acorn26/pmap.c: 1.39
sys/arch/arm/arm32/fault.c: 1.105 via patch
sys/arch/arm/arm32/pmap.c: 1.350, 1.359
sys/arch/arm/broadcom/bcm2835_bsc.c: 1.7
sys/arch/arm/omap/if_cpsw.c: 1.20
sys/arch/arm/omap/tiotg.c: 1.7
sys/arch/evbarm/conf/RPI2_INSTALL: 1.3
sys/dev/ic/sl811hs.c: 1.98
sys/dev/usb/ehci.c: 1.256
sys/dev/usb/if_axe.c: 1.83
sys/dev/usb/motg.c: 1.18
sys/dev/usb/ohci.c: 1.274
sys/dev/usb/ucom.c: 1.119
sys/dev/usb/uhci.c: 1.277
sys/dev/usb/uhub.c: 1.137
sys/dev/usb/umass.c: 1.160-1.162
sys/dev/usb/umass_quirks.c: 1.100
sys/dev/usb/umass_scsipi.c: 1.55
sys/dev/usb/usb.c: 1.168
sys/dev/usb/usb_mem.c: 1.70
sys/dev/usb/usb_subr.c: 1.221
sys/dev/usb/usbdi.c: 1.175
sys/dev/usb/usbdi_util.c: 1.67-1.70
sys/dev/usb/usbroothub.c: 1.3
sys/dev/usb/xhci.c: 1.75
sys/external/bsd/drm2/dist/drm/i915/i915_gem.c: 1.34
sys/kern/kern_history.c: 1.15
sys/kern/kern_xxx.c: 1.74
sys/kern/vfs_bio.c: 1.275-1.276
sys/miscfs/genfs/genfs_io.c: 1.71
sys/sys/kernhist.h: 1.21
sys/ufs/ffs/ffs_balloc.c: 1.63
sys/ufs/lfs/lfs_vfsops.c: 1.361
sys/ufs/lfs/ulfs_inode.c: 1.21
sys/ufs/lfs/ulfs_vnops.c: 1.52
sys/ufs/ufs/ufs_inode.c: 1.102
sys/ufs/ufs/ufs_vnops.c: 1.239
sys/uvm/pmap/pmap.c: 1.37-1.39
sys/uvm/pmap/pmap_tlb.c: 1.22
sys/uvm/uvm_amap.c: 1.108
sys/uvm/uvm_anon.c: 1.64
sys/uvm/uvm_aobj.c: 1.126
sys/uvm/uvm_bio.c: 1.91
sys/uvm/uvm_device.c: 1.66
sys/uvm/uvm_fault.c: 1.201
sys/uvm/uvm_km.c: 1.144
sys/uvm/uvm_loan.c: 1.85
sys/uvm/uvm_map.c: 1.353
sys/uvm/uvm_page.c: 1.194
sys/uvm/uvm_pager.c: 1.111
sys/uvm/uvm_pdaemon.c: 1.109
sys/uvm/uvm_swap.c: 1.175
sys/uvm/uvm_vnode.c: 1.103
usr.bin/vmstat/vmstat.c: 1.219
Reorder to test for null before null deref in debug code
--
Reorder to test for null before null deref in debug code
--
KNF
--
No need for '\n' in UVMHIST_LOG
--
normalise a BIOHIST log message
--
Update the kernhist(9) kernel history code to address issues identified
in PR kern/52639, as well as some general cleaning-up...
(As proposed on tech-kern@ with additional changes and enhancements.)
Details of changes:
* All history arguments are now stored as uintmax_t values[1], both in
the kernel and in the structures used for exporting the history data
to userland via sysctl(9). This avoids problems on some architectures
where passing a 64-bit (or larger) value to printf(3) can cause it to
process the value as multiple arguments. (This can be particularly
problematic when printf()'s format string is not a literal, since in
that case the compiler cannot know how large each argument should be.)
* Update the data structures used for exporting kernel history data to
include a version number as well as the length of history arguments.
* All [2] existing users of kernhist(9) have had their format strings
updated. Each format specifier now includes an explicit length
modifier 'j' to refer to numeric values of the size of uintmax_t.
* All [2] existing users of kernhist(9) have had their format strings
updated to replace uses of "%p" with "%#jx", and the pointer
arguments are now cast to (uintptr_t) before being subsequently cast
to (uintmax_t). This is needed to avoid compiler warnings about
casting "pointer to integer of a different size."
* All [2] existing users of kernhist(9) have had instances of "%s" or
"%c" format strings replaced with numeric formats; several instances
of mis-match between format string and argument list have been fixed.
* vmstat(1) has been modified to handle the new size of arguments in the
history data as exported by sysctl(9).
* vmstat(1) now provides a warning message if the history requested with
the -u option does not exist (previously, this condition was silently
ignored, with only a single blank line being printed).
* vmstat(1) now checks the version and argument length included in the
data exported via sysctl(9) and exits if they do not match the values
with which vmstat was built.
* The kernhist(9) man-page has been updated to note the additional
requirements imposed on the format strings, along with several other
minor changes and enhancements.
[1] It would have been possible to use an explicit length (for example,
uint64_t) for the history arguments. But that would require another
"rototill" of all the users in the future when we add support for an
architecture that supports a larger size. Also, the printf(3) format
specifiers for explicitly-sized values, such as "%"PRIu64, are much
more verbose (and less aesthetically appealing, IMHO) than simply
using "%ju".
[2] I've tried very hard to find "all [the] existing users of kernhist(9)"
but it is possible that I've missed some of them. I would be glad to
update any stragglers that anyone identifies.
--
For some reason this single kernel seems to have outgrown its declared
size as a result of the kernhist(9) changes. Bump the size.
XXX The amount of increase may be excessive - anyone with more detailed
XXX knowledge please feel free to further adjust the value appropriately.
--
Missed one cast of pointer --> uintptr_t in previous kernhist(9) commit
--
And yet another one. :(
--
Use correct mark-up for NetBSD version.
--
More improvements in grammar and readability.
--
Remove a stray '"' (obvious typo) and add a couple of casts that are
probably needed.
--
And replace an instance of "%p" conversion with "%#jx"
--
Whitespace fix. Give Bl tag table a width. Fix Xr.
 1.92.2.6 26-Dec-2018  pgoyette Sync with HEAD, resolve a few conflicts
 1.92.2.5 26-Nov-2018  pgoyette Sync with HEAD, resolve a couple of conflicts
 1.92.2.4 25-Jun-2018  pgoyette Sync with HEAD
 1.92.2.3 21-May-2018  pgoyette Sync with HEAD
 1.92.2.2 22-Apr-2018  pgoyette Sync with HEAD
 1.92.2.1 30-Mar-2018  pgoyette Resolve conflicts between branch and HEAD
 1.97.2.3 13-Apr-2020  martin Mostly merge changes from HEAD up to 20200411
 1.97.2.2 08-Apr-2020  martin Merge changes from current as of 20200406
 1.97.2.1 10-Jun-2019  christos Sync with HEAD
 1.102.2.2 29-Feb-2020  ad Sync with head.
 1.102.2.1 17-Jan-2020  ad Sync with head.
 1.108.2.1 25-Apr-2020  bouyer Sync with bouyer-xenpvh-base2 (HEAD)
 1.123.2.2 03-Apr-2021  thorpej Sync with HEAD.
 1.123.2.1 14-Dec-2020  thorpej Sync w/ HEAD.
 1.125.2.1 03-Apr-2021  thorpej Sync with HEAD.
 1.9 10-Aug-2025  andvar Fix few typos in comments.
 1.8 23-Feb-2020  ad UVM locking changes, proposed on tech-kern:

- Change the lock on uvm_object, vm_amap and vm_anon to be a RW lock.
- Break v_interlock and vmobjlock apart. v_interlock remains a mutex.
- Do partial PV list locking in the x86 pmap. Others to follow later.
 1.7 17-Feb-2019  rin branches: 1.7.6;
VM_MAXUSER_ADDRESS can be undefined when compiling module/coredump;
it varies between machines for evbppc (and possibly evbppc64).
 1.6 07-Jan-2014  dsl branches: 1.6.30;
Re-instate the zero length sections in elf core dumps (they probably help
describe the process memory layout).
Fudge the a.out core code to not dump the entire contents.
I'm not sure that anything can read a.out core files - more progress might
be made on such dumps by converting the a.out file to elf!
 1.5 03-Jan-2014  dsl There is no need for uvm_coredump_walkmap() to explicitly pass the proc_t
pointer to the caller's function.
If the code needs the process its address can be placed in the caller's
cookie.
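
The cookie pattern this change relies on can be sketched as below. It is only an illustration under assumptions: struct segment, walk_segments(), struct count_state and count_bytes() are hypothetical names, not the uvm_coredump interface, and owner_id merely stands in for whatever process reference the caller wants to carry.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical types for illustration; not the real uvm_coredump API. */
struct segment {
	unsigned long start, end;
};

typedef int (*walk_fn)(const struct segment *, void *cookie);

/* The walker knows nothing about the caller beyond the opaque cookie. */
int
walk_segments(const struct segment *segs, size_t n, walk_fn fn, void *cookie)
{
	assert(fn != NULL);
	for (size_t i = 0; i < n; i++) {
		int error = fn(&segs[i], cookie);
		if (error != 0)
			return error;	/* stop at the first failure */
	}
	return 0;
}

/* Caller-defined state: anything the callback needs travels in here. */
struct count_state {
	int owner_id;		/* stands in for the process pointer */
	unsigned long bytes;
};

int
count_bytes(const struct segment *seg, void *cookie)
{
	struct count_state *cs = cookie;

	cs->bytes += seg->end - seg->start;
	return 0;
}
```

Because the walker never interprets the cookie, adding another field to the caller's state requires no change to the walker's signature.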
 1.4 03-Jan-2014  dsl Minor changes to the process coredump code.
- Add some extra comments.
- Add some XXX comments because the process state might not be stable.
- Add uvm_coredump_count_segs() to simplify the calling code.
- uvm code now only returns non-empty sections/segments.
- Put the 'iocookie' into the 'cookie' block passed to uvm_coredump_walkmap()
instead of passing it through as an additional parameter.
amd64 can still generate core dumps that gdb can read.
 1.3 01-Jan-2014  dsl Change the type of the 'cookie' that holds the state of the core dump file
from 'void *' to the actual type 'struct coredump_iostate *'.
In most of the code the contents of the structure are still unknown.
This just stops the wrong type of pointer being passed to the 'void *'
parameter.
I hope I've found everything, amd64 GENERIC and i386 GENERIC & ALL compile.
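
A rough sketch of why a typed pointer helps even when the structure contents stay unknown to most code. The names struct dump_iostate and dump_write() are hypothetical and chosen so as not to suggest the real interface.

```c
#include <assert.h>
#include <stddef.h>

/* Incomplete type: callers can hold and pass the pointer but not look inside. */
struct dump_iostate;

/*
 * With a typed parameter, passing a pointer to some unrelated structure
 * draws a compiler diagnostic; with 'void *' it would be accepted silently.
 */
size_t dump_write(struct dump_iostate *ios, const void *buf, size_t len);

/* Only the implementation sees the layout (hypothetical field). */
struct dump_iostate {
	size_t written;
};

size_t
dump_write(struct dump_iostate *ios, const void *buf, size_t len)
{
	assert(ios != NULL && buf != NULL);
	ios->written += len;	/* a real writer would also emit the bytes */
	return ios->written;
}
```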
 1.2 02-Feb-2011  chuck branches: 1.2.4; 1.2.14; 1.2.18;
update license clauses on my code to match the new-style BSD licenses.
based on second diff that rmind@ sent me.

no functional change with this commit.
 1.1 19-Nov-2008  ad branches: 1.1.4; 1.1.6; 1.1.8; 1.1.12; 1.1.16; 1.1.18; 1.1.20;
Make the emulations, exec formats, coredump, NFS, and the NFS server
into modules. By and large this commit:

- shuffles header files and ifdefs
- splits code out where necessary to be modular
- adds module glue for each of the components
- adds/replaces hooks for things that can be installed at runtime
 1.1.20.1 08-Feb-2011  bouyer Sync with HEAD
 1.1.18.1 06-Jun-2011  jruoho Sync with HEAD.
 1.1.16.1 05-Mar-2011  rmind sync with head
 1.1.12.2 04-May-2009  yamt sync with head.
 1.1.12.1 19-Nov-2008  yamt file uvm_coredump.c was added on branch yamt-nfs-mp on 2009-05-04 08:14:39 +0000
 1.1.8.2 19-Jan-2009  skrll Sync with HEAD.
 1.1.8.1 19-Nov-2008  skrll file uvm_coredump.c was added on branch nick-hppapmap on 2009-01-19 13:20:36 +0000
 1.1.6.2 17-Jan-2009  mjf Sync with HEAD.
 1.1.6.1 19-Nov-2008  mjf file uvm_coredump.c was added on branch mjf-devfs2 on 2009-01-17 13:29:43 +0000
 1.1.4.2 13-Dec-2008  haad Update haad-dm branch to haad-dm-base2.
 1.1.4.1 19-Nov-2008  haad file uvm_coredump.c was added on branch haad-dm on 2008-12-13 01:15:42 +0000
 1.2.18.1 18-May-2014  rmind sync with head
 1.2.14.1 20-Aug-2014  tls Rebase to HEAD as of a few days ago.
 1.2.4.1 22-May-2014  yamt sync with head.

for a reference, the tree before this commit was tagged
as yamt-pagecache-tag8.

this commit was split into small chunks to avoid
a limitation of cvs. ("Protocol error: too many arguments")
 1.6.30.2 08-Apr-2020  martin Merge changes from current as of 20200406
 1.6.30.1 10-Jun-2019  christos Sync with HEAD
 1.7.6.1 29-Feb-2020  ad Sync with head.
 1.16 27-Dec-2019  ad Redo the page allocator to perform better, especially on multi-core and
multi-socket systems. Proposed on tech-kern. While here:

- add rudimentary NUMA support - needs more work.
- remove now unused "listq" from vm_page.
 1.15 17-May-2011  mrg branches: 1.15.56;
move and rename the uvm history code out of uvm_stat to "kernhist".

rename "UVMHIST" option to enable the uvm histories.

TODO:
- make UVMHIST properly depend upon KERNHIST
- enable dynamic registration of histories. this is mostly just
allocating something in a bitmap, and is only for viewing multiple
histories in a merged form.


tested on amd64 and sparc64.
 1.14 02-Feb-2011  chuck update license clauses on my code to match the new-style BSD licenses.
based on diff that rmind@ sent me.

no functional change with this commit.
 1.13 03-May-2009  pooka branches: 1.13.4; 1.13.6; 1.13.8;
Include some debug print routines if DEBUGPRINT is defined. This
way they can be included without having to include DDB.
(arguably all print routines should be behind #ifdef DEBUGPRINT
and options DDB should define that macro, but I'll tackle that later)
 1.12 08-Aug-2008  skrll branches: 1.12.8;
Make "show uvmhist" available to all arches (not just sparc*) in ddb.
 1.11 21-Feb-2007  thorpej branches: 1.11.38; 1.11.42; 1.11.44; 1.11.48;
Replace the Mach-derived boolean_t type with the C99 bool type. A
future commit will replace use of TRUE and FALSE with true and false.
 1.10 19-Feb-2006  bjh21 branches: 1.10.20;
Add a "show all pages" command to DDB which prints one line per physical
page in the system. Useful for getting some idea where all your memory's
gone, at least on a sufficiently small system.
 1.9 11-Dec-2005  christos branches: 1.9.2; 1.9.4; 1.9.6;
merge ktrace-lwp.
 1.8 24-Mar-2004  junyoung branches: 1.8.16;
Nuke __P().
 1.7 02-Jun-2001  chs branches: 1.7.22;
replace vm_map{,_entry}_t with struct vm_map{,_entry} *.
 1.6 27-Apr-2001  marcus STDC cleanup: extra token not allowed after #endif.
 1.5 25-Nov-2000  chs branches: 1.5.2;
lots of cleanup:
use queue.h macros and KASSERT().
address amap offsets in pages instead of bytes.
make amap_ref() and amap_unref() take an amap, offset and length
instead of a vm_map_entry_t.
improve whitespace and comments.
 1.4 24-Nov-2000  chs add ddb commands "show uvmexp" and "show ncache".
the former used to be "call uvm_dump", the latter is new.
 1.3 21-Jun-1999  thorpej branches: 1.3.2;
Protect prototypes, certain macros, and inlines from userland.
 1.2 25-Mar-1999  mrg branches: 1.2.4;
remove now >1 year old pre-release message.
 1.1 04-Jul-1998  jonathan defopt DDB.
 1.2.4.1 01-Jul-1999  thorpej Sync w/ -current.
 1.3.2.1 08-Dec-2000  bouyer Sync with HEAD.
 1.5.2.1 21-Jun-2001  nathanw Catch up to -current.
 1.7.22.3 21-Sep-2004  skrll Fix the sync with head I botched.
 1.7.22.2 18-Sep-2004  skrll Sync with HEAD.
 1.7.22.1 03-Aug-2004  skrll Sync with HEAD
 1.8.16.2 26-Feb-2007  yamt sync with head.
 1.8.16.1 21-Jun-2006  yamt sync with head.
 1.9.6.1 22-Apr-2006  simonb Sync with head.
 1.9.4.1 09-Sep-2006  rpaulo sync with head
 1.9.2.1 01-Mar-2006  yamt sync with head.
 1.10.20.1 27-Feb-2007  yamt - sync with head.
- move sched_changepri back to kern_synch.c as it doesn't know PPQ anymore.
 1.11.48.1 19-Oct-2008  haad Sync with HEAD.
 1.11.44.1 18-Sep-2008  wrstuden Sync with wrstuden-revivesa-base-2.
 1.11.42.1 04-May-2009  yamt sync with head.
 1.11.38.1 28-Sep-2008  mjf Sync with HEAD.
 1.12.8.1 13-May-2009  jym Sync with HEAD.

Commit is split, to avoid a "too many arguments" protocol error.
 1.13.8.1 08-Feb-2011  bouyer Sync with HEAD
 1.13.6.1 06-Jun-2011  jruoho Sync with HEAD.
 1.13.4.2 31-May-2011  rmind sync with head
 1.13.4.1 05-Mar-2011  rmind sync with head
 1.15.56.1 08-Apr-2020  martin Merge changes from current as of 20200406
 1.80 07-Jul-2022  riastradh uvm: CTASSERT about MIN_PAGE_SIZE, which is constant.
 1.79 07-Jul-2022  rin Convert CTASSERT(9) for PAGE_{SIZE,MASK} into KASSERT(9).

They are not compile-time constants for sparc.
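
The distinction between the two kinds of assertion can be sketched in portable C11. This is an illustration only: MIN_PAGE_SIZE_SKETCH, runtime_page_size and page_size_is_sane() are hypothetical, and _Static_assert plus an ordinary run-time check stand in for CTASSERT(9) and KASSERT(9).

```c
#include <assert.h>
#include <stdbool.h>

#define MIN_PAGE_SIZE_SKETCH 4096u	/* hypothetical constant */

/* Constant expression: the compiler can check it at build time (C11). */
_Static_assert((MIN_PAGE_SIZE_SKETCH & (MIN_PAGE_SIZE_SKETCH - 1)) == 0,
    "minimum page size must be a power of two");

/* Hypothetical run-time page size, e.g. discovered during boot. */
unsigned runtime_page_size = 8192u;

/* Not a constant expression: it can only be checked when the code runs. */
bool
page_size_is_sane(void)
{
	unsigned ps = runtime_page_size;

	return ps >= MIN_PAGE_SIZE_SKETCH && (ps & (ps - 1)) == 0;
}
```

A static assertion on runtime_page_size would not compile, which is the situation the commit above describes for ports where the page size is a variable.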
 1.78 06-Jul-2022  riastradh kern: Work around spurious -Wtype-limits warnings.

This useless garbage warning is apparently designed to make it
painful to write portable safe arithmetic and I think we ought to
just disable it.
 1.77 06-Jul-2022  riastradh mmap(2): Guarantee two's-complement wraparound for D_NEGOFFSAFE.

XXX Not sure this should be allowed at all, but this way we don't
change the semantics of the existing code which was written under
essentially the assumption of -fwrapv.
 1.76 06-Jul-2022  riastradh uvm/uvm_device.c: Sprinkle KNF.
 1.75 06-Jul-2022  riastradh mmap(2): Prohibit overflowing offsets for non-D_NEGOFFSAFE devices.

Reported-by: syzbot+d5a96e7a0ebbd0b76dfc@syzkaller.appspotmail.com
 1.74 06-Jul-2022  riastradh uvm(9): fo_mmap caller guarantees positive size.

No functional change intended, just sprinkling assertions to make it
clearer.
 1.73 28-Mar-2022  riastradh driver(9): New types dev_*_t for device driver devsw operations.

These will serve to replace the archaic and kludgey dev_type_* macros
which should've been typedefs all along.
 1.72 13-Mar-2021  skrll Consistently use %#jx instead of 0x%jx or just %jx in UVMHIST_LOG formats
 1.71 09-Jul-2020  skrll branches: 1.71.2;
Consistently use UVMHIST(__func__)

Convert UVMHIST_{CALLED,LOG} into UVMHIST_CALLARGS
 1.70 24-Feb-2020  rin 0x%#x --> %#x for non-external codes.
Also, stop mixing up 0x%x and %#x in single files as far as possible.
 1.69 23-Feb-2020  ad UVM locking changes, proposed on tech-kern:

- Change the lock on uvm_object, vm_amap and vm_anon to be a RW lock.
- Break v_interlock and vmobjlock apart. v_interlock remains a mutex.
- Do partial PV list locking in the x86 pmap. Others to follow later.
 1.68 22-Feb-2020  chs do not wait for memory in pgo_fault methods, just return ENOMEM
and let the uvm_fault code wait if it is appropriate.
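
The division of labour described here, where the callee reports ENOMEM and the caller decides whether to wait, might look roughly like this. Everything in the sketch is hypothetical: free_pages, fault_method(), wait_for_memory() and handle_fault() are illustrative stand-ins, not UVM functions.

```c
#include <assert.h>
#include <errno.h>

/* Hypothetical allocator state for the sketch. */
int free_pages = 0;

/* Callee: never sleeps; it reports the shortage to its caller. */
int
fault_method(void)
{
	if (free_pages == 0)
		return ENOMEM;
	free_pages--;
	return 0;
}

/* Stand-in for "sleep until memory becomes available". */
void
wait_for_memory(void)
{
	free_pages++;
}

/* Caller: owns the decision to wait and retry. */
int
handle_fault(int may_wait)
{
	for (;;) {
		int error = fault_method();

		if (error != ENOMEM || !may_wait)
			return error;
		wait_for_memory();
	}
}
```

Keeping the wait in one place means the caller can apply its own policy, such as failing immediately when waiting is not appropriate.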
 1.67 01-Dec-2019  ad branches: 1.67.2;
__cacheline_aligned on a lock.
 1.66 28-Oct-2017  pgoyette branches: 1.66.4;
Update the kernhist(9) kernel history code to address issues identified
in PR kern/52639, as well as some general cleaning-up...

(As proposed on tech-kern@ with additional changes and enhancements.)

Details of changes:

* All history arguments are now stored as uintmax_t values[1], both in
the kernel and in the structures used for exporting the history data
to userland via sysctl(9). This avoids problems on some architectures
where passing a 64-bit (or larger) value to printf(3) can cause it to
process the value as multiple arguments. (This can be particularly
problematic when printf()'s format string is not a literal, since in
that case the compiler cannot know how large each argument should be.)

* Update the data structures used for exporting kernel history data to
include a version number as well as the length of history arguments.

* All [2] existing users of kernhist(9) have had their format strings
updated. Each format specifier now includes an explicit length
modifier 'j' to refer to numeric values of the size of uintmax_t.

* All [2] existing users of kernhist(9) have had their format strings
updated to replace uses of "%p" with "%#jx", and the pointer
arguments are now cast to (uintptr_t) before being subsequently cast
to (uintmax_t). This is needed to avoid compiler warnings about
casting "pointer to integer of a different size."

* All [2] existing users of kernhist(9) have had instances of "%s" or
"%c" format strings replaced with numeric formats; several instances
of mis-match between format string and argument list have been fixed.

* vmstat(1) has been modified to handle the new size of arguments in the
history data as exported by sysctl(9).

* vmstat(1) now provides a warning message if the history requested with
the -u option does not exist (previously, this condition was silently
ignored, with only a single blank line being printed).

* vmstat(1) now checks the version and argument length included in the
data exported via sysctl(9) and exits if they do not match the values
with which vmstat was built.

* The kernhist(9) man-page has been updated to note the additional
requirements imposed on the format strings, along with several other
minor changes and enhancements.

[1] It would have been possible to use an explicit length (for example,
uint64_t) for the history arguments. But that would require another
"rototill" of all the users in the future when we add support for an
architecture that supports a larger size. Also, the printf(3) format
specifiers for explicitly-sized values, such as "%"PRIu64, are much
more verbose (and less aesthetically appealing, IMHO) than simply
using "%ju".

[2] I've tried very hard to find "all [the] existing users of kernhist(9)"
but it is possible that I've missed some of them. I would be glad to
update any stragglers that anyone identifies.
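
The format-string convention described above can be illustrated in userland. This is a sketch only: format_event() is a hypothetical helper and plain snprintf(3) stands in for the kernel history macros, whose exact signatures are not shown in this log.

```c
#include <assert.h>
#include <stdint.h>
#include <stdio.h>

/*
 * Format a pointer and a counter following the convention in the commit:
 * every argument is widened to uintmax_t and printed with the 'j' length
 * modifier; pointers go through uintptr_t first and use "%#jx", not "%p".
 */
int
format_event(char *buf, size_t len, const void *obj, unsigned count)
{
	assert(buf != NULL);
	return snprintf(buf, len, "obj=%#jx count=%ju",
	    (uintmax_t)(uintptr_t)obj, (uintmax_t)count);
}
```

Since every argument has the same width, a consumer of the recorded data can step through the arguments without knowing the original C types.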
 1.65 17-Dec-2016  riastradh branches: 1.65.6; 1.65.8;
Omit needless nullmmap.

Convert the one user of it to nommap. No functional change to the
device driver, since uvm interpreted nullmmap just like nommap.

This slightly changes the uvm ABI so that the function pointer nullop
is no longer interpreted as non-mmappable. I do hereby declare that
I am surfing the kernel version bump from a few hours ago.
 1.64 14-Dec-2014  chs branches: 1.64.2;
add a new "fo_mmap" fileops method to allow use of arbitrary uvm_objects for
mappings of file objects. move vnode-specific details of mmap()ing a vnode
from uvm_mmap() to the new vnode-specific vn_mmap(). add new uvm_mmap_dev()
and uvm_mmap_anon() convenience functions for mapping character devices
and anonymous memory, and replace all other calls to uvm_mmap() with those.
use the new fileop in drm2 so that libdrm can use mmap() to map things
like on other platforms (instead of the ioctl that we have used so far).
 1.63 27-Jan-2012  para branches: 1.63.6; 1.63.22; 1.63.24;
extending vmem(9) to be able to allocate resources for its own needs.
simplifying uvm_map handling (no special kernel entries anymore no relocking)
make malloc(9) a thin wrapper around kmem(9)
(with private interface for interrupt safety reasons)

releng@ acknowledged
 1.62 12-Jun-2011  rmind branches: 1.62.2; 1.62.6;
Welcome to 5.99.53! Merge rmind-uvmplock branch:

- Reorganize locking in UVM and provide extra serialisation for pmap(9).
New lock order: [vmpage-owner-lock] -> pmap-lock.

- Simplify locking in some pmap(9) modules by removing P->V locking.

- Use lock object on vmobjlock (and thus vnode_t::v_interlock) to share
the locks amongst UVM objects where necessary (tmpfs, layerfs, unionfs).

- Rewrite and optimise x86 TLB shootdown code, make it simpler and cleaner.
Add TLBSTATS option for x86 to collect statistics about TLB shootdowns.

- Unify /dev/mem et al in MI code and provide required locking (removes
kernel-lock on some ports). Also, avoid cache-aliasing issues.

Thanks to Andrew Doran and Joerg Sonnenberger, as their initial patches
formed the core changes of this branch.
 1.61 23-Apr-2011  rmind branches: 1.61.2;
Replace "malloc" in comments, remove unnecessary header inclusions.
 1.60 12-Feb-2011  jmcneill need uvm_pmap.h for pmap_mmap_flags definition
 1.59 11-Feb-2011  jmcneill add optional MD pmap_mmap_flags macro for passing flags between cdev_mmap
and pmap_enter, ok matt@
 1.58 02-Feb-2011  chuck update license clauses on my code to match the new-style BSD licenses.
based on diff that rmind@ sent me.

no functional change with this commit.
 1.57 05-Feb-2010  uebayasi branches: 1.57.2; 1.57.4; 1.57.6; 1.57.8;
vnode.h is not used here.
 1.56 20-Jun-2009  mrg add a workaround for drm:

for device mmap()'s, if the D_NEGOFFSAFE flag is set, do not check
if the offset is negative.

this should go away with the test itself when all drivers are audited
and checked to not fail with negative offsets.
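The workaround described above can be sketched as follows. This is a minimal illustration, not the committed code: the function name and the flag value are hypothetical stand-ins (the real D_NEGOFFSAFE comes from the kernel's conf.h), and the real check sits inside the device attach path rather than in a standalone helper.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical flag value for illustration only; not the kernel's definition. */
#define SKETCH_D_NEGOFFSAFE 0x0100u

/*
 * Return true if a device mmap() offset passes the sign check.
 * A driver that sets the flag opts out of the negative-offset test.
 */
static bool
sketch_mmap_offset_ok(int64_t off, unsigned int dev_flags)
{
	if ((dev_flags & SKETCH_D_NEGOFFSAFE) != 0)
		return true;	/* driver declared negative offsets safe */
	return off >= 0;	/* default: reject negative offsets */
}
```

Keeping the opt-out per driver means the default stays conservative until each driver has been audited.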
 1.55 17-Dec-2008  cegger branches: 1.55.2;
kill MALLOC and FREE macros.
 1.54 04-Jun-2008  ad branches: 1.54.6; 1.54.8; 1.54.14;
udv_fault: pmap_update before releasing locks.
 1.53 02-Jan-2008  ad branches: 1.53.6; 1.53.8; 1.53.10; 1.53.12;
Merge vmlocking2 to head.
 1.52 08-Dec-2007  ad branches: 1.52.4;
Merge from vmlocking2 (use cdev_mmap()).
 1.51 01-Dec-2007  yamt branches: 1.51.2;
constify pagerops.
 1.50 24-Jun-2007  christos branches: 1.50.6; 1.50.8; 1.50.14;
handle UVM_UNKNOWN_OFFSET.
 1.49 22-Feb-2007  thorpej branches: 1.49.4; 1.49.6; 1.49.8;
TRUE -> true, FALSE -> false
 1.48 03-Sep-2006  christos branches: 1.48.8;
use c99 initializers
 1.47 22-Feb-2006  drochner branches: 1.47.2;
kill the "fault_type" argument to pager's pgo_fault() methods
it is never used
(and using it would comprise an abstraction violation imho)
 1.46 11-Dec-2005  christos branches: 1.46.2; 1.46.4; 1.46.6;
merge ktrace-lwp.
 1.45 27-Jun-2005  thorpej branches: 1.45.2;
Small whitespace tweak.
 1.44 27-Jun-2005  thorpej Use ANSI function decls.
 1.43 06-Jun-2005  yamt introduce a macro to initialize uvm_object and use it.
 1.42 24-Mar-2004  junyoung Nuke __P().
 1.41 06-Sep-2002  gehenna branches: 1.41.6;
Merge the gehenna-devsw branch into the trunk.

This merge changes the device switch tables from static arrays to
tables generated dynamically by config(8).

- All device switches are defined as constant structures in device drivers.

- The new grammar ``device-major'' is introduced to ``files''.

device-major <prefix> char <num> [block <num>] [<rules>]

- All device major numbers must be listed in the port-dependent majors.<arch>
using this grammar.

- Added the new naming convention.
The name of the device switch must be <prefix>_[bc]devsw for auto-generation
of device switch tables.

- The backward compatibility of loading block/character device
switches by the LKM framework is broken. This is necessary to convert
from block/character device major to device name at runtime and vice versa.

- The restriction on assigning device majors by LKM is completely removed.
We don't need to reserve LKM entries for dynamic loading of device switches.

- At compile time, the list of device major numbers is packed into the kernel and
the LKM framework will refer to it to assign device major numbers dynamically.
 1.40 28-Feb-2002  christos branches: 1.40.8;
use the <sys/conf.h> macro to get the mmap footprint.
 1.39 10-Nov-2001  lukem add RCSIDs, and in some cases, slightly cleanup #include order
 1.38 15-Sep-2001  chs branches: 1.38.2;
a whole bunch of changes to improve performance and robustness under load:

- remove special treatment of pager_map mappings in pmaps. this is
required now, since I've removed the globals that expose the address range.
pager_map now uses pmap_kenter_pa() instead of pmap_enter(), so there's
no longer any need to special-case it.
- eliminate struct uvm_vnode by moving its fields into struct vnode.
- rewrite the pageout path. the pager is now responsible for handling the
high-level requests instead of only getting control after a bunch of work
has already been done on its behalf. this will allow us to UBCify LFS,
which needs tighter control over its pages than other filesystems do.
writing a page to disk no longer requires making it read-only, which
allows us to write wired pages without causing all kinds of havoc.
- use a new PG_PAGEOUT flag to indicate that a page should be freed
on behalf of the pagedaemon when it's unlocked. this flag is very similar
to PG_RELEASED, but unlike PG_RELEASED, PG_PAGEOUT can be cleared if the
pageout fails due to eg. an indirect-block buffer being locked.
this allows us to remove the "version" field from struct vm_page,
and together with shrinking "loan_count" from 32 bits to 16,
struct vm_page is now 4 bytes smaller.
- no longer use PG_RELEASED for swap-backed pages. if the page is busy
because it's being paged out, we can't release the swap slot to be
reallocated until that write is complete, but unlike with vnodes we
don't keep a count of in-progress writes so there's no good way to
know when the write is done. instead, when we need to free a busy
swap-backed page, just sleep until we can get it busy ourselves.
- implement a fast-path for extending writes which allows us to avoid
zeroing new pages. this substantially reduces cpu usage.
- encapsulate the data used by the genfs code in a struct genfs_node,
which must be the first element of the filesystem-specific vnode data
for filesystems which use genfs_{get,put}pages().
- eliminate many of the UVM pagerops, since they aren't needed anymore
now that the pager "put" operation is a higher-level operation.
- enhance the genfs code to allow NFS to use the genfs_{get,put}pages
instead of a modified copy.
- clean up struct vnode by removing all the fields that used to be used by
the vfs_cluster.c code (which we don't use anymore with UBC).
- remove kmem_object and mb_object since they were useless.
instead of allocating pages to these objects, we now just allocate
pages with no object. such pages are mapped in the kernel until they
are freed, so we can use the mapping to find the page to free it.
this allows us to remove splvm() protection in several places.

The sum of all these changes improves write throughput on my
decstation 5000/200 to within 1% of the rate of NetBSD 1.5
and reduces the elapsed time for "make release" of a NetBSD 1.5
source tree on my 128MB pc to 10% less than a 1.5 kernel took.
 1.37 10-Sep-2001  chris Update pmap_update to now take the updated pmap as an argument.
This will allow improvements to the pmaps so that they can more easily defer expensive operations, e.g. TLB/cache flush, until the last possible moment.

Currently this is a no-op on most platforms, so they should see no difference.

Reviewed by Jason.
 1.36 26-May-2001  chs branches: 1.36.2; 1.36.4;
replace vm_page_t with struct vm_page *.
 1.35 26-May-2001  chs replace {simple_,}lock{_data,}_t with struct {simple,}lock {,*}.
 1.34 25-May-2001  chs remove trailing whitespace.
 1.33 24-Apr-2001  thorpej Sprinkle pmap_update() calls after calls to:
- pmap_enter()
- pmap_remove()
- pmap_protect()
- pmap_kenter_pa()
- pmap_kremove()
as described in pmap(9).

These calls are relatively conservative. It may be possible to
optimize these a little more.
 1.32 15-Mar-2001  chs eliminate the KERN_* error codes in favor of the traditional E* codes.
the mapping is:

KERN_SUCCESS 0
KERN_INVALID_ADDRESS EFAULT
KERN_PROTECTION_FAILURE EACCES
KERN_NO_SPACE ENOMEM
KERN_INVALID_ARGUMENT EINVAL
KERN_FAILURE various, mostly turn into KASSERTs
KERN_RESOURCE_SHORTAGE ENOMEM
KERN_NOT_RECEIVER <unused>
KERN_NO_ACCESS <unused>
KERN_PAGES_LOCKED <unused>
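The mapping above can be read as a simple translation function. The sketch below is illustrative only: the enum and function names are hypothetical, the enum values are not the historical KERN_* numbers, and KERN_FAILURE is left out because the table gives it no single replacement.

```c
#include <errno.h>

/* Hypothetical stand-ins for the retired codes; numeric values are illustrative. */
enum sketch_kern_code {
	SKETCH_KERN_SUCCESS,
	SKETCH_KERN_INVALID_ADDRESS,
	SKETCH_KERN_PROTECTION_FAILURE,
	SKETCH_KERN_NO_SPACE,
	SKETCH_KERN_INVALID_ARGUMENT,
	SKETCH_KERN_RESOURCE_SHORTAGE
};

/* Translate a legacy code to the errno value listed in the table. */
static int
sketch_kern_to_errno(enum sketch_kern_code code)
{
	switch (code) {
	case SKETCH_KERN_SUCCESS:		return 0;
	case SKETCH_KERN_INVALID_ADDRESS:	return EFAULT;
	case SKETCH_KERN_PROTECTION_FAILURE:	return EACCES;
	case SKETCH_KERN_NO_SPACE:		return ENOMEM;
	case SKETCH_KERN_INVALID_ARGUMENT:	return EINVAL;
	case SKETCH_KERN_RESOURCE_SHORTAGE:	return ENOMEM;
	}
	return EINVAL;	/* hypothetical fallback for values outside the enum */
}
```

Note that two distinct legacy codes collapse onto ENOMEM, so the translation is not reversible.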
 1.31 10-Mar-2001  chs eliminate the VM_PAGER_* error codes in favor of the traditional E* codes.
the mapping is:

VM_PAGER_OK 0
VM_PAGER_BAD <unused>
VM_PAGER_FAIL <unused>
VM_PAGER_PEND 0 (see below)
VM_PAGER_ERROR EIO
VM_PAGER_AGAIN EAGAIN
VM_PAGER_UNLOCK EBUSY
VM_PAGER_REFAULT ERESTART

for async i/o requests, it used to be possible for the request to
be converted to sync, and the pager would return VM_PAGER_OK or VM_PAGER_PEND
to indicate whether the caller should perform post-i/o cleanup.
this is no longer allowed; pagers must now return 0 to indicate that
the async i/o was successfully started, and the caller never needs to
worry about doing the post-i/o cleanup.
 1.30 25-Nov-2000  chs branches: 1.30.2;
lots of cleanup:
use queue.h macros and KASSERT().
address amap offsets in pages instead of bytes.
make amap_ref() and amap_unref() take an amap, offset and length
instead of a vm_map_entry_t.
improve whitespace and comments.
 1.29 24-Nov-2000  chs g/c unused pager ops "asyncget" and "aiodone".
 1.28 27-Jun-2000  mrg remove include of <vm/vm.h>
 1.27 27-Jun-2000  simonb In udv_fault(), use an off_t for curr_offset so that the offset passed
to d_mmap isn't truncated on 64 bit architectures.
 1.26 26-Jun-2000  mrg remove/move more mach vm header files:

<vm/pglist.h> -> <uvm/uvm_pglist.h>
<vm/vm_inherit.h> -> <uvm/uvm_inherit.h>
<vm/vm_kern.h> -> into <uvm/uvm_extern.h>
<vm/vm_object.h> -> nothing
<vm/vm_pager.h> -> into <uvm/uvm_pager.h>

also includes a bunch of <vm/vm_page.h> include removals (due to redundancy
with <vm/vm.h>), and a scattering of other similar headers.
 1.25 26-Jun-2000  simonb Change the kernel mmap interface so that the offset to map is an
"off_t" and the return value is a "paddr_t" to allow mappings
at offsets past 2^31 bytes. Somewhat inspired by FreeBSD, which
only changed the offset to a "vm_offset_t".

Includes updates for the i386, pc532 and sh3 mmmmap from Jason Thorpe.
 1.24 24-Jun-2000  pk uvm_detach: eliminate degenerate loop construction.
 1.23 24-Jun-2000  pk Insert two missing `simple_unlock()'s' in udv_detach().
 1.22 28-May-2000  drochner branches: 1.22.2;
Don't silently truncate the voff_t offset to vaddr_t when passing it to
udv_attach. Pass the whole voff_t instead and do an explicit overflow
check before it is passed to the device's mmap handler (as "int", sadly).
 1.21 03-Apr-2000  chs branches: 1.21.2;
remove the "shareprot" pagerop. it's not needed anymore since
share maps are long gone.
 1.20 26-Mar-2000  kleink Merge parts of chs-ubc2 into the trunk:
Add a new type voff_t (defined as a synonym for off_t) to describe offsets
into uvm objects, and update the appropriate interfaces to use it, the
most visible effect being the ability to mmap() file offsets beyond
the range of a vaddr_t.

Originally by Chuck Silvers; blame me for problems caused by merging this
into non-UBC.
 1.19 26-Mar-2000  kleink Kill duplicate udv_attach() prototype; it's a public interface, and declared
in uvm_device.h.
 1.18 13-Nov-1999  thorpej Change the pmap_enter() API slightly; pmap_enter() now returns an error
value (KERN_SUCCESS or KERN_RESOURCE_SHORTAGE) indicating if it succeeded
or failed. Change the `wired' and `access_type' arguments to a single
`flags' argument, which includes the access type, and flags:

PMAP_WIRED the old `wired' boolean
PMAP_CANFAIL pmap_enter() is allowed to fail

If PMAP_CANFAIL is not specified, the pmap should behave as it always
has in the face of a drastic resource shortage: fall over dead.

Change the fault handler to deal with failure (which indicates resource
shortage) by unlocking everything, waiting for the pagedaemon to free
more memory, then retrying the fault.
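The retry behaviour described above can be simulated in user space. Everything in this sketch is hypothetical: the structure, the function names and the flag value are invented for illustration, and the failure is reported as ENOMEM rather than the KERN_RESOURCE_SHORTAGE value the commit itself used. It only shows the control flow of "fail, unlock, wait, retry".

```c
#include <errno.h>
#include <stdlib.h>

/* Hypothetical flag value for illustration only. */
#define SKETCH_PMAP_CANFAIL 0x1u

/* Simulated pmap: counts remaining shortages and how often the caller waited. */
struct sketch_pmap {
	int failures_left;
	int waits;
};

/* Stand-in for pmap_enter(): fails only if the caller said it may. */
static int
sketch_pmap_enter(struct sketch_pmap *pm, unsigned int flags)
{
	if (pm->failures_left > 0) {
		if ((flags & SKETCH_PMAP_CANFAIL) == 0)
			abort();	/* old behaviour: fall over dead */
		pm->failures_left--;
		return ENOMEM;
	}
	return 0;
}

/* Stand-in for waiting on the pagedaemon to free memory. */
static void
sketch_wait_for_memory(struct sketch_pmap *pm)
{
	pm->waits++;
}

/* Fault path: on shortage, drop locks (elided), wait, then retry. */
static int
sketch_fault(struct sketch_pmap *pm)
{
	for (;;) {
		int error = sketch_pmap_enter(pm, SKETCH_PMAP_CANFAIL);
		if (error == 0)
			return 0;
		sketch_wait_for_memory(pm);
	}
}
```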
 1.17 24-Oct-1999  ross Patch from chuq for uvm r/w map oscillation bug.
Fixes the XalphaNetBSD slowdown.
 1.16 08-Apr-1999  drochner branches: 1.16.2; 1.16.4; 1.16.6;
sanity: use ';' to separate statements
 1.15 26-Mar-1999  mycroft branches: 1.15.2; 1.15.4;
Add a new `access type' argument to pmap_enter(). This indicates what type of
memory access a mapping was caused by. This is passed through from uvm_fault()
and udv_fault(), and in most other cases is 0.
The pmap module may use this to preset R/M information. On MMUs which require
R/M emulation, the implementation may preset the bits and avoid taking another
fault. On MMUs which keep R/M information in hardware, the implementation may
preset its cached bits to speed up the next call to pmap_is_modified() or
pmap_is_referenced().
 1.14 25-Mar-1999  mrg remove now >1 year old pre-release message.
 1.13 24-Mar-1999  cgd modify udv_attach() and its caller (uvm_mmap()) so that it's passed the
offset and size of the requested region to be mapped, so that the
udv_attach() can use the device d_mmap() entry to check mappability
of the requested region.
 1.12 24-Mar-1999  cgd after discussion with chuck, nuke pgo_attach from uvm_pagerops
 1.11 19-Nov-1998  mrg check the return value of d_mmap before pmap_phys_address() gets hold of it.
 1.10 11-Oct-1998  chuck remove unused share map code from UVM:
- udv_fault() no longer has to worry about share map address translations
on device faults. simplify code.
 1.9 13-Aug-1998  eeh Merge paddr_t changes into the main branch.
 1.8 05-May-1998  kleink branches: 1.8.2;
Remove inclusions of syscall (and syscall argument) related header files;
we don't need them here.
 1.7 09-Mar-1998  mrg KNF.
 1.6 01-Mar-1998  fvdl Merge with Lite2 + local changes
 1.5 10-Feb-1998  mrg - add defopt's for UVM, UVMHIST and PMAP_NEW.
- remove unnecessary UVMHIST_DECL's.
 1.4 07-Feb-1998  mrg restore rcsids
 1.3 07-Feb-1998  chs rearrange a bit for clarity.
 1.2 06-Feb-1998  thorpej RCS ID police.
 1.1 05-Feb-1998  mrg branches: 1.1.1;
Initial revision
 1.1.1.1 05-Feb-1998  mrg initial import of the new virtual memory system, UVM, into -current.

UVM was written by chuck cranor <chuck@maria.wustl.edu>, with some
minor portions derived from the old Mach code. i provided some help
getting swap and paging working, and other bug fixes/ideas. chuck
silvers <chuq@chuq.com> also provided some other fixes.

this is the UVM kernel code portion.


this will be KNF'd shortly. :-)
 1.8.2.2 08-Aug-1998  eeh Revert cdevsw mmap routines to return int.
 1.8.2.1 30-Jul-1998  eeh Split vm_offset_t and vm_size_t into paddr_t, psize_t, vaddr_t, and vsize_t.
 1.15.4.2 09-Aug-1999  chs create a new type "voff_t" for uvm_object offsets
and define it to be "off_t". also, remove pgo_asyncget().
 1.15.4.1 21-Jun-1999  thorpej Sync w/ -current.
 1.15.2.1 26-Oct-1999  he Pull up revision 1.17 (requested by ross):
Bugfix for device mmap fault handler, fixes serious performance
problem with alpha X server.
 1.16.6.1 27-Dec-1999  wrstuden Pull up to last week's -current.
 1.16.4.1 15-Nov-1999  fvdl Sync with -current
 1.16.2.4 27-Mar-2001  bouyer Sync with HEAD.
 1.16.2.3 12-Mar-2001  bouyer Sync with HEAD.
 1.16.2.2 08-Dec-2000  bouyer Sync with HEAD.
 1.16.2.1 20-Nov-2000  bouyer Update thorpej_scsipi to -current as of a month ago
 1.21.2.1 22-Jun-2000  minoura Sync w/ netbsd-1-5-base.
 1.22.2.3 30-Jun-2000  simonb Pull up mmap paddr_t/off_t changes from trunk.
 1.22.2.2 24-Jun-2000  thorpej Pull up rev. 1.24:
uvm_detach: eliminate degenerate loop construction.
 1.22.2.1 24-Jun-2000  thorpej Pull up rev. 1.23:
Insert two missing `simple_unlock()'s' in udv_detach().
 1.30.2.6 17-Sep-2002  nathanw Catch up to -current.
 1.30.2.5 01-Apr-2002  nathanw Catch up to -current.
(CVS: It's not just a program. It's an adventure!)
 1.30.2.4 14-Nov-2001  nathanw Catch up to -current.
 1.30.2.3 21-Sep-2001  nathanw Catch up to -current.
 1.30.2.2 21-Jun-2001  nathanw Catch up to -current.
 1.30.2.1 09-Apr-2001  nathanw Catch up with -current.
 1.36.4.2 01-Oct-2001  fvdl Catch up with -current.
 1.36.4.1 07-Sep-2001  thorpej Commit my "devvp" changes to the thorpej-devvp branch. This
replaces the use of dev_t in most places with a struct vnode *.

This will form the basic infrastructure for real cloning device
support (besides being architecturally cleaner -- it'll be good
to get away from using numbers to represent objects).
 1.36.2.4 10-Oct-2002  jdolecek sync kqueue with -current; this includes merge of gehenna-devsw branch,
merge of i386 MP branch, and part of autoconf rototil work
 1.36.2.3 16-Mar-2002  jdolecek Catch up with -current.
 1.36.2.2 10-Jan-2002  thorpej Sync kqueue branch with -current.
 1.36.2.1 13-Sep-2001  thorpej Update the kqueue branch to HEAD.
 1.38.2.1 12-Nov-2001  thorpej Sync the thorpej-mips-cache branch with -current.
 1.40.8.1 16-May-2002  gehenna Replace the direct-access to devsw table with calling devsw APIs.
 1.41.6.4 10-Nov-2005  skrll Sync with HEAD. Here we go again...
 1.41.6.3 21-Sep-2004  skrll Fix the sync with head I botched.
 1.41.6.2 18-Sep-2004  skrll Sync with HEAD.
 1.41.6.1 03-Aug-2004  skrll Sync with HEAD
 1.45.2.6 21-Jan-2008  yamt sync with head
 1.45.2.5 07-Dec-2007  yamt sync with head
 1.45.2.4 03-Sep-2007  yamt sync with head.
 1.45.2.3 26-Feb-2007  yamt sync with head.
 1.45.2.2 30-Dec-2006  yamt sync with head.
 1.45.2.1 21-Jun-2006  yamt sync with head.
 1.46.6.1 22-Apr-2006  simonb Sync with head.
 1.46.4.1 09-Sep-2006  rpaulo sync with head
 1.46.2.1 01-Mar-2006  yamt sync with head.
 1.47.2.1 14-Sep-2006  yamt sync with head.
 1.48.8.1 27-Feb-2007  yamt - sync with head.
- move sched_changepri back to kern_synch.c as it doesn't know PPQ anymore.
 1.49.8.1 09-Dec-2007  reinoud Pullup to HEAD
 1.49.6.1 11-Jul-2007  mjf Sync with head.
 1.49.4.3 21-Aug-2007  yamt destroy vmobjlock.
 1.49.4.2 20-Aug-2007  ad Sync with HEAD.
 1.49.4.1 13-Mar-2007  ad Pull in the initial set of changes for the vmlocking branch.
 1.50.14.3 18-Feb-2008  mjf Sync with HEAD.
 1.50.14.2 27-Dec-2007  mjf Sync with HEAD.
 1.50.14.1 08-Dec-2007  mjf Sync with HEAD.
 1.50.8.1 09-Jan-2008  matt sync with HEAD
 1.50.6.2 09-Dec-2007  jmcneill Sync with HEAD.
 1.50.6.1 03-Dec-2007  joerg Sync with HEAD.
 1.51.2.3 08-Dec-2007  ad Sync with head.
 1.51.2.2 08-Dec-2007  ad Fix merge error.
 1.51.2.1 04-Dec-2007  ad Pull the vmlocking changes into a new branch.
 1.52.4.1 02-Jan-2008  bouyer Sync with HEAD
 1.53.12.1 23-Jun-2008  wrstuden Sync w/ -current. 34 merge conflicts to follow.
 1.53.10.3 11-Mar-2010  yamt sync with head
 1.53.10.2 18-Jul-2009  yamt sync with head.
 1.53.10.1 04-May-2009  yamt sync with head.
 1.53.8.1 17-Jun-2008  yamt sync with head.
 1.53.6.2 17-Jan-2009  mjf Sync with HEAD.
 1.53.6.1 05-Jun-2008  mjf Sync with HEAD.

Also fix build.
 1.54.14.1 21-Apr-2010  matt sync to netbsd-5
 1.54.8.1 23-Jun-2009  snj Pull up following revision(s) (requested by mrg in ticket #826):
sys/sys/conf.h: revision 1.135
sys/uvm/uvm_device.c: revision 1.56
add a workaround for drm:
for device mmap()'s, if the D_NEGOFFSAFE flag is set, do not check
if the offset is negative.
this should go away with the test itself when all drivers are audited
and checked to not fail with negative offsets.
 1.54.6.1 19-Jan-2009  skrll Sync with HEAD.
 1.55.2.1 23-Jul-2009  jym Sync with HEAD.
 1.57.8.2 17-Feb-2011  bouyer Sync with HEAD
 1.57.8.1 08-Feb-2011  bouyer Sync with HEAD
 1.57.6.1 06-Jun-2011  jruoho Sync with HEAD.
 1.57.4.5 31-May-2011  rmind sync with head
 1.57.4.4 19-May-2011  rmind Implement sharing of vnode_t::v_interlock amongst vnodes:
- Lock is shared amongst UVM objects using uvm_obj_setlock() or getnewvnode().
- Adjust vnode cache to handle unsharing, add VI_LOCKSHARE flag for that.
- Use sharing in tmpfs and layerfs for underlying object.
- Simplify locking in ubc_fault().
- Sprinkle some asserts.

Discussed with ad@.
 1.57.4.3 05-Mar-2011  rmind sync with head
 1.57.4.2 17-Mar-2010  rmind Reorganise UVM locking to protect P->V state and serialise pmap(9)
operations on the same page(s) by always locking their owner. Hence
lock order: "vmpage"-lock -> pmap-lock.

Patch, proposed on tech-kern@, from Andrew Doran.
 1.57.4.1 16-Mar-2010  rmind Change struct uvm_object::vmobjlock to be dynamically allocated with
mutex_obj_alloc(). It allows us to share the locks among UVM objects.
 1.57.2.3 31-Oct-2010  uebayasi We already have a flag PMAP_NOCACHE. s/PMAP_UNMANAGED/PMAP_NOCACHE/.
Pointed out by Chuck Silvers, thanks.
 1.57.2.2 28-May-2010  uebayasi Comment.
 1.57.2.1 27-Apr-2010  uebayasi Always map device pages via cdev as unmanaged for now.

I need this to read/write a NOR FlashROM from userland. Otherwise pmaps
believe the physload'ed ROM region as managed, and map it as cache
enabled, which prevents me from reading ROM command status, etc.
 1.61.2.1 23-Jun-2011  cherry Catchup with rmind-uvmplock merge.
 1.62.6.1 18-Feb-2012  mrg merge to -current.
 1.62.2.1 17-Apr-2012  yamt sync with head
 1.63.24.2 05-Feb-2017  skrll Sync with HEAD
 1.63.24.1 06-Apr-2015  skrll Sync with HEAD
 1.63.22.1 31-Dec-2014  snj Pull up following revision(s) (requested by chs in ticket #363):
common/lib/libprop/prop_kern.c: revision 1.18
sys/arch/mac68k/dev/grf_compat.c: revision 1.27
sys/arch/x68k/dev/grf.c: revision 1.45
sys/external/bsd/drm/dist/bsd-core/drm_bufs.c: revision 1.12
sys/external/bsd/drm2/drm/drm_drv.c: revision 1.12
sys/external/bsd/drm2/drm/drm_vm.c: revision 1.6
sys/external/bsd/drm2/include/linux/mm.h: revision 1.4
sys/kern/vfs_vnops.c: revision 1.192 via patch
sys/rump/librump/rumpkern/vm.c: revision 1.160
sys/sys/file.h: revision 1.78 via patch
sys/uvm/uvm_device.c: revision 1.64
sys/uvm/uvm_device.h: revision 1.13
sys/uvm/uvm_extern.h: revision 1.192
sys/uvm/uvm_mmap.c: revision 1.150 via patch
add a new "fo_mmap" fileops method to allow use of arbitrary uvm_objects for
mappings of file objects. move vnode-specific details of mmap()ing a vnode
from uvm_mmap() to the new vnode-specific vn_mmap(). add new uvm_mmap_dev()
and uvm_mmap_anon() convenience functions for mapping character devices
and anonymous memory, and replace all other calls to uvm_mmap() with those.
use the new fileop in drm2 so that libdrm can use mmap() to map things
like on other platforms (instead of the ioctl that we have used so far).
 1.63.6.1 03-Dec-2017  jdolecek update from HEAD
 1.64.2.2 07-Jan-2017  pgoyette Sync with HEAD. (Note that most of these changes are simply $NetBSD$
tag issues.)
 1.64.2.1 20-Jul-2016  pgoyette Adapt machine-independent code to the new {b,c}devsw reference-counting
(using localcount(9)). All callers of {b,c}devsw_lookup() now call
{b,c}devsw_lookup_acquire() which retains a reference on the 'struct
{b,c}devsw'. This reference must be released by the caller once it is
finished with the structure's content (or other data that would disappear
if the 'struct {b,c}devsw' were to disappear).
 1.65.8.1 02-Nov-2017  snj Pull up following revision(s) (requested by pgoyette in ticket #335):
share/man/man9/kernhist.9: 1.5-1.8
sys/arch/acorn26/acorn26/pmap.c: 1.39
sys/arch/arm/arm32/fault.c: 1.105 via patch
sys/arch/arm/arm32/pmap.c: 1.350, 1.359
sys/arch/arm/broadcom/bcm2835_bsc.c: 1.7
sys/arch/arm/omap/if_cpsw.c: 1.20
sys/arch/arm/omap/tiotg.c: 1.7
sys/arch/evbarm/conf/RPI2_INSTALL: 1.3
sys/dev/ic/sl811hs.c: 1.98
sys/dev/usb/ehci.c: 1.256
sys/dev/usb/if_axe.c: 1.83
sys/dev/usb/motg.c: 1.18
sys/dev/usb/ohci.c: 1.274
sys/dev/usb/ucom.c: 1.119
sys/dev/usb/uhci.c: 1.277
sys/dev/usb/uhub.c: 1.137
sys/dev/usb/umass.c: 1.160-1.162
sys/dev/usb/umass_quirks.c: 1.100
sys/dev/usb/umass_scsipi.c: 1.55
sys/dev/usb/usb.c: 1.168
sys/dev/usb/usb_mem.c: 1.70
sys/dev/usb/usb_subr.c: 1.221
sys/dev/usb/usbdi.c: 1.175
sys/dev/usb/usbdi_util.c: 1.67-1.70
sys/dev/usb/usbroothub.c: 1.3
sys/dev/usb/xhci.c: 1.75
sys/external/bsd/drm2/dist/drm/i915/i915_gem.c: 1.34
sys/kern/kern_history.c: 1.15
sys/kern/kern_xxx.c: 1.74
sys/kern/vfs_bio.c: 1.275-1.276
sys/miscfs/genfs/genfs_io.c: 1.71
sys/sys/kernhist.h: 1.21
sys/ufs/ffs/ffs_balloc.c: 1.63
sys/ufs/lfs/lfs_vfsops.c: 1.361
sys/ufs/lfs/ulfs_inode.c: 1.21
sys/ufs/lfs/ulfs_vnops.c: 1.52
sys/ufs/ufs/ufs_inode.c: 1.102
sys/ufs/ufs/ufs_vnops.c: 1.239
sys/uvm/pmap/pmap.c: 1.37-1.39
sys/uvm/pmap/pmap_tlb.c: 1.22
sys/uvm/uvm_amap.c: 1.108
sys/uvm/uvm_anon.c: 1.64
sys/uvm/uvm_aobj.c: 1.126
sys/uvm/uvm_bio.c: 1.91
sys/uvm/uvm_device.c: 1.66
sys/uvm/uvm_fault.c: 1.201
sys/uvm/uvm_km.c: 1.144
sys/uvm/uvm_loan.c: 1.85
sys/uvm/uvm_map.c: 1.353
sys/uvm/uvm_page.c: 1.194
sys/uvm/uvm_pager.c: 1.111
sys/uvm/uvm_pdaemon.c: 1.109
sys/uvm/uvm_swap.c: 1.175
sys/uvm/uvm_vnode.c: 1.103
usr.bin/vmstat/vmstat.c: 1.219
Reorder to test for null before null deref in debug code
--
Reorder to test for null before null deref in debug code
--
KNF
--
No need for '\n' in UVMHIST_LOG
--
normalise a BIOHIST log message
--
Update the kernhist(9) kernel history code to address issues identified
in PR kern/52639, as well as some general cleaning-up...
(As proposed on tech-kern@ with additional changes and enhancements.)
Details of changes:
* All history arguments are now stored as uintmax_t values[1], both in
the kernel and in the structures used for exporting the history data
to userland via sysctl(9). This avoids problems on some architectures
where passing a 64-bit (or larger) value to printf(3) can cause it to
process the value as multiple arguments. (This can be particularly
problematic when printf()'s format string is not a literal, since in
that case the compiler cannot know how large each argument should be.)
* Update the data structures used for exporting kernel history data to
include a version number as well as the length of history arguments.
* All [2] existing users of kernhist(9) have had their format strings
updated. Each format specifier now includes an explicit length
modifier 'j' to refer to numeric values of the size of uintmax_t.
* All [2] existing users of kernhist(9) have had their format strings
updated to replace uses of "%p" with "%#jx", and the pointer
arguments are now cast to (uintptr_t) before being subsequently cast
to (uintmax_t). This is needed to avoid compiler warnings about
casting "pointer to integer of a different size."
* All [2] existing users of kernhist(9) have had instances of "%s" or
"%c" format strings replaced with numeric formats; several instances
of mis-match between format string and argument list have been fixed.
* vmstat(1) has been modified to handle the new size of arguments in the
history data as exported by sysctl(9).
* vmstat(1) now provides a warning message if the history requested with
the -u option does not exist (previously, this condition was silently
ignored, with only a single blank line being printed).
* vmstat(1) now checks the version and argument length included in the
data exported via sysctl(9) and exits if they do not match the values
with which vmstat was built.
* The kernhist(9) man-page has been updated to note the additional
requirements imposed on the format strings, along with several other
minor changes and enhancements.
[1] It would have been possible to use an explicit length (for example,
uint64_t) for the history arguments. But that would require another
"rototill" of all the users in the future when we add support for an
architecture that supports a larger size. Also, the printf(3) format
specifiers for explicitly-sized values, such as "%"PRIu64, are much
more verbose (and less aesthetically appealing, IMHO) than simply
using "%ju".
[2] I've tried very hard to find "all [the] existing users of kernhist(9)"
but it is possible that I've missed some of them. I would be glad to
update any stragglers that anyone identifies.
--
For some reason this single kernel seems to have outgrown its declared
size as a result of the kernhist(9) changes. Bump the size.
XXX The amount of increase may be excessive - anyone with more detailed
XXX knowledge please feel free to further adjust the value appropriately.
--
Missed one cast of pointer --> uintptr_t in previous kernhist(9) commit
--
And yet another one. :(
--
Use correct mark-up for NetBSD version.
--
More improvements in grammar and readability.
--
Remove a stray '"' (obvious typo) and add a couple of casts that are
probably needed.
--
And replace an instance of "%p" conversion with "%#jx"
--
Whitespace fix. Give Bl tag table a width. Fix Xr.
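The format-string convention described in the kernhist(9) change above can be demonstrated with ordinary printf-family calls. This is a user-space sketch only: the helper names are hypothetical and it uses snprintf(3) rather than the kernel history macros. It shows the double cast for pointers with "%#jx" and the 'j' length modifier for numeric values.

```c
#include <stddef.h>
#include <stdint.h>
#include <stdio.h>

/*
 * Format a pointer argument without "%p": cast to uintptr_t first,
 * then widen to uintmax_t, and print with "%#jx".
 * Returns the snprintf() result.
 */
static int
sketch_format_hist_ptr(char *buf, size_t len, const void *p)
{
	return snprintf(buf, len, "%#jx", (uintmax_t)(uintptr_t)p);
}

/* Numeric arguments are widened to uintmax_t and printed with "%ju". */
static int
sketch_format_hist_num(char *buf, size_t len, unsigned int v)
{
	return snprintf(buf, len, "%ju", (uintmax_t)v);
}
```

Going through uintptr_t before uintmax_t is what avoids the "pointer to integer of a different size" warning mentioned in the commit message.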
 1.65.6.1 27-Apr-2017  pgoyette Restore all work from the former pgoyette-localcount branch (which is
now abandoned due to a cvs merge botch).

The branch now builds, and installs via anita. There are still some
problems (cgd is non-functional and all atf tests time-out) but they
will get resolved soon.
 1.66.4.1 08-Apr-2020  martin Merge changes from current as of 20200406
 1.67.2.1 29-Feb-2020  ad Sync with head.
 1.71.2.1 03-Apr-2021  thorpej Sync with HEAD.
 1.15 18-Dec-2021  riastradh Add some missing includes to uvm_device.h.

- sys/types.h for dev_t
- sys/queue.h for LIST_ENTRY
- uvm/uvm_object.h for complete struct uvm_object type
- uvm/uvm_param.h for voff_t/vsize_t
- uvm/uvm_prot.h for vm_prot_t
 1.14 05-Sep-2020  riastradh Round of uvm.h cleanup.

The poorly named uvm.h is generally supposed to be for uvm-internal
users only.

- Narrow it to files that actually need it -- mostly files that need
to query whether curlwp is the pagedaemon, which should maybe be
exposed by an external header.

- Use uvm_extern.h where feasible and uvm_*.h for things not exposed
by it. We should split up uvm_extern.h but this will serve for now
to reduce the uvm.h dependencies.

- Use uvm_stat.h and #ifdef UVMHIST uvm.h for files that use
UVMHIST(ubchist), since ubchist is declared in uvm.h but the
reference evaporates if UVMHIST is not defined, so we reduce header
file dependencies.

- Make uvm_device.h and uvm_swap.h independently includable while
here.

ok chs@
 1.13 14-Dec-2014  chs add a new "fo_mmap" fileops method to allow use of arbitrary uvm_objects for
mappings of file objects. move vnode-specific details of mmap()ing a vnode
from uvm_mmap() to the new vnode-specific vn_mmap(). add new uvm_mmap_dev()
and uvm_mmap_anon() convenience functions for mapping character devices
and anonymous memory, and replace all other calls to uvm_mmap() with those.
use the new fileop in drm2 so that libdrm can use mmap() to map things
like on other platforms (instead of the ioctl that we have used so far).
 1.12 02-Feb-2011  chuck branches: 1.12.14; 1.12.30; 1.12.32;
update license clauses on my code to match the new-style BSD licenses.
based on diff that rmind@ sent me.

no functional change with this commit.
 1.11 11-Dec-2005  christos branches: 1.11.98; 1.11.104; 1.11.106;
merge ktrace-lwp.
 1.10 24-Mar-2004  junyoung Nuke __P().
 1.9 28-May-2000  drochner branches: 1.9.8; 1.9.26;
Don't silently truncate the voff_t offset to vaddr_t when passing it to
udv_attach. Pass the whole voff_t instead and do an explicit overflow
check before it is passed to the device's mmap handler (as "int", sadly).
 1.8 21-Jun-1999  thorpej branches: 1.8.2; 1.8.10;
Protect prototypes, certain macros, and inlines from userland.
 1.7 25-Mar-1999  mrg branches: 1.7.4;
remove now >1 year old pre-release message.
 1.6 24-Mar-1999  cgd modify udv_attach() and its caller (uvm_mmap()) so that it's passed the
offset and size of the requested region to be mapped, so that the
udv_attach() can use the device d_mmap() entry to check mappability
of the requested region.
 1.5 09-Mar-1998  mrg KNF.
 1.4 10-Feb-1998  perry add/cleanup multiple inclusion protection.
 1.3 07-Feb-1998  mrg restore rcsids
 1.2 06-Feb-1998  thorpej RCS ID police.
 1.1 05-Feb-1998  mrg branches: 1.1.1;
Initial revision
 1.1.1.1 05-Feb-1998  mrg initial import of the new virtual memory system, UVM, into -current.

UVM was written by chuck cranor <chuck@maria.wustl.edu>, with some
minor portions derived from the old Mach code. i provided some help
getting swap and paging working, and other bug fixes/ideas. chuck
silvers <chuq@chuq.com> also provided some other fixes.

this is the UVM kernel code portion.


this will be KNF'd shortly. :-)
 1.7.4.1 01-Jul-1999  thorpej Sync w/ -current.
 1.8.10.1 22-Jun-2000  minoura Sync w/ netbsd-1-5-base.
 1.8.2.1 20-Nov-2000  bouyer Update thorpej_scsipi to -current as of a month ago
 1.9.26.3 21-Sep-2004  skrll Fix the sync with head I botched.
 1.9.26.2 18-Sep-2004  skrll Sync with HEAD.
 1.9.26.1 03-Aug-2004  skrll Sync with HEAD
 1.9.8.1 07-Sep-2001  thorpej Commit my "devvp" changes to the thorpej-devvp branch. This
replaces the use of dev_t in most places with a struct vnode *.

This will form the basic infrastructure for real cloning device
support (besides being architecturally cleaner -- it'll be good
to get away from using numbers to represent objects).
 1.11.106.1 08-Feb-2011  bouyer Sync with HEAD
 1.11.104.1 06-Jun-2011  jruoho Sync with HEAD.
 1.11.98.1 05-Mar-2011  rmind sync with head
 1.12.32.1 06-Apr-2015  skrll Sync with HEAD
 1.12.30.1 31-Dec-2014  snj Pull up following revision(s) (requested by chs in ticket #363):
common/lib/libprop/prop_kern.c: revision 1.18
sys/arch/mac68k/dev/grf_compat.c: revision 1.27
sys/arch/x68k/dev/grf.c: revision 1.45
sys/external/bsd/drm/dist/bsd-core/drm_bufs.c: revision 1.12
sys/external/bsd/drm2/drm/drm_drv.c: revision 1.12
sys/external/bsd/drm2/drm/drm_vm.c: revision 1.6
sys/external/bsd/drm2/include/linux/mm.h: revision 1.4
sys/kern/vfs_vnops.c: revision 1.192 via patch
sys/rump/librump/rumpkern/vm.c: revision 1.160
sys/sys/file.h: revision 1.78 via patch
sys/uvm/uvm_device.c: revision 1.64
sys/uvm/uvm_device.h: revision 1.13
sys/uvm/uvm_extern.h: revision 1.192
sys/uvm/uvm_mmap.c: revision 1.150 via patch
add a new "fo_mmap" fileops method to allow use of arbitrary uvm_objects for
mappings of file objects. move vnode-specific details of mmap()ing a vnode
from uvm_mmap() to the new vnode-specific vn_mmap(). add new uvm_mmap_dev()
and uvm_mmap_anon() convenience functions for mapping character devices
and anonymous memory, and replace all other calls to uvm_mmap() with those.
use the new fileop in drm2 so that libdrm can use mmap() to map things
like on other platforms (instead of the ioctl that we have used so far).
 1.12.14.1 03-Dec-2017  jdolecek update from HEAD
 1.14 19-May-2018  jdolecek Remove emap support. Unfortunately it never got to a state where it would be
used and usable, due to reliability and limited & complicated MD support.

Going forward, we need to concentrate on interfaces which do not map anything
into the kernel in the first place (such as direct map or KVA-less I/O), rather
than making those mappings cheaper to do.
 1.13 20-Apr-2018  jdolecek add prot parameter for uvm_emap_enter(), so that it's possible to
enter also read/write mappings
 1.12 02-Apr-2018  jdolecek fix typo in comment
 1.11 27-Nov-2014  uebayasi branches: 1.11.18;
Consistently use kpreempt_*() outside scheduler path.
 1.10 15-Sep-2013  martin Remove unused variable
 1.9 13-Apr-2012  yamt branches: 1.9.2; 1.9.4;
comments
 1.8 02-Sep-2011  dyoung branches: 1.8.2; 1.8.6;
Report vmem(9) errors out-of-band so that we can use vmem(9) to manage
ranges that include the least and the greatest vmem_addr_t. Update
vmem(9) uses throughout the kernel. Slightly expand on the tests in
subr_vmem.c, which still pass. I've been running a kernel with this
patch without any trouble.
 1.7 25-Apr-2010  ad Reduce memory spent on bookkeeping for large values of MAXCPUS.
 1.6 07-Nov-2009  cegger branches: 1.6.2; 1.6.4;
Add a flags argument to pmap_kenter_pa(9).
Patch showed on tech-kern@ http://mail-index.netbsd.org/tech-kern/2009/11/04/msg006434.html
No objections.
 1.5 29-Aug-2009  rmind - Re-enable direct I/O with emap for pipe.
- While not used, #ifdef KVA allocation in emap (so it won't burn the space).
 1.4 20-Jul-2009  kiyohara branches: 1.4.2;
Globalize uvm_emap_size. It is used to calculate the size of the kernel page table.
http://mail-index.netbsd.org/current-users/2009/07/13/msg009983.html
 1.3 19-Jul-2009  rmind pmap_emap_sync: add an argument, and do not perform pmap_load() during
context switch (pmap_destroy() path seems to be unsafe), instead just
perform tlbflush(). Slightly inefficient, but good enough for now.
 1.2 09-Jul-2009  rmind branches: 1.2.2;
- Fix rare crash in the intr_lapic_tlb_bcast() handler: save and set up
%fs on i386, %gs on amd64 registers, before using them. Otherwise, it
might be invalid/garbage, eg. IPI can interrupt userspace.

- Explicitly initialize per-CPU emap generation number.

Thanks <drochner> for reporting and testing of patch.
 1.1 28-Jun-2009  rmind Ephemeral mapping (emap) implementation. Concept is based on the idea that
activity of other threads will perform the TLB flush for the processes using
emap as a side effect. To track that, global and per-CPU generation numbers
are used. This idea was suggested by Andrew Doran; various improvements to
it by me. Notes:

- For now, zero-copy on pipe is not yet enabled.
- TCP socket code would likely need more work.
- Additional UVM loaning improvements are needed.

Proposed on <tech-kern>, silence there.
Quickly reviewed by <ad>.
 1.2.2.6 11-Aug-2010  yamt sync with head.
 1.2.2.5 11-Mar-2010  yamt sync with head
 1.2.2.4 16-Sep-2009  yamt sync with head
 1.2.2.3 19-Aug-2009  yamt sync with head.
 1.2.2.2 18-Jul-2009  yamt sync with head.
 1.2.2.1 09-Jul-2009  yamt file uvm_emap.c was added on branch yamt-nfs-mp on 2009-07-18 14:53:28 +0000
 1.4.2.2 23-Jul-2009  jym Sync with HEAD.
 1.4.2.1 20-Jul-2009  jym file uvm_emap.c was added on branch jym-xensuspend on 2009-07-23 23:33:04 +0000
 1.6.4.1 30-May-2010  rmind sync with head
 1.6.2.1 30-Apr-2010  uebayasi Sync with HEAD.
 1.8.6.1 29-Apr-2012  mrg sync to latest -current.
 1.8.2.2 22-May-2014  yamt sync with head.

for a reference, the tree before this commit was tagged
as yamt-pagecache-tag8.

this commit was split into small chunks to avoid
a limitation of cvs. ("Protocol error: too many arguments")
 1.8.2.1 17-Apr-2012  yamt sync with head
 1.9.4.1 18-May-2014  rmind sync with head
 1.9.2.2 03-Dec-2017  jdolecek update from HEAD
 1.9.2.1 20-Aug-2014  tls Rebase to HEAD as of a few days ago.
 1.11.18.3 21-May-2018  pgoyette Sync with HEAD
 1.11.18.2 22-Apr-2018  pgoyette Sync with HEAD
 1.11.18.1 07-Apr-2018  pgoyette Sync with HEAD. 77 conflicts resolved - all of them $NetBSD$
 1.235 14-Sep-2025  andvar Fix various typos in comments and log message.
 1.234 27-Apr-2025  riastradh posix_spawn(2): Allocate a new vmspace at process creation time.

This allocates a new vmspace for the process at the time the new
process is created, rather than sharing some other vmspace temporarily.
This eliminates any risk of anything bad happening due to temporary
sharing, since there isn't any sharing.

Resolves a race where:

1. we set up the child to share proc0.p_vmspace at first,

2. another process tries to read the new child's psstrings via
kern.proc_args.<childpid>.argv or similar with the child's
p_reflock held and gets stuck in a uvm fault loop because
proc0.p_vmspace doesn't have the child's psstrings address
(inherited from the parent) mapped,

3. the child is waiting for p_reflock before it can replace its
p_vmspace or psstrings.

By allocating the vmspace up front, with no mappings in it, we avoid
exposing the child in this scenario. Minor possible downside is that
sysctl kern.proc_args.<childpid>.argv might spuriously fail with
EFAULT during this time (rather than fail with EBUSY as it does if
p_reflock is held concurrently) but that's not a particularly big
deal.

Patch and first paragraph of commit message written by chs@; minor
tweaks to comments -- and any mistakes in the analysis -- by me.

PR kern/59037: deadlock in posix_spawn
PR kern/59175: posix_spawn hang, hanging other process too
 1.233 26-Feb-2023  skrll branches: 1.233.6;
nkmempages should be size_t
 1.232 31-May-2021  riastradh branches: 1.232.12;
uvm: Make uvm_extern.h (more) self-contained, needs sys/types.h.
 1.231 14-Aug-2020  chs branches: 1.231.6; 1.231.8;
centralize calls from UVM to radixtree into a few functions.
in those functions, assert that the object lock is held in
the correct mode.
 1.230 14-Jun-2020  ad g/c vm_page_zero_enable
 1.229 13-Jun-2020  ad uvm_pagerealloc(): resurrect the insertion case.
 1.228 11-Jun-2020  ad uvm_availmem(): give it a boolean argument to specify whether a recent
cached value will do, or if the very latest total must be fetched. It can
be called thousands of times a second and fetching the totals impacts not
only the calling LWP but other CPUs doing unrelated activity in the VM
system.
 1.227 26-May-2020  kamil Catch up with the usage of struct vmspace::vm_refcnt

Use the dedicated reference counting routines.

Change the type of struct vmspace::vm_refcnt and struct vm_map::ref_count
to volatile.

Remove the unnecessary vm->vm_map.misc_lock locking in process_domem().

Reviewed by <ad>
 1.226 09-May-2020  thorpej Make the uvm_voaddr structure more compact, only occupying 2 pointers
worth of space, by encoding the type in the lower bits of the object
pointer.
 1.225 27-Apr-2020  rin Add missing \ to fix build for PMAP_CACHE_VIVT, i.e., ARMv4 and prior.
 1.224 23-Apr-2020  ad PR kern/54759 (vm.ubc_direct deadlock when read()/write() into mapping of itself)

- Add new flag UBC_ISMAPPED which tells ubc_uiomove() the object is mmap()ed
somewhere. Use it to decide whether to do direct-mapped copy, rather than
poking around directly in the vnode in ubc_uiomove(), which is ugly and
doesn't work for tmpfs. It would be nicer to contain all this in UVM but
the filesystem provides the needed locking here (VV_MAPPED) and to
reinvent that would suck more.

- Rename UBC_UNMAP_FLAG() to UBC_VNODE_FLAGS(). Pass in UBC_ISMAPPED where
appropriate.
 1.223 18-Apr-2020  thorpej Add an API to get a reference on the identity of an individual byte of
virtual memory, a "virtual object address". This is not a reference to
a physical byte of memory, per se, but a reference to a byte residing
in a page, owned by a unique UVM object (either a uobj or an anon). Two
separate address+address space tuples that reference the same byte in
an object (such as a location in a shared memory segment) will resolve
to equivalent virtual object addresses. Even if the residency status
of the page changes, the virtual object address remains unchanged.

struct uvm_voaddr -- a structure that encapsulates this address reference.

uvm_voaddr_acquire() -- a function to acquire this address reference,
given a vm_map and a vaddr_t.

uvm_voaddr_release() -- a function to release this address reference.

uvm_voaddr_compare() -- a function to compare two such address references.

uvm_voaddr_acquire() resolves the COW status of the object address before
acquiring.

In collaboration with riastradh@ and chs@.
 1.222 22-Mar-2020  ad branches: 1.222.2;
Process concurrent page faults on individual uvm_objects / vm_amaps in
parallel, where the relevant pages are already in-core. Proposed on
tech-kern.

Temporarily disabled on MP architectures with __HAVE_UNLOCKED_PMAP until
adjustments are made to their pmaps.
 1.221 23-Feb-2020  ad UVM locking changes, proposed on tech-kern:

- Change the lock on uvm_object, vm_amap and vm_anon to be a RW lock.
- Break v_interlock and vmobjlock apart. v_interlock remains a mutex.
- Do partial PV list locking in the x86 pmap. Others to follow later.
 1.220 18-Feb-2020  chs remove the aiodoned thread. I originally added this to provide a thread context
for doing page cache iodone work, but since then biodone() has changed to
hand off all iodone work to a softint thread, so we no longer need the
special-purpose aiodoned thread.
 1.219 15-Jan-2020  ad Merge from yamt-pagecache (after much testing):

- Reduce unnecessary page scan in putpages esp. when an object has a ton of
pages cached but only a few of them are dirty.

- Reduce the number of pmap operations by tracking page dirtiness more
precisely in uvm layer.
 1.218 31-Dec-2019  ad branches: 1.218.2;
- Add and use wrapper functions that take and acquire page interlocks, and pairs
of page interlocks. Require that the page interlock be held over calls to
uvm_pageactivate(), uvm_pagewire() and similar.

- Solve the concurrency problem with page replacement state. Rather than
updating the global state synchronously, set an intended state on
individual pages (active, inactive, enqueued, dequeued) while holding the
page interlock. After the interlock is released put the pages on a 128
entry per-CPU queue for their state changes to be made real in batch.
This results in a ~400 fold decrease in contention on my test system.
Proposed on tech-kern but modified to use the page interlock rather than
atomics to synchronise as it's much easier to maintain that way, and
cheaper.
 1.217 31-Dec-2019  ad Rename uvm_free() -> uvm_availmem().
 1.216 27-Dec-2019  ad Redo the page allocator to perform better, especially on multi-core and
multi-socket systems. Proposed on tech-kern. While here:

- add rudimentary NUMA support - needs more work.
- remove now unused "listq" from vm_page.
 1.215 21-Dec-2019  ad Add uvm_free(): returns number of free pages in system.
 1.214 16-Dec-2019  ad - Extend the per-CPU counters matt@ did to include all of the hot counters
in UVM, excluding uvmexp.free, which needs special treatment and will be
done with a separate commit. Cuts system time for a build by 20-25% on
a 48 CPU machine w/DIAGNOSTIC.

- Avoid 64-bit integer divide on every fault (for rnd_add_uint32).
 1.213 28-May-2018  chs branches: 1.213.2; 1.213.6;
allow tmpfs files to be larger than 4GB.
 1.212 19-May-2018  jdolecek Remove emap support. Unfortunately it never got to a state where it would be
used and usable, due to reliability and limited & complicated MD support.

Going forward, we need to concentrate on interfaces which do not map anything
into the kernel in the first place (such as direct map or KVA-less I/O), rather
than making those mappings cheaper to do.
 1.211 08-May-2018  christos don't store the rssmax in the lwp rusage, it is a per proc property. Instead
utilize an unused field in the vmspace struct to store it. Also conditionalize
on platforms that have pmap statistics available.
 1.210 20-Apr-2018  jdolecek add prot parameter for uvm_emap_enter(), so that it's possible to
enter also read/write mappings
 1.209 20-Apr-2018  jdolecek make ubc_alloc() and ubc_release() static, they should not be used
outside of ubc_uiomove()/ubc_zeropage(); for now mark as noinline
to keep them available as breakpoints
 1.208 15-Dec-2017  maya branches: 1.208.2;
Match locking notes with reality.
misc_lock is used to protect vm_refcnt.

ok chuq
 1.207 02-Dec-2017  mrg add two new members to uvmexp_sysctl{}: bootpages and poolpages.
bootpages is set to the pages allocated via uvm_pageboot_alloc().
poolpages is calculated from the list of pools nr_pages members.

this brings us closer to having a valid total of pages known by
the system, vs actual pages originally managed.

XXX: poolpages needs some handling for PR_RECURSIVE pools still.
 1.206 20-May-2017  chs MAP_FIXED means something different for mremap() than it does for mmap(),
so we cannot use UVM_FLAG_FIXED to specify both behaviors.
keep UVM_FLAG_FIXED with its earlier meaning (prior to my previous change)
of whether to use uvm_map_findspace() to locate space for the new mapping or
to use the hint address that the caller passed in, and add a new flag
UVM_FLAG_UNMAP to indicate that any existing entries in the range should be
unmapped as part of creating the new mapping. the new UVM_FLAG_UNMAP flag
may only be used if UVM_FLAG_FIXED is also specified.
 1.205 17-May-2017  christos snprintb(3) for UVM_FLAGS.
 1.204 06-May-2017  joerg Extend the mmap(2) interface to allow requesting protections for later
use with mprotect(2), but without enabling them immediately.

Extend the mremap(2) interface to allow duplicating mappings, i.e.
create a second range of virtual addresses referencing the same physical
pages. Duplicated mappings can have different effective protections.

Adjust PAX mprotect logic to disallow effective protections of W&X, but
allow one mapping W and another X protections. This obsoletes using
temporary files for purposes like JIT.

Adjust PAX logic for mmap(2) and mprotect(2) to fail if W&X is requested
and not silently drop the X protection.

Improve test cases to ensure correct operation of the changed
interfaces.
 1.203 04-Jan-2017  christos branches: 1.203.6;
don't include uvm_physseg.h for kmem grovellers.
 1.202 02-Jan-2017  cherry Remove a redundant #ifdef _KERNEL/#endif pair.

ok mrg@
 1.201 24-Dec-2016  cherry uvm_extern.h has both a _KERNEL only, and a non _KERNEL only API.

Since we unconditionally expose the uvm_physseg.h API via uvm_extern.h
right now, and since uvm_physseg.h uses a kernel only datatype, viz
psize_t, we restrict exposure of the uvm_physseg.h API to kernel
only.

This is in conformance with its documentation via uvm_hotplug(9) as a
kernel internal API.
 1.200 22-Dec-2016  cherry Use uvm_physseg.h:uvm_page_physload() instead of uvm_extern.h

For this, include uvm_physseg.h in the build and include tree, make a
cosmetic modification to the prototype for uvm_page_physload().
 1.199 22-Dec-2016  cherry Add a new function called uvm_md_init() that can be called at the
appropriate time in the boot path by MD code.
 1.198 20-Jul-2016  maxv Introduce uvm_km_protect.
 1.197 25-May-2016  christos branches: 1.197.2;
Introduce security.pax.mprotect.ptrace sysctl which can be used to bypass
mprotect settings so that debuggers can write to the text segment of traced
processes so that they can insert breakpoints. Turned off by default.
Ok: chuq (for now)
 1.196 05-Feb-2016  christos PR/50744: NONAKA Kimihiro: Protect more stuff with _KERNEL && _KMEMUSER to
make uvm_extern.h compile standalone again for net-snmp.
 1.195 26-Nov-2015  martin We never exec(2) with a kernel vmspace, so do not test for that, but instead
KASSERT() that we don't.
When calculating the load address for the interpreter (e.g. ld.elf_so),
we need to take into account whether the exec'd process will run with
topdown memory or bottom up. We can not use the current vmspace's flags
to test for that, as this happens too early. Luckily the execpack already
knows what the new state will be later, so instead of testing the current
vmspace, pass the info as additional argument to struct emul
e_vm_default_addr.
Fix all such functions and adapt all callers.
 1.194 20-Mar-2015  riastradh Comments explaining UBC_* flags.
 1.193 06-Feb-2015  maxv Kill kmeminit().
 1.192 14-Dec-2014  chs add a new "fo_mmap" fileops method to allow use of arbitrary uvm_objects for
mappings of file objects. move vnode-specific details of mmap()ing a vnode
from uvm_mmap() to the new vnode-specific vn_mmap(). add new uvm_mmap_dev()
and uvm_mmap_anon() convenience functions for mapping character devices
and anonymous memory, and replace all other calls to uvm_mmap() with those.
use the new fileop in drm2 so that libdrm can use mmap() to map things
like on other platforms (instead of the ioctl that we have used so far).
 1.191 07-Jul-2014  riastradh branches: 1.191.2; 1.191.4;
Initialize ubchist earlier.
 1.190 22-May-2014  riastradh Add uao_set_pgfl to limit a uvm_aobj's pages to a specified freelist.

Brought up on tech-kern:

https://mail-index.netbsd.org/tech-kern/2014/05/20/msg017095.html
 1.189 21-Feb-2014  skrll branches: 1.189.2;
Remove unnecessary struct simplelock forward declaration.
 1.188 03-Jan-2014  dsl There is no need for uvm_coredump_walkmap() to explicitly pass the proc_t
pointer to the caller's function.
If the code needs the process its address can be placed in the caller's
cookie.
 1.187 03-Jan-2014  dsl Minor changes to the process coredump code.
- Add some extra comments.
- Add some XXX comments because the process state might not be stable,
- Add uvm_coredump_count_segs() to simplify the calling code.
- uvm code now only returns non-empty sections/segments.
- Put the 'iocookie' into the 'cookie' block passed to uvm_coredump_walkmap()
instead of passing it through as an additional parameter.
amd64 can still generate core dumps that gdb can read.
 1.186 01-Jan-2014  dsl Change the type of the 'cookie' that holds the state of the core dump file
from 'void *' to the actual type 'struct coredump_iostate *'.
In most of the code the contents of the structure are still unknown.
This just stops the wrong type of pointer being passed to the 'void *'
parameter.
I hope I've found everything, amd64 GENERIC and i386 GENERIC & ALL compile.
 1.185 14-Nov-2013  martin As discussed on tech-kern: make TOPDOWN-VM runtime selectable per process
(offer MD code or emulations to override it).
 1.184 01-Sep-2012  matt branches: 1.184.2; 1.184.4;
Add a __HAVE_CPU_UAREA_IDLELWP hook so that the MD code can allocate
special UAREAs for idle lwp's.
 1.183 08-Apr-2012  martin Rework posix_spawn locking and memory management:
- always provide a vmspace for the new proc, initially borrowing from proc0
(this part fixes PR 46286)
- increase parallelism between parent and child if arguments allow this,
avoiding a potential deadlock on exec_lock
- add a new flag for userland to request old (lockstepped) behaviour for
better error reporting
- adapt test cases to the previous two and add a new variant to test the
diagnostics flag
- fix a few memory (and lock) leaks
- provide netbsd32 compat
 1.182 18-Mar-2012  uebayasi Move base type definitions from uvm_extern.h to uvm_param.h so that
other sources can easily include part of UVM headers without the whole
uvm_extern.h (e.g. sys/vnode.h wants only uvm_object.h).
 1.181 02-Feb-2012  para branches: 1.181.2;
- bringing kmeminit_nkmempages back and revert pmaps that called this early
- use nkmempages to scale the kmem_arena
- reducing diff to pre kmem/vmem change
(NKMEMPAGES_MAX_DEFAULT will need adjusting on some archs)
 1.180 27-Jan-2012  para extending vmem(9) to be able to allocate resources for its own needs.
simplifying uvm_map handling (no special kernel entries anymore no relocking)
make malloc(9) a thin wrapper around kmem(9)
(with private interface for interrupt safety reasons)

releng@ acknowledged
 1.179 05-Jan-2012  reinoud Revert MAP_NOSYSCALLS patch.
 1.178 22-Dec-2011  reinoud Redo uvm_map_setattr() to never fail and remove the possible panic. The
possibility of failure was a C&P error.
 1.177 20-Dec-2011  reinoud Add a MAP_NOSYSCALLS flag to mmap. This flag prohibits executing of system
calls from the mapped region. This can be used for emulation purposes or for
extra security in the case of generated code.

It's implemented by adding mapping-attributes to each uvm_map_entry. These can
then be queried when needed.

Currently the MAP_NOSYSCALLS is only implemented for x86 but other
architectures are easy to adapt; see the sys/arch/x86/x86/syscall.c patch.
Port maintainers are encouraged to add them for their processor ports too.
When this feature is not yet implemented for an architecture the
MAP_NOSYSCALLS is simply ignored with virtually no cpu cost.
 1.176 01-Sep-2011  matt branches: 1.176.2; 1.176.6;
Forward some UVM from matt-nb5-mips64. Add UVM_KMF_COLORMATCH flag.
When uvm_map gets passed UVM_FLAG_COLORMATCH, the align argument contains
the color of the starting address to be allocated (0..colormask).
When uvm_km_alloc is passed UVM_KMF_COLORMATCH (which can only be used with
UVM_KMF_VAONLY), the align argument contains the color of the starting address
to be allocated.
Change uvm_pagermapin to use this. When mapping user pages in the kernel,
if colormatch is used with the color of the starting user page then the kernel
mapping will be congruent with the existing user mappings.
 1.175 27-Aug-2011  christos Add an optional pglist argument to uvm_obj_wirepages, to be
filled with the list of pages that were wired.
 1.174 16-Jun-2011  hannken Rename uvm_vnp_zerorange(struct vnode *, off_t, size_t) to
ubc_zerorange(struct uvm_object *, off_t, size_t, int) changing
the first argument to an uvm_object and adding a flags argument.

Modify tmpfs_reg_resize() to zero the backing store (aobj) instead
of the vnode. ubc_purge() no longer panics when unmounting tmpfs.

Keep uvm_vnp_zerorange() until the next kernel version bump.
 1.173 12-Jun-2011  rmind Welcome to 5.99.53! Merge rmind-uvmplock branch:

- Reorganize locking in UVM and provide extra serialisation for pmap(9).
New lock order: [vmpage-owner-lock] -> pmap-lock.

- Simplify locking in some pmap(9) modules by removing P->V locking.

- Use lock object on vmobjlock (and thus vnode_t::v_interlock) to share
the locks amongst UVM objects where necessary (tmpfs, layerfs, unionfs).

- Rewrite and optimise x86 TLB shootdown code, make it simpler and cleaner.
Add TLBSTATS option for x86 to collect statistics about TLB shootdowns.

- Unify /dev/mem et al in MI code and provide required locking (removes
kernel-lock on some ports). Also, avoid cache-aliasing issues.

Thanks to Andrew Doran and Joerg Sonnenberger, as their initial patches
formed the core changes of this branch.
 1.172 23-Apr-2011  rmind branches: 1.172.2;
Replace "malloc" in comments, remove unnecessary header inclusions.
 1.171 17-Feb-2011  matt Add support for cpu-specific uarea allocation routines. Allows different
allocation for user and system lwps. MIPS will use this to map uareas of
system lwp using direct-mapped addresses (to reduce the overhead of
switching to kernel threads). ibm4xx could use this to map uareas via direct
mapped addresses and avoid the problem of having the kernel stack not in
the TLB.
 1.170 10-Feb-2011  pooka Make vmapbuf() return success/error and make physio deal with a
failure.
 1.169 02-Feb-2011  chuck update license clauses on my code to match the new-style BSD licenses.
based on diff that rmind@ sent me.

no functional change with this commit.
 1.168 04-Jan-2011  matt branches: 1.168.2; 1.168.4;
Add better color matching when selecting free pages. KM pages will now be allocated
so that VA and PA have the same color. On a page fault, choose a physical
page that has the same color as the virtual address.

When allocating kernel memory pages, allow the MD to specify a preferred
VM_FREELIST from which to choose pages. For machines with large amounts
of memory (> 4GB), this allows all kernel memory to come from <4GB to reduce the amount
of bounce buffering needed with 32bit DMA devices.
 1.167 20-Dec-2010  matt Move counting of faults, traps, intrs, soft[intr]s, syscalls, and nswtch
from uvmexp to per-cpu cpu_data and move them to 64bits. Remove unneeded
includes of <uvm/uvm_extern.h> and/or <uvm/uvm.h>.
 1.166 13-Nov-2010  uebayasi Hide uvm/uvm_page.h again to ensure its internal structures are MD.

GENERIC or at least one kernel compile tested for:
acorn26, acorn32, algor, all, alpha, amd64, amiga, amigappc,
arc, bebox, bighill, cats, cobalt, dreamcast, ews4800mips,
hp300, hp700, hpcarm, hpcmips, hpcsh, i386, ibmnws,
integrator, ixm1200, iyonix, landisk, luna68k, mac68k,
macppc, mipsco, mmeye, mvme68k, mvmeppc, netwinder, news68k,
newsmips, next68k, obs266a, ofppc, pmax, pmppc, prep,
rs6000, sandpoint, sbmips, shark, sidebeach, sparc, sparc64,
sun2, sun3, usermode, vax, x68k, zaurus
 1.165 12-Nov-2010  uebayasi Put back uvm_page.h for now. Sorry for mess.
 1.164 12-Nov-2010  uebayasi Abstraction fix; don't pull in physical segment/page definitions
in UVM external API, uvm_extern.h. Because most users care only
about virtual memory.

Device drivers use bus_dma(9) to manage physical memory. Device
drivers pull in bus_dma(9) API, bus_dma.h. bus_dma(9) implementations
pull in UVM internal API, uvm.h.

Tested By: Compiling i386 ALL kernel
 1.163 16-Apr-2010  rmind - Merge sched_pstats() and uvm_meter()/uvm_loadav(). Avoids double loop
through all LWPs and duplicate locking overhead.

- Move sched_pstats() from soft-interrupt context to process 0 main loop.
Avoids blocking effect on real-time threads. Mostly fixes PR/38792.

Note: it might be worth moving the loop above PRI_PGDAEMON. Also,
sched_pstats() might be cleaned-up slightly.
 1.162 08-Feb-2010  joerg branches: 1.162.2;
Remove separate mb_map. The nmbclusters is computed at boot time based
on the amount of physical memory and limited by NMBCLUSTERS if present.
Architectures without direct mapping also limit it based on the kmem_map
size, which is used as backing store. On i386 and ARM, the maximum KVA
used for mbuf clusters is limited to 64MB by default.

The old default limits and limits based on GATEWAY have been removed.
key_registered_sb_max is hard-wired to a value derived from 2048
clusters.
 1.161 21-Nov-2009  rmind branches: 1.161.2;
Add uvm_lwp_getuarea() and uvm_lwp_setuarea(). OK matt@.
 1.160 21-Oct-2009  rmind Remove uarea swap-out functionality:

- Addresses the issue described in PR/38828.
- Some simplification in threading and sleepq subsystems.
- Eliminates pmap_collect() and, as a side note, allows pmap optimisations.
- Eliminates XS_CTL_DATA_ONSTACK in scsipi code.
- Avoids few scans on LWP list and thus potentially long holds of proc_lock.
- Cuts ~1.5k lines of code. Reduces amd64 kernel size by ~4k.
- Removes __SWAP_BROKEN cases.

Tested on x86, mips, acorn32 (thanks <mpumford>) and partly tested on
acorn26 (thanks to <bjh21>).

Discussed on <tech-kern>, reviewed by <ad>.
 1.159 18-Aug-2009  yamt whitespace fixes. no functional changes.
 1.158 10-Aug-2009  haad Add uvm_reclaim_hooks support for reclaiming kernel KVA space and memory.
This is used only by zfs where uvm_reclaim hook is added from arc cache.

Oked ad@.
 1.157 05-Aug-2009  pooka kill uvm_aio_biodone1(). only user was lfs and that uses nestiobuf now.
 1.156 05-Aug-2009  pooka add some advice symbols we'll eventually need
 1.155 28-Jun-2009  rmind Ephemeral mapping (emap) implementation. Concept is based on the idea that
activity of other threads will perform the TLB flush for the processes using
emap as a side effect. To track that, global and per-CPU generation numbers
are used. This idea was suggested by Andrew Doran; various improvements to
it by me. Notes:

- For now, zero-copy on pipe is not yet enabled.
- TCP socket code would likely need more work.
- Additional UVM loaning improvements are needed.

Proposed on <tech-kern>, silence there.
Quickly reviewed by <ad>.
 1.154 30-Mar-2009  yamt g/c uvm_aiobuf_pool.
 1.153 29-Mar-2009  mrg - add new RLIMIT_AS (aka RLIMIT_VMEM) resource that limits the total
address space available to processes. this limit exists in most other
modern unix variants, and like most of them, our defaults are unlimited.
remove the old mmap / rlimit.datasize hack.

- adds the VMCMD_STACK flag to all the stack-creation vmcmd callers.
it is currently unused, but was added a few years ago.

- add a pair of new process size values to kinfo_proc2{}. one is the
total size of the process memory map, and the other is the total size
adjusted for unused stack space (since most processes have a lot of
this...)

- patch sh, and csh to notice RLIMIT_AS. (in some cases, the alias
RLIMIT_VMEM was already present and used if available.)

- patch ps, top and systat to notice the new k_vm_vsize member of
kinfo_proc2{}.

- update irix, svr4, svr4_32, linux and osf1 emulations to support
this information. (freebsd could be done, but that it's best left
as part of the full-update of compat/freebsd.)


this addresses PR 7897. it also gives correct memory usage values,
which have never been entirely correct (since mmap), and have been
very incorrect since jemalloc() was enabled.

tested on i386 and sparc64, build tested on several other platforms.

thanks to many folks for feedback and testing but most especially
chuq and yamt for critical suggestions that led to this patch not
having a special ugliness i wasn't happy with anyway :-)
 1.152 12-Mar-2009  abs Clarify free_list usage in uvm_page_physload() regarding faster/slower RAM.
Slower RAM should be assigned a higher free_list id.
No functional change to code, just comments and manpage
 1.151 18-Feb-2009  yamt make some functions static.
 1.150 26-Nov-2008  pooka branches: 1.150.4;
Rototill all remaining file systems to use ubc_uiomove() instead
of the ubc_alloc() - uiomove() - ubc_release() dance.
 1.149 31-Oct-2008  christos - allocate 8 pointers on the stack to avoid stack overflow in nfs.
- make that 8 a constant
- remove bogus panic
 1.148 08-Aug-2008  skrll branches: 1.148.2; 1.148.4;
g/c exec_map
 1.147 11-Jul-2008  skrll English improvement in comments.

"seems good to me :)" from yamt.
 1.146 04-Jun-2008  ad branches: 1.146.2; 1.146.4;
- vm_page: put listq, pageq into a union alongside a LIST_ENTRY, so we can
use both types of list.

- Make page coloring and idle zero state per-CPU.

- Maintain per-CPU page freelists. When freeing, put pages onto the local
CPU's lists and the global lists. When allocating, prefer to take pages
from the local CPU. If none are available take from the global list as
done now. Proposed on tech-kern@.
 1.145 29-Feb-2008  yamt branches: 1.145.2; 1.145.4; 1.145.6;
uvm_swap_io: if pagedaemon, don't wait for iobuf.
 1.144 28-Jan-2008  yamt branches: 1.144.2; 1.144.6;
remove a special allocator for uareas, which is no longer necessary.
use pool_cache instead.
 1.143 02-Jan-2008  ad Merge vmlocking2 to head.
 1.142 26-Dec-2007  christos Add PaX ASLR (Address Space Layout Randomization) [from elad and myself]

For regular (non PIE) executables randomization is enabled for:
1. The data segment
2. The stack

For PIE executables(*) randomization is enabled for:
1. The program itself
2. All shared libraries
3. The data segment
4. The stack

(*) To generate a PIE executable:
- compile everything with -fPIC
- link with -shared-libgcc -Wl,-pie

This feature is experimental, and might change. To use selectively add
options PAX_ASLR=0
in your kernel.

Currently we are using 12 bits for the stack, program, and data segment and
16 or 24 bits for mmap, depending on __LP64__.
 1.141 24-Dec-2007  perry Remove __attribute__((__noreturn__)) from things already marked __dead
Found by the department of redundancy department.
 1.140 13-Dec-2007  yamt add ddb "whatis" command. inspired from solaris ::whatis dcmd.
 1.139 05-Dec-2007  yamt branches: 1.139.2; 1.139.4;
g/c uvm_vnp_sync
 1.138 05-Dec-2007  yamt fix UBC_WANT_UNMAP.
- check PMAP_CACHE_VIVT after pulling pmap.h.
- VTEXT -> VI_TEXT.
 1.137 30-Nov-2007  ad branches: 1.137.2;
Make {anon,file,exec}pages unsigned.
 1.136 06-Nov-2007  ad Merge scheduler changes from the vmlocking branch. All discussed on
tech-kern:

- Invert priority space so that zero is the lowest priority. Rearrange
number and type of priority levels into bands. Add new bands like
'kernel real time'.
- Ignore the priority level passed to tsleep. Compute priority for
sleep dynamically.
- For SCHED_4BSD, make priority adjustment per-LWP, not per-process.
 1.135 18-Aug-2007  ad branches: 1.135.2; 1.135.6; 1.135.8;
Make the uarea cache per-CPU and drain in batches of 4.
 1.134 27-Jul-2007  yamt branches: 1.134.4; 1.134.6;
ubc_uiomove: add an "advice" argument rather than using UVM_ADV_RANDOM blindly.
 1.133 22-Jul-2007  pooka Retire uvn_attach() - it abuses VXLOCK and its functionality,
setting vnode sizes, is handled elsewhere: file system vnode creation
or spec_open() for regular files or block special files, respectively.

Add a call to VOP_MMAP() to the pagedvn exec path, since the vnode
is being memory mapped.

reviewed by tech-kern & wrstuden
 1.132 17-Jul-2007  joerg branches: 1.132.2;
Add native mremap system call based on the UVM implementation for
Linux compat. Add code to enforce alignment of the new location.
Special thanks to wizd for helping with the man page.
 1.131 09-Jul-2007  ad Merge some of the less invasive changes from the vmlocking branch:

- kthread, callout, devsw API changes
- select()/poll() improvements
- miscellaneous MT safety improvements
 1.130 05-Jun-2007  yamt improve post-ubc file overwrite performance in common cases.
ie. when it's safe, actually overwrite blocks rather than doing
read-modify-write.

also fixes PR/33152 and PR/36303.
 1.129 24-Mar-2007  rmind Export uvm_uarea_free() to the rest.
Make things compile again.
 1.128 04-Mar-2007  christos branches: 1.128.2; 1.128.4; 1.128.6;
Kill caddr_t; there will be some MI fallout, but it will be fixed shortly.
 1.127 22-Feb-2007  thorpej TRUE -> true, FALSE -> false
 1.126 21-Feb-2007  thorpej Replace the Mach-derived boolean_t type with the C99 bool type. A
future commit will replace use of TRUE and FALSE with true and false.
 1.125 15-Feb-2007  ad branches: 1.125.2;
Add uvm_kick_scheduler() (MP safe) to replace wakeup(&proc0).
 1.124 21-Dec-2006  yamt merge yamt-splraiseipl branch.

- finish implementing splraiseipl (and makeiplcookie).
http://mail-index.NetBSD.org/tech-kern/2006/07/01/0000.html
- complete workqueue(9) and fix its ipl problem, which is reported
to cause audio skipping.
- fix netbt (at least compilation problems) for some ports.
- fix PR/33218.
 1.123 07-Dec-2006  elad Back out uvm_is_swap_device().
 1.122 01-Dec-2006  elad branches: 1.122.2;
Introduce uvm_is_swap_device(), to check if the passed struct vnode * is
used as a swap device or not.

Okay mrg@.
 1.121 12-Oct-2006  yamt move some knowledge about vnode into uvm_vnode.c.
 1.120 12-Oct-2006  yamt uobj_wirepages and uobj_unwirepages from Mindaugas. PR/34771.
(commented out in files.uvm for now because there is no user in tree.)

http://mail-index.netbsd.org/tech-kern/2006/09/24/0000.html
http://mail-index.netbsd.org/tech-kern/2006/10/10/0000.html
 1.119 05-Oct-2006  chs add support for O_DIRECT (I/O directly to application memory,
bypassing any kernel caching for file data).
 1.118 15-Sep-2006  yamt branches: 1.118.2;
merge yamt-pdpolicy branch.
- separate page replacement policy from the rest of kernel
- implement an alternative replacement policy
 1.117 01-Sep-2006  cherry branches: 1.117.2;
bumps kernel aobj to 64 bit.
See: http://mail-index.netbsd.org/tech-kern/2006/03/07/0007.html
 1.116 04-Aug-2006  he Rearrange included headers and/or add include of <sys/types.h> and
<sys/lock.h>, so that the mipsco port can build again, ref.
http://mail-index.netbsd.org/port-mips/2006/08/04/0000.html
Reviewed by thorpej
 1.115 05-Jul-2006  drochner Introduce a UVM_KMF_EXEC flag for uvm_km_alloc() which enforces an
executable mapping. Up to now, only R+W was requested from pmap_kenter_pa.
On most CPUs, we get an executable mapping anyway, due to lack of
hardware support or due to laziness in the pmap implementation. Only
alpha does obey VM_PROT_EXECUTE, afaics.
 1.114 19-May-2006  yamt branches: 1.114.2; 1.114.4;
UVM_MAPFLAG: add missing parens.
 1.113 14-May-2006  elad integrate kauth.
 1.112 15-Mar-2006  drochner branches: 1.112.2;
-clean up the interface to uvm_fault: the "fault type" didn't serve
any purpose (done by a macro, so we don't save any cycles for now)
-kill vm_fault_t; it is not needed for real faults, and for simulated
faults (wiring) it can be replaced by UVM internal flags
-remove <uvm/uvm_fault.h> from uvm_extern.h again
 1.111 01-Mar-2006  yamt branches: 1.111.2; 1.111.4;
merge yamt-uio_vmspace branch.

- use vmspace rather than proc or lwp where appropriate.
the latter is more natural to specify an address space.
(and less likely to be abused for random purposes.)
- fix a swdmover race.
 1.110 10-Feb-2006  simonb Make a note that some counters should be 64-bit as they wrap far too
quickly.
 1.109 21-Jan-2006  yamt branches: 1.109.2; 1.109.4;
implement compat_linux mremap.
 1.108 21-Dec-2005  yamt branches: 1.108.2;
make length of inactive queue tunable by sysctl. (vm.inactivepct)
 1.107 29-Nov-2005  yamt merge yamt-readahead branch.
 1.106 01-Sep-2005  yamt branches: 1.106.6;
remove one of the duplicated forward decls. of vmspace. pointed out by Dheeraj S.
 1.105 01-Sep-2005  yamt put back uvm_fault.h for now as it's needed for some ports.
 1.104 27-Aug-2005  yamt don't include uvm_fault.h unnecessarily.
 1.103 10-Jun-2005  matt branches: 1.103.2;
Rework the coredump code to have no explicit knowledge of how coredump
i/o is done. Instead, pass an opaque cookie which is then passed to a
new routine, coredump_write, which does the actual i/o. This allows the
method of doing i/o to change without affecting any future MD code.
Also, make netbsd32_core.c [re]use core_netbsd.c (in a similar manner that
core_elf64.c uses core_elf32.c) and eliminate that code duplication.
cpu_coredump{,32} is now called twice, first with a NULL iocookie to fill
the core structure and a second to actually write md parts of the coredump.
All i/o is no longer random access and is suitable for shipping over a stream.
 1.102 02-Jun-2005  matt When writing coredumps, don't write zero uninstantiated demand-zero pages.
Also, with ELF core dumps, trim trailing zeroes from sections. These two
changes can shrink coredumps by over 50% in size.
 1.101 15-May-2005  yamt remove anon related statistics which are no longer used.
 1.100 01-Apr-2005  yamt merge yamt-km branch.
- don't use managed mappings/backing objects for wired memory allocations.
save some resources like pv_entry. also fix (most of) PR/27030.
- simplify kernel memory management API.
- simplify pmap bootstrap of some ports.
- some related cleanups.
 1.99 26-Mar-2005  fvdl Fix some things regarding COMPAT_NETBSD32 and limits/VM addresses.

* For sparc64 and amd64, define *SIZ32 VM constants.
* Add a new function pointer to struct emul, pointing at a function
that will return the default VM map address. The default function
is uvm_map_defaultaddr, which just uses the VM_DEFAULT_ADDRESS
macro. This gives emulations control over the default map address,
and allows things to be mapped at the right address (in 32bit range)
for COMPAT_NETBSD32.
* Add code to adjust the data and stack limits when a COMPAT_NETBSD32
or COMPAT_SVR4_32 binary is executed.
* Don't use USRSTACK in kern_resource.c, use p_vmspace->vm_minsaddr
instead (emulations might have set it differently)
* Since this changes struct emul, bump kernel version to 3.99.2

Tested on amd64, compile-tested on sparc64.
 1.98 13-Jan-2005  yamt branches: 1.98.2; 1.98.4; 1.98.8;
in uvm_unmap_remove, always wakeup va waiters if any.
uvm_km_free_wakeup is now a synonym of uvm_km_free.
 1.97 09-Jan-2005  chs adjust the UBC mapping code to support non-vnode uvm_objects.
this means we can no longer look at the vnode size to determine how many
pages to request in a fault, which is good since for NFS the size can change
out from under us on the server anyway. there's also a new flag UBC_UNMAP
for ubc_release(), so that the file system code can make the decision about
whether to cache mappings for files being used as executables.
 1.96 01-Jan-2005  yamt in the case of !PMAP_MAP_POOLPAGE, gather pool backend allocations to
large chunks for kernel_map and kmem_map to ease kva fragmentation.
 1.95 01-Jan-2005  yamt introduce vm_map_kernel, a subclass of vm_map, and
move some kernel-only members of vm_map to it.
 1.94 01-Jan-2005  yamt for in-kernel maps,
- allocate kva for vm_map_entry from the map itself and
remove the static limit, MAX_KMAPENT.
- keep merged entries for later splitting to fix allocate-to-free problem.
PR/24039.
 1.93 28-Aug-2004  thorpej Garbage-collect pagemove(); nothing uses it anymore (YAY!!!)
 1.92 04-May-2004  pk Since a `vmspace' always includes a `vm_map' we can re-use vm_map's
reference count lock to also protect the vmspace's reference count.
 1.91 24-Mar-2004  junyoung Nuke __P().
 1.90 14-Mar-2004  jdolecek fix typo in comment
 1.89 13-Feb-2004  yamt when breaking a loan from uobj,
insert the replacement page into the same position
as the original page on the object memq so that
genfs_putpages (and lfs) won't be confused.

noted by Stephan Uphoff (PR/24328)
 1.88 04-Jan-2004  jdolecek Rearrange process exit path to avoid need to free resources from different
process context ('reaper').

From within the exiting process context:
* deactivate pmap and free vmspace while we can still block
* introduce MD cpu_lwp_free() - this cleans all MD-specific context (such
as FPU state), and is the last potentially blocking operation;
all of cpu_wait(), and most of cpu_exit(), is now folded into cpu_lwp_free()
* process is now immediately marked as zombie and made available for pickup
by parent; the remaining last lwp continues the exit as fully detached
* MI (rather than MD) code bumps uvmexp.swtch, cpu_exit() is now same
for both 'process' and 'lwp' exit

uvm_lwp_exit() is modified to never block; the u-area memory is now
always just linked to the list of available u-areas. Introduce (blocking)
uvm_uarea_drain(), which is called to release the excessive u-area memory;
this is called by parent within wait4(), or by pagedaemon on memory shortage.
uvm_uarea_free() is now private function within uvm_glue.c.

MD process/lwp exit code now always calls lwp_exit2() immediately after
switching away from the exiting lwp.

g/c now unneeded routines and variables, including the reaper kernel thread
 1.87 18-Dec-2003  pk * Introduce uvm_km_kmemalloc1() which allows alignment and preferred offset
to be passed to uvm_map().

* Turn all uvm_km_valloc*() macros back into (inlined) functions to retain
binary compatibility with any 3rd party modules.
 1.86 18-Dec-2003  pk Condense all existing variants of uvm_km_valloc into a single function:
uvm_km_valloc1(), and use it to express all of
uvm_km_valloc()
uvm_km_valloc_wait()
uvm_km_valloc_prefer()
uvm_km_valloc_prefer_wait()
uvm_km_valloc_align()
in terms of it by macro expansion.
 1.85 13-Nov-2003  chs eliminate uvm_useracc() in favor of checking the return value of
copyin() or copyout().

uvm_useracc() tells us whether the mapping permissions allow access to
the desired part of an address space, and many callers assume that
this is the same as knowing whether an attempt to access that part of
the address space will succeed. however, access to user space can
fail for reasons other than insufficient permission, most notably that
paging in any non-resident data can fail due to i/o errors. most of
the callers of uvm_useracc() make the above incorrect assumption. the
rest are all misguided optimizations, which optimize for the case
where an operation will fail. we'd rather optimize for operations
succeeding, in which case we should just attempt the access and handle
failures due to insufficient permissions the same way we handle i/o
errors. since there appear to be no good uses of uvm_useracc(), we'll
just remove it.
 1.84 11-Aug-2003  pk Introduce uvm_swapisfull(), which computes the available swap space by
taking into account swap devices that are in the process of being removed.
 1.83 07-Aug-2003  agc Move UCB-licensed code from 4-clause to 3-clause licence.

Patches provided by Joel Baker in PR 22364, verified by myself.
 1.82 29-Jun-2003  fvdl branches: 1.82.2;
Back out the lwp/ktrace changes. They contained a lot of collateral damage,
and need to be examined and discussed more.
 1.81 28-Jun-2003  darrenr Pass lwp pointers throughout the kernel, as required, so that the lwpid can
be inserted into ktrace records. The general change has been to replace
"struct proc *" with "struct lwp *" in various function prototypes, pass
the lwp through and use l_proc to get the process pointer when needed.

Bump the kernel rev up to 1.6V
 1.80 10-May-2003  thorpej Back out the following change:
http://mail-index.netbsd.org/source-changes/2003/05/08/0068.html

There were some side-effects that I didn't anticipate, and fixing them
is proving to be more difficult than I thought, so just eject for now.
Maybe one day we can look at this again.

Fixes PR kern/21517.
 1.79 08-May-2003  thorpej Simplify the way the bounds of the managed kernel virtual address
space is advertised to UVM by making virtual_avail and virtual_end
first-class exported variables by UVM. Machine-dependent code is
responsible for initializing them before main() is called. Anything
that steals KVA must adjust these variables accordingly.

This reduces the number of instances of this info from 3 to 1, and
simplifies the pmap(9) interface by removing the pmap_virtual_space()
function call, and removing two arguments from pmap_steal_memory().

This also eliminates some kludges such as having to burn kernel_map
entries on space used by the kernel and stolen KVA.

This also eliminates use of VM_{MIN,MAX}_KERNEL_ADDRESS from MI code,
this giving MD code greater flexibility over the bounds of the managed
kernel virtual address space if a given port's specific platforms can
vary in this regard (this is especially true of the evb* ports).
 1.78 03-May-2003  wiz Misc fixes from jmc@openbsd.
 1.77 01-Feb-2003  thorpej Add extensible malloc types, adapted from FreeBSD. This turns
malloc types into a structure, a pointer to which is passed around,
instead of an int constant. Allow the limit to be adjusted when the
malloc type is defined, or with a function call, as suggested by
Jonathan Stone.
 1.76 18-Jan-2003  thorpej Merge the nathanw_sa branch.
 1.75 11-Dec-2002  thorpej Define a UVM_FLAG_NOWAIT, which indicates that we're not allowed
to sleep. Define UVM_KMF_NOWAIT in terms of UVM_FLAG_NOWAIT.

From Manuel Bouyer. Fixes a problem where any mapping with
read protection was created in a "nowait" context, causing
spurious failures.
 1.74 17-Nov-2002  chs change uvm_uarea_alloc() to indicate whether the returned uarea is already
backed by physical pages (ie. because it reused a previously-freed one),
so that we can skip a bunch of useless work in that case.
this fixes the underlying problem behind PR 18543, and also speeds up fork()
quite a bit (eg. 7% on my pc, 1% on my ultra2) when we get a cache hit.
 1.73 22-Sep-2002  chs encapsulate knowledge of uarea allocation in some new functions.
 1.72 15-Sep-2002  chs add a new km flag UVM_KMF_CANFAIL, which causes uvm_km_kmemalloc() to
return failure if swap is full and there are no free physical pages.
have malloc() use this flag if M_CANFAIL is passed to it.
use M_CANFAIL to allow amap_extend() to fail when memory is scarce.
this should prevent most of the remaining hangs in low-memory situations.
 1.71 17-May-2002  enami branches: 1.71.2;
Make uvn_findpages return the number of pages found so that caller can
easily check if all requested pages are found or not.
 1.70 10-Dec-2001  thorpej branches: 1.70.8;
Move the code that walks the process's VM map during a coredump
into uvm_coredump_walkmap(), and use callbacks into the coredump
routine to do something with each section.
 1.69 09-Dec-2001  chs add {anon,file,exec}max as an upper bound on the amount of memory that
will be allocated for the respective usage types when there is contention
for memory.

replace "vnode" and "vtext" with "file" and "exec" in uvmexp field names
and sysctl names.
 1.68 08-Dec-2001  thorpej Make the coredump routine exec-format/emulation specific. Split
out traditional NetBSD coredump routines into core_netbsd.c and
netbsd32_core.c (for COMPAT_NETBSD32).
 1.67 15-Sep-2001  chs a whole bunch of changes to improve performance and robustness under load:

- remove special treatment of pager_map mappings in pmaps. this is
required now, since I've removed the globals that expose the address range.
pager_map now uses pmap_kenter_pa() instead of pmap_enter(), so there's
no longer any need to special-case it.
- eliminate struct uvm_vnode by moving its fields into struct vnode.
- rewrite the pageout path. the pager is now responsible for handling the
high-level requests instead of only getting control after a bunch of work
has already been done on its behalf. this will allow us to UBCify LFS,
which needs tighter control over its pages than other filesystems do.
writing a page to disk no longer requires making it read-only, which
allows us to write wired pages without causing all kinds of havoc.
- use a new PG_PAGEOUT flag to indicate that a page should be freed
on behalf of the pagedaemon when it's unlocked. this flag is very similar
to PG_RELEASED, but unlike PG_RELEASED, PG_PAGEOUT can be cleared if the
pageout fails due to eg. an indirect-block buffer being locked.
this allows us to remove the "version" field from struct vm_page,
and together with shrinking "loan_count" from 32 bits to 16,
struct vm_page is now 4 bytes smaller.
- no longer use PG_RELEASED for swap-backed pages. if the page is busy
because it's being paged out, we can't release the swap slot to be
reallocated until that write is complete, but unlike with vnodes we
don't keep a count of in-progress writes so there's no good way to
know when the write is done. instead, when we need to free a busy
swap-backed page, just sleep until we can get it busy ourselves.
- implement a fast-path for extending writes which allows us to avoid
zeroing new pages. this substantially reduces cpu usage.
- encapsulate the data used by the genfs code in a struct genfs_node,
which must be the first element of the filesystem-specific vnode data
for filesystems which use genfs_{get,put}pages().
- eliminate many of the UVM pagerops, since they aren't needed anymore
now that the pager "put" operation is a higher-level operation.
- enhance the genfs code to allow NFS to use the genfs_{get,put}pages
instead of a modified copy.
- clean up struct vnode by removing all the fields that used to be used by
the vfs_cluster.c code (which we don't use anymore with UBC).
- remove kmem_object and mb_object since they were useless.
instead of allocating pages to these objects, we now just allocate
pages with no object. such pages are mapped in the kernel until they
are freed, so we can use the mapping to find the page to free it.
this allows us to remove splvm() protection in several places.

The sum of all these changes improves write throughput on my
decstation 5000/200 to within 1% of the rate of NetBSD 1.5
and reduces the elapsed time for "make release" of a NetBSD 1.5
source tree on my 128MB pc to 10% less than a 1.5 kernel took.
 1.66 16-Aug-2001  chs branches: 1.66.2;
user maps are always pageable.
 1.65 02-Jun-2001  chs branches: 1.65.2;
replace vm_map{,_entry}_t with struct vm_map{,_entry} *.
 1.64 26-May-2001  chs replace vm_page_t with struct vm_page *.
 1.63 25-May-2001  chs remove trailing whitespace.
 1.62 02-May-2001  thorpej Support dynamic sizing of the page color bins. We also support
dynamically re-coloring pages; as machine-dependent code discovers
the size of the system's caches, it may call uvm_page_recolor() with
the new number of colors to use. If the new number of colors is
smaller than (or equal to) the current number of colors, then uvm_page_recolor()
is a no-op.

The system defaults to one bucket if machine-dependent code does not
initialize uvmexp.ncolors before uvm_page_init() is called.

Note that the number of color bins should be initialized to something
reasonable as early as possible -- for many early memory allocations,
we live with the consequences of the page choice for the lifetime of
the boot.
 1.61 01-May-2001  thorpej Add the number of page colors to uvmexp.
 1.60 29-Apr-2001  thorpej Implement page coloring, using a round-robin bucket selection
algorithm (Solaris calls this "Bin Hopping").

This implementation currently relies on MD code to define a
constant defining the number of buckets. This will change
reasonably soon (MD code will be able to dynamically size
the bucket array).
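
A minimal sketch of round-robin bucket selection, for illustration only.
The struct color_state and the helper next_color() are hypothetical names
introduced here, not identifiers from the commit, and the sketch ignores
what the allocator does when the chosen bucket has no free pages.

```c
/*
 * Hypothetical round-robin ("bin hopping") color picker.
 * ncolors must be non-zero.
 */
struct color_state {
	unsigned int ncolors;	/* number of color buckets */
	unsigned int next;	/* bucket to hand out on the next call */
};

/* Return the current color and advance to the following one, wrapping. */
static unsigned int
next_color(struct color_state *cs)
{
	unsigned int color = cs->next;

	cs->next = (cs->next + 1) % cs->ncolors;
	return color;
}
```

Each call returns the next bucket index in sequence, so consecutive
allocations are spread across all color buckets.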
 1.59 25-Apr-2001  thorpej pmap_resident_count() always exists. Besides, returning the
value of vm_rssize is pointless -- it is never initialized to
anything other than 0.
 1.58 15-Mar-2001  chs eliminate the KERN_* error codes in favor of the traditional E* codes.
the mapping is:

KERN_SUCCESS 0
KERN_INVALID_ADDRESS EFAULT
KERN_PROTECTION_FAILURE EACCES
KERN_NO_SPACE ENOMEM
KERN_INVALID_ARGUMENT EINVAL
KERN_FAILURE various, mostly turn into KASSERTs
KERN_RESOURCE_SHORTAGE ENOMEM
KERN_NOT_RECEIVER <unused>
KERN_NO_ACCESS <unused>
KERN_PAGES_LOCKED <unused>
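
For illustration only, the table above can be read as a translation
function. This is a sketch, not code from the commit: the enum
old_kern_code, its values and the helper kern_to_errno() are hypothetical
names introduced here. KERN_FAILURE (mostly turned into KASSERTs) and the
three unused codes are left out.

```c
#include <errno.h>

/* Hypothetical stand-ins for the retired codes; values are illustrative. */
enum old_kern_code {
	OLD_KERN_SUCCESS,
	OLD_KERN_INVALID_ADDRESS,
	OLD_KERN_PROTECTION_FAILURE,
	OLD_KERN_NO_SPACE,
	OLD_KERN_INVALID_ARGUMENT,
	OLD_KERN_RESOURCE_SHORTAGE
};

/* Translate an old code to the E* value listed in the table above. */
static int
kern_to_errno(enum old_kern_code code)
{
	switch (code) {
	case OLD_KERN_SUCCESS:
		return 0;
	case OLD_KERN_INVALID_ADDRESS:
		return EFAULT;
	case OLD_KERN_PROTECTION_FAILURE:
		return EACCES;
	case OLD_KERN_NO_SPACE:
	case OLD_KERN_RESOURCE_SHORTAGE:
		return ENOMEM;
	case OLD_KERN_INVALID_ARGUMENT:
		return EINVAL;
	}
	/* Not covered by the table; EINVAL is an arbitrary choice here. */
	return EINVAL;
}
```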
 1.57 09-Mar-2001  chs add UBC memory-usage balancing. we track the number of pages in use for
each of the basic types (anonymous data, executable image, cached files)
and prevent the pagedaemon from reusing a given page if that would reduce
the count of that type of page below a sysctl-settable minimum threshold.
the thresholds are controlled via three new sysctl tunables:
vm.anonmin, vm.vnodemin, and vm.vtextmin. these tunables are the
percentages of pageable memory reserved for each usage, and we do not allow
the sum of the minimums to be more than 95% so that there's always some
memory that can be reused.
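
A sketch of the 95% rule described above, for illustration only. The
helper minimums_ok() and the constant MIN_TOTAL_LIMIT are hypothetical
names, and rejecting negative values is an assumption of this sketch; the
real sysctl handlers may validate differently.

```c
#include <stdbool.h>

#define MIN_TOTAL_LIMIT	95	/* percent, as stated in the commit message */

/*
 * Return true if the three minimum percentages (vm.anonmin, vm.vnodemin,
 * vm.vtextmin) can be in effect together, i.e. their sum leaves at least
 * 5% of pageable memory unreserved.
 */
static bool
minimums_ok(int anonmin, int vnodemin, int vtextmin)
{
	if (anonmin < 0 || vnodemin < 0 || vtextmin < 0)
		return false;
	return anonmin + vnodemin + vtextmin <= MIN_TOTAL_LIMIT;
}
```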
 1.56 06-Feb-2001  eeh branches: 1.56.2;
Specify a process' address space limits for uvmspace_exec().
 1.55 30-Nov-2000  simonb Move uvm_pgcnt_vnode and uvm_pgcnt_anon into uvmexp (as vnodepages and
anonpages), and add vtextpages which is currently unused but will be
used to trace the number of pages used by vtext vnodes.
 1.54 29-Nov-2000  simonb Add a vm.uvmexp2 sysctl that uses an ABI-safe 'struct uvmexp_sysctl'.
 1.53 27-Nov-2000  chs Initial integration of the Unified Buffer Cache project.
 1.52 27-Nov-2000  nisimura Introduce uvm_km_valloc_align() and use it to grab process's USPACE
aligned on USPACE boundary in kernel virtual address. It's beneficial
for MIPS R4000's paired TLB entry design.
 1.51 28-Sep-2000  eeh Add support for variable end of user stacks needed to support COMPAT_NETBSD32:

`struct vmspace' has a new field `vm_minsaddr' which is the user TOS.

PS_STRINGS is deprecated in favor of curproc->p_pstr which is derived
from `vm_minsaddr'.

Bump the kernel version number.
 1.50 21-Sep-2000  thorpej Make PMAP_PAGEIDLEZERO() return a boolean value. FALSE indicates
that the page being zero'd was not completed and that page zeroing
should be aborted. This may be used by machine-dependent code doing
slow page access to reduce the latency of running a process that has
become runnable while in the middle of doing a slow page zero.
 1.49 13-Sep-2000  thorpej Add an align argument to uvm_map() and some callers of that
routine. Works similarly to pmap_prefer(), but allows callers
to specify a minimum power-of-two alignment of the region.
How we ever got along without this for so long is beyond me.
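
A minimal sketch of what a power-of-two alignment constraint means for a
candidate address. The helpers is_pow2() and align_up() are hypothetical
and are not the uvm_map() interface; they only illustrate the arithmetic.

```c
#include <stdbool.h>
#include <stdint.h>

/* True if x is a non-zero power of two. */
static bool
is_pow2(uintptr_t x)
{
	return x != 0 && (x & (x - 1)) == 0;
}

/*
 * Round addr up to the next multiple of align.
 * align must be a power of two (check with is_pow2() first).
 */
static uintptr_t
align_up(uintptr_t addr, uintptr_t align)
{
	return (addr + align - 1) & ~(align - 1);
}
```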
 1.48 12-Aug-2000  thorpej Don't bother with a trampoline to start the pagedaemon and
reaper threads.
 1.47 01-Aug-2000  wiz Rename VM_INHERIT_* to MAP_INHERIT_* and move them to sys/sys/mman.h as
discussed on tech-kern.
Retire sys/uvm/uvm_inherit.h, update man page for minherit(2).
 1.46 24-Jul-2000  jeffs Add uvm_km_valloc_prefer_wait(). Used to valloc with the passed in
voff_t being passed to PMAP_PREFER(), which results in the proper
virtual alignment of the allocated space.
 1.45 27-Jun-2000  mrg move the contents of <vm/vm.h> into <uvm/uvm_extern.h>. <vm/vm.h> is simply
an include of <uvm/uvm_extern.h> now.
 1.44 27-Jun-2000  mrg more vm header file changes:

<vm/vm_extern.h> merged into <uvm/uvm_extern.h>
<vm/vm_page.h> merged into <uvm/uvm_page.h>
<vm/pmap.h> has become <uvm/uvm_pmap.h>

this leaves just <vm/vm.h> in NetBSD.
 1.43 26-Jun-2000  mrg remove/move more mach vm header files:

<vm/pglist.h> -> <uvm/uvm_pglist.h>
<vm/vm_inherit.h> -> <uvm/uvm_inherit.h>
<vm/vm_kern.h> -> into <uvm/uvm_extern.h>
<vm/vm_object.h> -> nothing
<vm/vm_pager.h> -> into <uvm/uvm_pager.h>

also includes a bunch of <vm/vm_page.h> include removals (due to redundancy
with <vm/vm.h>), and a scattering of other similar headers.
 1.42 08-Jun-2000  thorpej Change UVM_UNLOCK_AND_WAIT() to use ltsleep() (it is now atomic, as
advertised). Garbage-collect uvm_sleep().
 1.41 28-May-2000  thorpej Rather than starting init and creating kthreads by forking and then
doing a cpu_set_kpc(), just pass the entry point and argument all
the way down the fork path starting with fork1(). In order to
avoid special-casing the normal fork in every cpu_fork(), MI code
passes down child_return() and the child process pointer explicitly.

This fixes a race condition on multiprocessor systems; a CPU could
grab the newly created processes (which has been placed on a run queue)
before cpu_set_kpc() would be performed.
 1.40 24-Apr-2000  thorpej branches: 1.40.2;
Changes necessary to implement pre-zero'ing of pages in the idle loop:
- Make page free lists have two actual queues: known-zero pages and
pages with unknown contents.
- Implement uvm_pageidlezero(). This function attempts to zero up to
the target number of pages until the target has been reached (currently
target is `all free pages') or until whichqs becomes non-zero (indicating
that a process is ready to run).
- Define a new hook for the pmap module for pre-zero'ing pages. This is
used to zero the pages using uncached access. This allows us to zero
as many pages as we want without polluting the cache.

In order to use this feature, each platform must add the appropriate
glue in their idle loop.
 1.39 10-Apr-2000  thorpej Add UVM_PGA_ZERO which instructs uvm_pagealloc{,_strat}() to return a
zero'd, ! PG_CLEAN page, as if it were uvm_pagezero()'d.
 1.38 26-Mar-2000  kleink Merge parts of chs-ubc2 into the trunk:
Add a new type voff_t (defined as a synonym for off_t) to describe offsets
into uvm objects, and update the appropriate interfaces to use it, the
most visible effect being the ability to mmap() file offsets beyond
the range of a vaddr_t.

Originally by Chuck Silvers; blame me for problems caused by merging this
into non-UBC.
 1.37 11-Feb-2000  thorpej Add some very simple code to auto-size the kmem_map. We take the
amount of physical memory, divide it by 4, and then allow machine
dependent code to place upper and lower bounds on the size. Export
the computed value to userspace via the new "vm.nkmempages" sysctl.

NKMEMCLUSTERS is now deprecated and will generate an error if you
attempt to use it. The new option, should you choose to use it,
is called NKMEMPAGES, and two new options NKMEMPAGES_MIN and
NKMEMPAGES_MAX allow the user to configure the bounds in the kernel
config file.
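
The sizing rule above (a quarter of physical memory, clamped by bounds)
can be sketched as follows. This is an illustration, not the committed
code: size_kmem_pages() and its parameter names are hypothetical, with
min_pages and max_pages standing in for the NKMEMPAGES_MIN and
NKMEMPAGES_MAX bounds.

```c
#include <stddef.h>

/*
 * Return physpages / 4, clamped to [min_pages, max_pages].
 * Assumes min_pages <= max_pages.
 */
static size_t
size_kmem_pages(size_t physpages, size_t min_pages, size_t max_pages)
{
	size_t npages = physpages / 4;

	if (npages < min_pages)
		npages = min_pages;
	if (npages > max_pages)
		npages = max_pages;
	return npages;
}
```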
 1.36 11-Jan-2000  chs add support for ``swapctl -d'' (removing swap space).
improve handling of i/o errors in swap space.

reviewed by: Chuck Cranor
 1.35 30-Dec-1999  eeh I should have made uvm_page_physload() take paddr_t's instead of vaddr_t's.
Also, add uvm_coredump32().
 1.34 22-Jul-1999  thorpej branches: 1.34.2;
Garbage collect thread_sleep()/thread_wakeup() left over from the old
Mach VM code. Also nuke iprintf(), which was no longer used anywhere.

Add proclist locking where appropriate.
 1.33 17-Jul-1999  thorpej Add a set of "lockflags", which can control the locking behavior
of some functions. Use these flags in uvm_map_pageable() to determine
if the map is locked on entry (replaces an already present boolean_t
argument `islocked'), and if the function should return with the map
still locked.
 1.32 02-Jul-1999  thorpej Bring in additional uvmexp members from chs-ubc2, so that VM stats can
be read no matter which kernel you're running.
 1.31 21-Jun-1999  thorpej Protect prototypes, certain macros, and inlines from userland.
 1.30 18-Jun-1999  thorpej Add the guts of mlockall(MCL_FUTURE). This requires passing a process's
"memlock" resource limit to uvm_mmap(). Update all calls accordingly.
 1.29 17-Jun-1999  thorpej Make uvm_vslock() return the error code from uvm_fault_wire(). All places
which use uvm_vslock() should now test the return value. If it's not
KERN_SUCCESS, wiring the pages failed, so the operation which is using
uvm_vslock() should error out.

XXX We currently just EFAULT a failed uvm_vslock(). We may want to do
more about translating error codes in the future.
 1.28 15-Jun-1999  thorpej Several changes, developed and tested concurrently:
* Provide POSIX 1003.1b mlockall(2) and munlockall(2) system calls.
MCL_CURRENT is presently implemented. MCL_FUTURE is not fully
implemented. Also, the same one-unlock-for-every-lock caveat
currently applies here as it does to mlock(2). This will be
addressed in a future commit.
* Provide the mincore(2) system call, with the same semantics as
Solaris.
* Clean up the error recovery in uvm_map_pageable().
* Fix a bug where a process would hang if attempting to mlock a
zero-fill region where none of the pages in that region are resident.
[ This fix has been submitted for inclusion in 1.4.1 ]
 1.27 26-May-1999  thorpej Change the vm_map's "entries_pageable" member to a r/o flags member, which
has PAGEABLE and INTRSAFE flags. PAGEABLE now really means "pageable",
not "allocate vm_map_entry's from non-static pool", so update all map
creations to reflect that. INTRSAFE maps are maps that are used in
interrupt context (e.g. kmem_map, mb_map), and thus use the static
map entry pool (XXX as does kernel_map, for now). This will eventually
change how these maps are locked, as well.
 1.26 26-May-1999  thorpej Pass an access_type to uvm_vslock().
 1.25 13-May-1999  thorpej Allow the caller to specify a stack for the child process. If NULL,
the child inherits the stack pointer from the parent (traditional
behavior). Like the signal stack, the stack area is specified as
a low address and a size; machine-dependent code accounts for stack
direction.

This is required for clone(2).
 1.24 11-Apr-1999  chs add a `flags' argument to uvm_pagealloc_strat().
define a flag UVM_PGA_USERESERVE to allow non-kernel object
allocations to use pages from the reserve.
use the new flag for allocations in pmap modules.
 1.23 26-Mar-1999  chs branches: 1.23.2;
add uvmexp.swpgonly and use it to detect out-of-swap conditions.
 1.22 25-Mar-1999  mrg remove now >1 year old pre-release message.
 1.21 08-Sep-1998  thorpej branches: 1.21.2;
Implement uvm_exit(), which frees VM resources when a process finishes
exiting.
 1.20 28-Aug-1998  thorpej Add a waitok boolean argument to the VM system's pool page allocator backend.
 1.19 13-Aug-1998  eeh Merge paddr_t changes into the main branch.
 1.18 01-Aug-1998  thorpej We need to be able to specify a uvm_object to the pool page allocator, too.
 1.17 31-Jul-1998  thorpej Allow an alternate splimp-protected map to be specified in the pool page
allocator routines.
 1.16 24-Jul-1998  thorpej branches: 1.16.2;
Implement uvm_km_{alloc,free}_poolpage(). These functions use pmap hooks to
map/unmap pool pages if provided by the pmap layer.
 1.15 08-Jul-1998  thorpej Add support for multiple memory free lists. There is at least one
default free list, and 0 - N additional free lists, in order of descending
priority.

A new page allocation function, uvm_pagealloc_strat(), has been added,
providing three page allocation strategies:

- normal: high -> low priority free list walk, taking the
page off the first free list that has one.

- only: attempt to allocate a page only from the specified free
list, failing if that free list has none available.

- fallback: if `only' fails, fall back on `normal'.

uvm_pagealloc(...) is provided for normal use (and is a synonym for
uvm_pagealloc_strat(..., UVM_PGA_STRAT_NORMAL, 0); the free list argument
is ignored for the `normal' case).

uvm_page_physload() now specifies which free list the pages will be
loaded onto. This means that some platforms which have multiple physical
memory segments may define additional vm_physsegs if they wish to break
individual physical segments into differing priorities.

Machine-dependent code must define _at least_ the following constants
in <machine/vmparam.h>:

VM_NFREELIST: the number of free lists the system will have

VM_FREELIST_DEFAULT: the default freelist (should always be 0,
but is defined in machdep code so that it's with all of the
other free list-related constants).

Additional free list names may be defined by machine-dependent code, but
they will only be used by machine-dependent code (e.g. for loading the
vm_physsegs).
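
The three strategies described in revision 1.15 can be illustrated with a small user-space model. This is only a sketch under stated assumptions, not the kernel code: the names NFREELIST, struct freelists, take_from() and pagealloc_strat() are hypothetical, each free list is reduced to a page counter, and index 0 is assumed to be the highest-priority list.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-ins for illustration; not the kernel's definitions. */
#define NFREELIST 3

enum strat { STRAT_NORMAL, STRAT_ONLY, STRAT_FALLBACK };

/* Each free list is modelled as a simple count of available pages. */
struct freelists {
	int npages[NFREELIST];
};

/* Take one page from list i; return 1 on success, 0 if the list is empty. */
static int
take_from(struct freelists *fl, int i)
{
	if (fl->npages[i] == 0)
		return 0;
	fl->npages[i]--;
	return 1;
}

/*
 * Return the index of the free list the page came from, or -1 if no
 * page could be allocated.  Index 0 is assumed to be highest priority.
 */
static int
pagealloc_strat(struct freelists *fl, enum strat s, int free_list)
{
	int i;

	assert(fl != NULL);

	if (s == STRAT_ONLY || s == STRAT_FALLBACK) {
		assert(free_list >= 0 && free_list < NFREELIST);
		if (take_from(fl, free_list))
			return free_list;
		if (s == STRAT_ONLY)
			return -1;	/* `only': fail, do not look elsewhere */
		/* `fallback': continue with the normal walk below */
	}

	/* `normal': walk from high to low priority, first hit wins. */
	for (i = 0; i < NFREELIST; i++) {
		if (take_from(fl, i))
			return i;
	}
	return -1;
}
```

In this model the free_list argument is never read for the normal strategy, which mirrors the note above that it is ignored in that case.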
 1.14 04-Jul-1998  jonathan defopt DDB.
 1.13 09-May-1998  kleink Use size_t to pass the length of the memory region to operate on to chgkprot(),
kernacc(), useracc(), vslock() and vsunlock(); (unsigned) ints are not
adequate on all platforms.
 1.12 30-Apr-1998  thorpej Pass vslock() and vsunlock() a proc *, rather than implicitly operating
on curproc.
 1.11 30-Mar-1998  mycroft Mark scheduler() and uvm_scheduler() as never returning.
 1.10 27-Mar-1998  thorpej Split uvmspace_alloc() into uvmspace_alloc() and uvmspace_init(). The latter
can be used for initializing a pre-allocated vmspace.
 1.9 09-Mar-1998  mrg KNF.
 1.8 10-Feb-1998  perry add/cleanup multiple inclusion protection.
 1.7 09-Feb-1998  mrg keep statistics on pageout/pagein, total pages, and total operations.
 1.6 08-Feb-1998  thorpej Allow callers of uvm_km_suballoc() to specify where the base of the
submap _must_ begin, by adding a "fixed" boolean argument.
 1.5 07-Feb-1998  mrg implement counters for pages paged in/out
 1.4 07-Feb-1998  mrg restore rcsids
 1.3 07-Feb-1998  chs prototype for uvm_map_checkprot() moved here.
add uvmexp fields for pageouts-in-progress and kernel-reserved pages.
 1.2 06-Feb-1998  thorpej RCS ID police.
 1.1 05-Feb-1998  mrg branches: 1.1.1;
Initial revision
 1.1.1.1 05-Feb-1998  mrg initial import of the new virtual memory system, UVM, into -current.

UVM was written by chuck cranor <chuck@maria.wustl.edu>, with some
minor portions derived from the old Mach code. i provided some help
getting swap and paging working, and other bug fixes/ideas. chuck
silvers <chuq@chuq.com> also provided some other fixes.

this is the UVM kernel code portion.


this will be KNF'd shortly. :-)
 1.16.2.2 08-Aug-1998  eeh Revert cdevsw mmap routines to return int.
 1.16.2.1 30-Jul-1998  eeh Split vm_offset_t and vm_size_t into paddr_t, psize_t, vaddr_t, and vsize_t.
 1.21.2.6 02-Jun-1999  chs add a new uvn_findpages() flag, UFP_NORDONLY,
which means that PG_RDONLY pages should not be returned.
 1.21.2.5 30-May-1999  chs uvm_vnp_setpageblknos() is out, uvm_vnp_asyncget() is in.
 1.21.2.4 30-Apr-1999  chs change ubc_alloc()'s length arg to be a pointer instead of the value.
the pointed-to value is the total desired length on input,
and is updated to the length that will fit in the returned window.
this allows callers of ubc_alloc() to be ignorant of the window size.
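
The in/out length convention described in revision 1.21.2.4 can be sketched in user space as follows. This is an illustration under assumptions, not the ubc_alloc() implementation: WIN_SIZE, window_clamp() and count_chunks() are hypothetical names, and windows are assumed to be fixed-size and aligned to their own size.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical window size; the caller below never refers to it. */
#define WIN_SIZE 8192

/*
 * On input *lenp is the total length the caller wants; on return it is
 * trimmed to what fits between `offset' and the end of its window.
 * Returns the offset within the window.
 */
static size_t
window_clamp(size_t offset, size_t *lenp)
{
	size_t win_off = offset % WIN_SIZE;
	size_t room = WIN_SIZE - win_off;

	assert(lenp != NULL);
	if (*lenp > room)
		*lenp = room;
	return win_off;
}

/*
 * A caller loop that stays ignorant of the window size: it asks for
 * everything that remains and advances by whatever length comes back.
 * Returns the number of windows that were needed.
 */
static unsigned
count_chunks(size_t offset, size_t total)
{
	unsigned n = 0;

	while (total > 0) {
		size_t len = total;

		(void)window_clamp(offset, &len);
		offset += len;
		total -= len;
		n++;
	}
	return n;
}
```

The point of the pointer argument is visible in count_chunks(): the loop would not need to change if the window size did.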
 1.21.2.3 09-Apr-1999  chs add decl for aiodone daemon.
 1.21.2.2 25-Feb-1999  chs define UFP_* (uvn_findpages() flags).
add uvm_aiobuf pool stuff.
add new prototypes.
 1.21.2.1 09-Nov-1998  chs initial snapshot. lots left to do.
 1.23.2.1 16-Apr-1999  chs branches: 1.23.2.1.2;
pull up 1.23 -> 1.24:
add a `flags' argument to uvm_pagealloc_strat().
define a flag UVM_PGA_USERESERVE to allow non-kernel object
allocations to use pages from the reserve.
use the new flag for allocations in pmap modules.
 1.23.2.1.2.7 09-Aug-1999  chs create a new type "voff_t" for uvm_object offsets
and define it to be "off_t". also, remove pgo_asyncget().
 1.23.2.1.2.6 02-Aug-1999  thorpej Update from trunk.
 1.23.2.1.2.5 11-Jul-1999  chs remove uvm_vnp_uncache(), it's no longer needed.
 1.23.2.1.2.4 04-Jul-1999  chs adjust protos.
 1.23.2.1.2.3 01-Jul-1999  thorpej Sync w/ -current.
 1.23.2.1.2.2 21-Jun-1999  thorpej Sync w/ -current.
 1.23.2.1.2.1 07-Jun-1999  chs merge everything from chs-ubc branch.
 1.34.2.5 27-Mar-2001  bouyer Sync with HEAD.
 1.34.2.4 12-Mar-2001  bouyer Sync with HEAD.
 1.34.2.3 11-Feb-2001  bouyer Sync with HEAD.
 1.34.2.2 08-Dec-2000  bouyer Sync with HEAD.
 1.34.2.1 20-Nov-2000  bouyer Update thorpej_scsipi to -current as of a month ago
 1.40.2.1 22-Jun-2000  minoura Sync w/ netbsd-1-5-base.
 1.56.2.11 11-Dec-2002  thorpej Sync with HEAD.
 1.56.2.10 11-Dec-2002  thorpej Sync with HEAD.
 1.56.2.9 18-Oct-2002  nathanw Catch up to -current.
 1.56.2.8 17-Sep-2002  nathanw Catch up to -current.
 1.56.2.7 20-Jun-2002  nathanw Catch up to -current.
 1.56.2.6 08-Jan-2002  nathanw Catch up to -current.
 1.56.2.5 21-Sep-2001  nathanw Catch up to -current.
 1.56.2.4 24-Aug-2001  nathanw Catch up with -current.
 1.56.2.3 21-Jun-2001  nathanw Catch up to -current.
 1.56.2.2 09-Apr-2001  nathanw Catch up with -current.
 1.56.2.1 05-Mar-2001  nathanw Initial commit of scheduler activations and lightweight process support.
 1.65.2.4 10-Oct-2002  jdolecek sync kqueue with -current; this includes merge of gehenna-devsw branch,
merge of i386 MP branch, and part of autoconf rototil work
 1.65.2.3 23-Jun-2002  jdolecek catch up with -current on kqueue branch
 1.65.2.2 10-Jan-2002  thorpej Sync kqueue branch with -current.
 1.65.2.1 25-Aug-2001  thorpej Merge Aug 24 -current into the kqueue branch.
 1.66.2.1 01-Oct-2001  fvdl Catch up with -current.
 1.70.8.1 30-May-2002  gehenna Catch up with -current.
 1.71.2.1 02-Jun-2003  tron Pull up revision 1.72 (requested by skrll):
add a new km flag UVM_KMF_CANFAIL, which causes uvm_km_kmemalloc() to
return failure if swap is full and there are no free physical pages.
have malloc() use this flag if M_CANFAIL is passed to it.
use M_CANFAIL to allow amap_extend() to fail when memory is scarce.
this should prevent most of the remaining hangs in low-memory situations.
 1.82.2.10 11-Dec-2005  christos Sync with head.
 1.82.2.9 10-Nov-2005  skrll Sync with HEAD. Here we go again...
 1.82.2.8 01-Apr-2005  skrll Sync with HEAD.
 1.82.2.7 17-Jan-2005  skrll Sync with HEAD.
 1.82.2.6 31-Oct-2004  skrll Reduce diff to HEAD.
 1.82.2.5 21-Sep-2004  skrll Fix the sync with head I botched.
 1.82.2.4 18-Sep-2004  skrll Sync with HEAD.
 1.82.2.3 03-Sep-2004  skrll Sync with HEAD
 1.82.2.2 03-Aug-2004  skrll Sync with HEAD
 1.82.2.1 02-Jul-2003  darrenr Apply the aborted ktrace-lwp changes to a specific branch. This is just for
others to review, I'm concerned that patch fuzziness may have resulted in some
errant code being generated but I'll look at that later by comparing the diff
from the base to the branch with the file I attempt to apply to it. This will,
at the very least, put the changes in a better context for others to review
them and attempt to tinker with removing passing of 'struct lwp' through
the kernel.
 1.98.8.1 18-Sep-2005  tron Pull up following revision(s) (requested by fvdl in ticket #798):
sys/compat/sunos/sunos_exec.c: revision 1.47
sys/compat/pecoff/pecoff_emul.c: revision 1.11
sys/arch/sparc64/sparc64/netbsd32_machdep.c: revision 1.45
sys/arch/amd64/amd64/netbsd32_machdep.c: revision 1.12
sys/sys/proc.h: revision 1.198
sys/compat/mach/mach_exec.c: revision 1.56
sys/compat/freebsd/freebsd_exec.c: revision 1.27
sys/arch/sparc64/include/vmparam.h: revision 1.27
sys/kern/kern_resource.c: revision 1.91
sys/compat/netbsd32/netbsd32_netbsd.c: revision 1.88
sys/compat/osf1/osf1_exec.c: revision 1.39
sys/compat/svr4_32/svr4_32_resource.c: revision 1.5
sys/compat/ultrix/ultrix_misc.c: revision 1.99
sys/compat/svr4_32/svr4_32_exec.h: revision 1.9
sys/kern/exec_elf32.c: revision 1.103
sys/compat/aoutm68k/aoutm68k_exec.c: revision 1.19
sys/compat/sunos32/sunos32_exec.c: revision 1.20
sys/compat/hpux/hpux_exec.c: revision 1.46
sys/compat/darwin/darwin_exec.c: revision 1.40
sys/kern/sysv_shm.c: revision 1.83
sys/uvm/uvm_extern.h: revision 1.99
sys/uvm/uvm_mmap.c: revision 1.89
sys/kern/kern_exec.c: revision 1.195
sys/compat/netbsd32/netbsd32.h: revision 1.31
sys/arch/sparc64/sparc64/svr4_32_machdep.c: revision 1.20
sys/compat/svr4/svr4_exec.c: revision 1.56
sys/compat/irix/irix_exec.c: revision 1.41
sys/compat/ibcs2/ibcs2_exec.c: revision 1.63
sys/compat/svr4_32/svr4_32_exec.c: revision 1.16
sys/arch/amd64/include/vmparam.h: revision 1.8
sys/compat/linux/common/linux_exec.c: revision 1.73
Fix some things regarding COMPAT_NETBSD32 and limits/VM addresses.
* For sparc64 and amd64, define *SIZ32 VM constants.
* Add a new function pointer to struct emul, pointing at a function
that will return the default VM map address. The default function
is uvm_map_defaultaddr, which just uses the VM_DEFAULT_ADDRESS
macro. This gives emulations control over the default map address,
and allows things to be mapped at the right address (in 32bit range)
for COMPAT_NETBSD32.
* Add code to adjust the data and stack limits when a COMPAT_NETBSD32
or COMPAT_SVR4_32 binary is executed.
* Don't use USRSTACK in kern_resource.c, use p_vmspace->vm_minsaddr
instead (emulations might have set it differently)
* Since this changes struct emul, bump kernel version to 3.99.2
Tested on amd64, compile-tested on sparc64.
 1.98.4.3 26-Mar-2005  yamt sync with head.
 1.98.4.2 25-Jan-2005  yamt - don't use uvm_object or managed mappings for wired allocations.
(eg. malloc(9))
- simplify uvm_km_* apis.
 1.98.4.1 25-Jan-2005  yamt remove some compatibility functions.
 1.98.2.1 29-Apr-2005  kent sync with -current
 1.103.2.9 17-Mar-2008  yamt sync with head.
 1.103.2.8 04-Feb-2008  yamt sync with head.
 1.103.2.7 21-Jan-2008  yamt sync with head
 1.103.2.6 07-Dec-2007  yamt sync with head
 1.103.2.5 15-Nov-2007  yamt sync with head.
 1.103.2.4 03-Sep-2007  yamt sync with head.
 1.103.2.3 26-Feb-2007  yamt sync with head.
 1.103.2.2 30-Dec-2006  yamt sync with head.
 1.103.2.1 21-Jun-2006  yamt sync with head.
 1.106.6.2 19-Nov-2005  yamt - as read-ahead context is per-vnode now,
there are less reasons to make VOP_READ call uvm_ra_request explicitly.
move it to pager (uvn_get) so that it can handle accesses via mmap as well.
- pass advice to pager via ubc.
- tweak DPRINTF.

XXX can be disturbed by PGO_LOCKED.

XXX it's controversial where it should be done.
(uvm_fault, uvn_get or genfs_getpages.)
 1.106.6.1 17-Nov-2005  yamt comment.
 1.108.2.4 18-Feb-2006  yamt sync with head.
 1.108.2.3 01-Feb-2006  yamt sync with head.
 1.108.2.2 15-Jan-2006  yamt rename VMSPACE_IS_KERNEL to VMSPACE_IS_KERNEL_P. ("predicate")
suggested by Matt Thomas.
 1.108.2.1 31-Dec-2005  yamt - add a function to add a reference to a vmspace.
- add a macro to check if a vmspace belongs to kernel.
 1.109.4.2 01-Jun-2006  kardel Sync with head.
 1.109.4.1 22-Apr-2006  simonb Sync with head.
 1.109.2.1 09-Sep-2006  rpaulo sync with head
 1.111.4.2 06-May-2006  christos - Move kauth_cred_t declaration to <sys/types.h>
- Cleanup struct ucred; forward declarations that are unused.
- Don't include <sys/kauth.h> in any header, but include it in the c files
that need it.

Approved by core.
 1.111.4.1 19-Apr-2006  elad oops - *really* sync to head this time.
 1.111.2.5 03-Sep-2006  yamt sync with head.
 1.111.2.4 11-Aug-2006  yamt sync with head
 1.111.2.3 24-May-2006  yamt sync with head.
 1.111.2.2 01-Apr-2006  yamt sync with head.
 1.111.2.1 05-Mar-2006  yamt separate page replacement policy from the rest of kernel.
 1.112.2.1 24-May-2006  tron Merge 2006-05-24 NetBSD-current into the "peter-altq" branch.
 1.114.4.1 13-Jul-2006  gdamore Merge from HEAD.
 1.114.2.2 19-May-2006  yamt UVM_MAPFLAG: add missing parens.
 1.114.2.1 19-May-2006  yamt file uvm_extern.h was added on branch chap-midi on 2006-05-19 15:08:15 +0000
 1.117.2.2 12-Jan-2007  ad Sync with head.
 1.117.2.1 18-Nov-2006  ad Sync with head.
 1.118.2.2 22-Oct-2006  yamt use workqueue for aiodoned.
 1.118.2.1 22-Oct-2006  yamt sync with head
 1.122.2.1 09-Dec-2006  bouyer Pull up following revision(s) (requested by elad in ticket #261):
sys/uvm/uvm_extern.h: revision 1.123
sys/uvm/uvm_swap.c: revision 1.115
share/man/man9/uvm.9: revision 1.79
Back out uvm_is_swap_device().
 1.125.2.3 15-Apr-2007  yamt sync with head.
 1.125.2.2 12-Mar-2007  rmind Sync with HEAD.
 1.125.2.1 27-Feb-2007  yamt - sync with head.
- move sched_changepri back to kern_synch.c as it doesn't know PPQ anymore.
 1.128.6.1 29-Mar-2007  reinoud Pullup to -current
 1.128.4.1 11-Jul-2007  mjf Sync with head.
 1.128.2.7 18-Oct-2007  ad Free uareas back to the uarea cache on the CPU where they were last used.
 1.128.2.6 21-Aug-2007  yamt fix some races around pagedaemon and uvm_wait. ok'ed by Andrew Doran.
 1.128.2.5 20-Aug-2007  ad Sync with HEAD.
 1.128.2.4 15-Jul-2007  ad Sync with head.
 1.128.2.3 09-Jun-2007  ad Sync with head.
 1.128.2.2 10-Apr-2007  ad Sync with head.
 1.128.2.1 05-Apr-2007  ad - Put a per-LWP lock around swapin / swapout.
- Replace use of lockmgr().
- Minor locking fixes and assertions.
- uvm_map.h no longer pulls in proc.h, etc.
- Use kpause where appropriate.
 1.132.2.2 03-Sep-2007  skrll Sync with HEAD.
 1.132.2.1 15-Aug-2007  skrll Sync with HEAD.
 1.134.6.2 27-Jul-2007  yamt ubc_uiomove: add an "advice" argument rather than using UVM_ADV_RANDOM blindly.
 1.134.6.1 27-Jul-2007  yamt file uvm_extern.h was added on branch matt-mips64 on 2007-07-27 09:50:38 +0000
 1.134.4.4 09-Dec-2007  jmcneill Sync with HEAD.
 1.134.4.3 03-Dec-2007  joerg Sync with HEAD.
 1.134.4.2 06-Nov-2007  joerg Sync with HEAD.
 1.134.4.1 03-Sep-2007  jmcneill Sync with HEAD.
 1.135.8.4 18-Feb-2008  mjf Sync with HEAD.
 1.135.8.3 27-Dec-2007  mjf Sync with HEAD.
 1.135.8.2 08-Dec-2007  mjf Sync with HEAD.
 1.135.8.1 19-Nov-2007  mjf Sync with HEAD.
 1.135.6.1 13-Nov-2007  bouyer Sync with HEAD
 1.135.2.3 23-Mar-2008  matt sync with HEAD
 1.135.2.2 09-Jan-2008  matt sync with HEAD
 1.135.2.1 06-Nov-2007  matt sync with HEAD
 1.137.2.3 26-Dec-2007  ad Sync with head.
 1.137.2.2 08-Dec-2007  ad Sync with head.
 1.137.2.1 04-Dec-2007  ad Pull the vmlocking changes into a new branch.
 1.139.4.2 02-Jan-2008  bouyer Sync with HEAD
 1.139.4.1 13-Dec-2007  bouyer Sync with HEAD
 1.139.2.2 13-Dec-2007  yamt sync with head.
 1.139.2.1 10-Dec-2007  yamt - separate kernel va allocation (kernel_va_arena) from
in-kernel fault handling (kernel_map).
- add vmem bootstrap code. vmem doesn't rely on malloc anymore.
- make kmem_alloc interrupt-safe.
- kill kmem_map. make malloc a wrapper of kmem_alloc.
 1.144.6.4 17-Jan-2009  mjf Sync with HEAD.
 1.144.6.3 28-Sep-2008  mjf Sync with HEAD.
 1.144.6.2 05-Jun-2008  mjf Sync with HEAD.

Also fix build.
 1.144.6.1 03-Apr-2008  mjf Sync with HEAD.
 1.144.2.1 24-Mar-2008  keiichi sync with head.
 1.145.6.2 18-Sep-2008  wrstuden Sync with wrstuden-revivesa-base-2.
 1.145.6.1 23-Jun-2008  wrstuden Sync w/ -current. 34 merge conflicts to follow.
 1.145.4.5 11-Aug-2010  yamt sync with head.
 1.145.4.4 11-Mar-2010  yamt sync with head
 1.145.4.3 19-Aug-2009  yamt sync with head.
 1.145.4.2 18-Jul-2009  yamt sync with head.
 1.145.4.1 04-May-2009  yamt sync with head.
 1.145.2.1 17-Jun-2008  yamt sync with head.
 1.146.4.2 13-Dec-2008  haad Update haad-dm branch to haad-dm-base2.
 1.146.4.1 19-Oct-2008  haad Sync with HEAD.
 1.146.2.1 18-Jul-2008  simonb Sync with head.
 1.148.4.2 01-Apr-2009  snj branches: 1.148.4.2.4;
Pull up following revision(s) (requested by mrg in ticket #622):
bin/csh/csh.1: revision 1.46
bin/csh/func.c: revision 1.37
bin/ps/print.c: revision 1.111
bin/ps/ps.c: revision 1.74
bin/sh/miscbltin.c: revision 1.38
bin/sh/sh.1: revision 1.92 via patch
external/bsd/top/dist/machine/m_netbsd.c: revision 1.7
lib/libkvm/kvm_proc.c: revision 1.82
sys/arch/mips/mips/cpu_exec.c: revision 1.55
sys/compat/darwin/darwin_exec.c: revision 1.57
sys/compat/ibcs2/ibcs2_exec.c: revision 1.73
sys/compat/irix/irix_resource.c: revision 1.15
sys/compat/linux/arch/amd64/linux_exec_machdep.c: revision 1.16
sys/compat/linux/arch/i386/linux_exec_machdep.c: revision 1.12
sys/compat/linux/common/linux_limit.h: revision 1.5
sys/compat/osf1/osf1_resource.c: revision 1.14
sys/compat/svr4/svr4_resource.c: revision 1.18
sys/compat/svr4_32/svr4_32_resource.c: revision 1.17
sys/kern/exec_subr.c: revision 1.62
sys/kern/init_sysctl.c: revision 1.160
sys/kern/kern_exec.c: revision 1.288
sys/kern/kern_resource.c: revision 1.151
sys/sys/param.h: patch
sys/sys/resource.h: revision 1.31
sys/sys/sysctl.h: revision 1.184
sys/uvm/uvm_extern.h: revision 1.153
sys/uvm/uvm_glue.c: revision 1.136
sys/uvm/uvm_mmap.c: revision 1.128
usr.bin/systat/ps.c: revision 1.32
- add new RLIMIT_AS (aka RLIMIT_VMEM) resource that limits the total
address space available to processes. this limit exists in most other
modern unix variants, and like most of them, our defaults are unlimited.
remove the old mmap / rlimit.datasize hack.
- adds the VMCMD_STACK flag to all the stack-creation vmcmd callers.
it is currently unused, but was added a few years ago.
- add a pair of new process size values to kinfo_proc2{}. one is the
total size of the process memory map, and the other is the total size
adjusted for unused stack space (since most processes have a lot of
this...)
- patch sh, and csh to notice RLIMIT_AS. (in some cases, the alias
RLIMIT_VMEM was already present and used if available.)
- patch ps, top and systat to notice the new k_vm_vsize member of
kinfo_proc2{}.
- update irix, svr4, svr4_32, linux and osf1 emulations to support
this information. (freebsd could be done, but that it's best left
as part of the full-update of compat/freebsd.)
this addresses PR 7897. it also gives correct memory usage values,
which have never been entirely correct (since mmap), and have been
very incorrect since jemalloc() was enabled.
tested on i386 and sparc64, build tested on several other platforms.
thanks to many folks for feedback and testing but most especially
chuq and yamt for critical suggestions that led to this patch not
having a special ugliness i wasn't happy with anyway :-)
 1.148.4.1 02-Nov-2008  snj Pull up following revision(s) (requested by tron in ticket #9):
sys/nfs/nfs_bio.c: revision 1.180
sys/miscfs/genfs/genfs_io.c: revision 1.14
sys/uvm/uvm_extern.h: revision 1.149
- allocate 8 pointers on the stack to avoid stack overflow in nfs.
- make that 8 a constant
- remove bogus panic
 1.148.4.2.4.6 12-Apr-2012  matt Separate object-less anon pages out of the active list if there is no swap
device. Make uvm_reclaimable and uvm.*estimatable understand colors and
kmem allocations.
 1.148.4.2.4.5 09-Feb-2012  matt Major changes to uvm.
Support multiple collections (groups) of free pages and run the page
reclamation algorithm on each group independently.
 1.148.4.2.4.4 03-Jun-2011  matt Restore $NetBSD$
 1.148.4.2.4.3 03-Jun-2011  matt Rework page free lists to be sorted by color first rather than free_list.
Kept per color PGFL_* counter in each page free list.
Minor cleanups.
 1.148.4.2.4.2 25-May-2011  matt Make uvm_map recognize UVM_FLAG_COLORMATCH which tells uvm_map that the
'align' argument specifies the starting color of the KVA range to be returned.

When calling uvm_km_alloc with UVM_KMF_VAONLY, also specify the starting
color of the kva range returned (UVM_KMF_COLORMATCH) and pass those to
uvm_map.

In uvm_pglistalloc, make sure the pages being returned have sequentially
advancing colors (so they can be mapped in a contiguous address range).
Add a few missing UVM_FLAG_COLORMATCH flags to uvm_pagealloc calls.

Make the socket and pipe loan color-safe.

Make the mips pmap enforce strict page color (color(VA) == color(PA)).
 1.148.4.2.4.1 26-Jan-2010  matt Pass hints to uvm_pagealloc* to get it to use the right page color rather
than guess the right page color.
 1.148.2.3 28-Apr-2009  skrll Sync with HEAD.
 1.148.2.2 03-Mar-2009  skrll Sync with HEAD.
 1.148.2.1 19-Jan-2009  skrll Sync with HEAD.
 1.150.4.2 23-Jul-2009  jym Sync with HEAD.
 1.150.4.1 13-May-2009  jym Sync with HEAD.

Commit is split, to avoid a "too many arguments" protocol error.
 1.161.2.13 18-Nov-2010  uebayasi Make XIP pager use cdev_mmap() instead of struct vm_physseg.
 1.161.2.12 16-Nov-2010  uebayasi Factor out the part which looks up physical page "identity" from
UVM object, into sys/uvm/uvm_vnode.c:uvn_findpage_xip(). Eventually
this will become a call to cdev UVM object pager.
 1.161.2.11 15-Nov-2010  uebayasi Hide uvm/uvm_page.h here again.
 1.161.2.10 02-Nov-2010  uebayasi Drop the 'paddr_t avail_start' and 'paddr_t avail_end' arguments
from uvm_page_physload_device(9).

Those two arguments are used by uvm_page_physload(9) to specify a
range of physical memory available for general purpose pages (pages
which are linked to freelists). Totally irrelevant to device
segments.
 1.161.2.9 30-Oct-2010  uebayasi Put back #include <uvm/uvm_page.h> for now, to avoid build errors.

This should be removed again later, because exposing page-level
definitions out of UVM is totally unnecessary.
 1.161.2.8 26-Jul-2010  uebayasi After much consideration, rename bus_space_physload_direct(9) back to
bus_space_physload_device(9).

The latter registers a segment as "device pages". "Device pages" are
managed, but not used for general purpose memory. Most typically XIP
pages.
 1.161.2.7 31-May-2010  uebayasi Re-define the definition of "device page"; device pages are pages of
device memory. Pages which don't have vm_page (== can't be used for
generic use), but whose PV are tracked, are called "direct pages" from
now.
 1.161.2.6 30-Apr-2010  uebayasi Sync with HEAD.
 1.161.2.5 29-Apr-2010  uebayasi "int free_list" (VM_FREELIST_*) is specific to struct vm_page (memory
page). Handle it only in memory physseg parts.

Record device page's properties in struct vm_physseg for future uses.
For example, a framebuffer that is capable of some accelerated bus access
(e.g. write-combining) should register its capability through "int
flags".
 1.161.2.4 28-Apr-2010  uebayasi Initial support of uvm_page_physunload(9) and uvm_page_physunload_device(9).
Note that callers of these functions are responsible to ensure that the
segment is not used.
 1.161.2.3 28-Apr-2010  uebayasi Don't expose uvm_page.h internal for usual uvm(9) users.
 1.161.2.2 27-Apr-2010  uebayasi Forgotten to check this in; now uvm_page_physload() and
uvm_page_physload_device() return struct vm_physseg * (which is not
used yet).
 1.161.2.1 23-Feb-2010  uebayasi Introduce uvm_page_physload_device(). This registers a physical address
range of a device, similar to uvm_page_physload() for memories. For now,
this is supposed to be called by MD code. We have to consider the design
when we'll manage mmap'able character devices.

Expose paddr_t -> struct vm_page * conversion function for device pages,
uvm_phys_to_vm_page_device(). This will be called by XIP vnode pager.
Because it knows if a given vnode is a device page (and its physical
address base) or not. Don't look up device segments, but directly make a
cookie.
 1.162.2.8 31-May-2011  rmind sync with head
 1.162.2.7 19-May-2011  rmind Implement sharing of vnode_t::v_interlock amongst vnodes:
- Lock is shared amongst UVM objects using uvm_obj_setlock() or getnewvnode().
- Adjust vnode cache to handle unsharing, add VI_LOCKSHARE flag for that.
- Use sharing in tmpfs and layerfs for underlying object.
- Simplify locking in ubc_fault().
- Sprinkle some asserts.

Discussed with ad@.
 1.162.2.6 05-Mar-2011  rmind sync with head
 1.162.2.5 30-May-2010  rmind sync with head
 1.162.2.4 26-Apr-2010  rmind Add ubc_purge() and purge/deassociate any related UBC entries during
object (usually, vnode) destruction. Since locking (and thus object)
is required to enter/remove mappings - object is not allowed anymore
to disappear with any UBC entries left.

From original patch by ad@ with some modifications.
 1.162.2.3 23-Apr-2010  rmind Use consistent naming - uvm_obj_*().
 1.162.2.2 18-Mar-2010  rmind Unify /dev/{mem,kmem,zero,null} implementations in MI code. Based on patch
from Joerg Sonnenberger, proposed on tech-kern@, in February 2008.

Work and depression still in progress.
 1.162.2.1 16-Mar-2010  rmind Change struct uvm_object::vmobjlock to be dynamically allocated with
mutex_obj_alloc(). It allows us to share the locks among UVM objects.
 1.168.4.3 05-Mar-2011  bouyer Sync with HEAD
 1.168.4.2 17-Feb-2011  bouyer Sync with HEAD
 1.168.4.1 08-Feb-2011  bouyer Sync with HEAD
 1.168.2.1 06-Jun-2011  jruoho Sync with HEAD.
 1.172.2.1 23-Jun-2011  cherry Catchup with rmind-uvmplock merge.
 1.176.6.3 29-Apr-2012  mrg sync to latest -current.
 1.176.6.2 05-Apr-2012  mrg sync to latest -current.
 1.176.6.1 18-Feb-2012  mrg merge to -current.
 1.176.2.11 22-May-2014  yamt sync with head.

for a reference, the tree before this commit was tagged
as yamt-pagecache-tag8.

this commit was split into small chunks to avoid
a limitation of cvs. ("Protocol error: too many arguments")
 1.176.2.10 30-Oct-2012  yamt sync with head
 1.176.2.9 17-Apr-2012  yamt sync with head
 1.176.2.8 05-Feb-2012  yamt turn vm.loanread sysctl to a threshold.
 1.176.2.7 11-Jan-2012  yamt create a sysctl knob to turn on/off loaned read.
 1.176.2.6 26-Dec-2011  yamt - use O->A loan to serve read(2). based on a patch from Chuck Silvers
- associated O->A loan fixes.
 1.176.2.5 20-Dec-2011  yamt don't inline uvn_findpages in genfs_io.
 1.176.2.4 20-Nov-2011  yamt - fix page loaning XXX make O->A loaning further
- add some statistics
 1.176.2.3 14-Nov-2011  yamt might dirty -> possibly dirty
suggested by wiz@
 1.176.2.2 12-Nov-2011  yamt redo the page clean/dirty/unknown accounting separately for file and
anonymous pages
 1.176.2.1 11-Nov-2011  yamt - track the number of clean/dirty/unknown pages in the system.
- g/c PG_MARKER
 1.181.2.1 12-Apr-2012  riz branches: 1.181.2.1.2;
Pull up following revision(s) (requested by martin in ticket #175):
sys/kern/kern_exit.c: revision 1.238
tests/lib/libc/gen/posix_spawn/t_fileactions.c: revision 1.4
tests/lib/libc/gen/posix_spawn/t_fileactions.c: revision 1.5
sys/uvm/uvm_extern.h: revision 1.183
lib/libc/gen/posix_spawn_fileactions.c: revision 1.2
sys/kern/kern_exec.c: revision 1.348
sys/kern/kern_exec.c: revision 1.349
sys/compat/netbsd32/syscalls.master: revision 1.95
sys/uvm/uvm_glue.c: revision 1.159
sys/uvm/uvm_map.c: revision 1.317
sys/compat/netbsd32/netbsd32.h: revision 1.95
sys/kern/exec_elf.c: revision 1.38
sys/sys/spawn.h: revision 1.2
sys/sys/exec.h: revision 1.135
sys/compat/netbsd32/netbsd32_execve.c: revision 1.34
Rework posix_spawn locking and memory management:
- always provide a vmspace for the new proc, initially borrowing from proc0
(this part fixes PR 46286)
- increase parallelism between parent and child if arguments allow this,
avoiding a potential deadlock on exec_lock
- add a new flag for userland to request old (lockstepped) behaviour for
better error reporting
- adapt test cases to the previous two and add a new variant to test the
diagnostics flag
- fix a few memory (and lock) leaks
- provide netbsd32 compat
Fix asynchronous posix_spawn child exit status (and test for it).
 1.181.2.1.2.1 28-Nov-2012  matt Pull from HEAD:
Add a __HAVE_CPU_UAREA_IDLELWP hook so that the MD code can allocate
special UAREAs for idle lwp's.
 1.184.4.1 18-May-2014  rmind sync with head
 1.184.2.2 03-Dec-2017  jdolecek update from HEAD
 1.184.2.1 20-Aug-2014  tls Rebase to HEAD as of a few days ago.
 1.189.2.1 10-Aug-2014  tls Rebase.
 1.191.4.7 28-Aug-2017  skrll Sync with HEAD
 1.191.4.6 05-Feb-2017  skrll Sync with HEAD
 1.191.4.5 05-Oct-2016  skrll Sync with HEAD
 1.191.4.4 29-May-2016  skrll Sync with HEAD
 1.191.4.3 19-Mar-2016  skrll Sync with HEAD
 1.191.4.2 27-Dec-2015  skrll Sync with HEAD (as of 26th Dec)
 1.191.4.1 06-Apr-2015  skrll Sync with HEAD
 1.191.2.2 25-Mar-2015  snj Pull up following revision(s) (requested by maxv in ticket #617):
sys/kern/kern_malloc.c: revision 1.144, 1.145
sys/kern/kern_pmf.c: revision 1.37
sys/rump/librump/rumpkern/rump.c: revision 1.316
sys/uvm/uvm_extern.h: revision 1.193
sys/uvm/uvm_km.c: revision 1.139
Don't include <uvm/uvm_extern.h>
--
Kill kmeminit().
--
Remove this MALLOC_DEFINE (M_PMF unused).
 1.191.2.1 31-Dec-2014  snj Pull up following revision(s) (requested by chs in ticket #363):
common/lib/libprop/prop_kern.c: revision 1.18
sys/arch/mac68k/dev/grf_compat.c: revision 1.27
sys/arch/x68k/dev/grf.c: revision 1.45
sys/external/bsd/drm/dist/bsd-core/drm_bufs.c: revision 1.12
sys/external/bsd/drm2/drm/drm_drv.c: revision 1.12
sys/external/bsd/drm2/drm/drm_vm.c: revision 1.6
sys/external/bsd/drm2/include/linux/mm.h: revision 1.4
sys/kern/vfs_vnops.c: revision 1.192 via patch
sys/rump/librump/rumpkern/vm.c: revision 1.160
sys/sys/file.h: revision 1.78 via patch
sys/uvm/uvm_device.c: revision 1.64
sys/uvm/uvm_device.h: revision 1.13
sys/uvm/uvm_extern.h: revision 1.192
sys/uvm/uvm_mmap.c: revision 1.150 via patch
add a new "fo_mmap" fileops method to allow use of arbitrary uvm_objects for
mappings of file objects. move vnode-specific details of mmap()ing a vnode
from uvm_mmap() to the new vnode-specific vn_mmap(). add new uvm_mmap_dev()
and uvm_mmap_anon() convenience functions for mapping character devices
and anonymous memory, and replace all other calls to uvm_mmap() with those.
use the new fileop in drm2 so that libdrm can use mmap() to map things
like on other platforms (instead of the ioctl that we have used so far).
 1.197.2.2 07-Jan-2017  pgoyette Sync with HEAD. (Note that most of these changes are simply $NetBSD$
tag issues.)
 1.197.2.1 26-Jul-2016  pgoyette Sync with HEAD
 1.203.6.2 19-May-2017  pgoyette Resolve conflicts from previous merge (all resulting from $NetBSD
keyword expansion)
 1.203.6.1 11-May-2017  pgoyette Sync with HEAD
 1.208.2.3 25-Jun-2018  pgoyette Sync with HEAD
 1.208.2.2 21-May-2018  pgoyette Sync with HEAD
 1.208.2.1 22-Apr-2018  pgoyette Sync with HEAD
 1.213.6.1 09-May-2025  martin Pull up following revision(s) (requested by riastradh in ticket #1947):

sys/uvm/uvm_extern.h: revision 1.234 (via patch)
sys/kern/kern_exec.c: revision 1.528 (via patch)
sys/uvm/uvm_map.c: revision 1.427 (via patch)

posix_spawn(2): Allocate a new vmspace at process creation time.

This allocates a new vmspace for the process at the time the new
process is created, rather than sharing some other vmspace temporarily.

This eliminates any risk of anything bad happening due to temporary
sharing, since there isn't any sharing.

Resolves a race where:
1. we set up the child to share proc0.p_vmspace at first,
2. another process tries to read the new child's psstrings via
kern.proc_args.<childpid>.argv or similar with the child's
p_reflock held and gets stuck in a uvm fault loop because
proc0.p_vmspace doesn't have the child's psstrings address
(inherited from the parent) mapped,
3. the child is waiting for p_reflock before it can replace its
p_vmspace or psstrings.

By allocating the vmspace up front, with no mappings in it, we avoid
exposing the child in this scenario. Minor possible downside is that
sysctl kern.proc_args.<childpid>.argv might spuriously fail with
EFAULT during this time (rather than fail with EBUSY as it does if
p_reflock is held concurrently) but that's not a particularly big
deal.

Patch and first paragraph of commit message written by chs@; minor
tweaks to comments -- and any mistakes in the analysis -- by me.

PR kern/59037: deadlock in posix_spawn
PR kern/59175: posix_spawn hang, hanging other process too
 1.213.2.2 21-Apr-2020  martin Sync with HEAD
 1.213.2.1 08-Apr-2020  martin Merge changes from current as of 20200406
 1.218.2.2 29-Feb-2020  ad Sync with head.
 1.218.2.1 17-Jan-2020  ad Sync with head.
 1.222.2.2 25-Apr-2020  bouyer Sync with bouyer-xenpvh-base2 (HEAD)
 1.222.2.1 20-Apr-2020  bouyer Sync with HEAD
 1.231.8.1 31-May-2021  cjep sync with head
 1.231.6.1 17-Jun-2021  thorpej Sync w/ HEAD.
 1.232.12.1 09-May-2025  martin Pull up following revision(s) (requested by riastradh in ticket #1109):

sys/uvm/uvm_extern.h: revision 1.234
sys/kern/kern_exec.c: revision 1.528
sys/uvm/uvm_map.c: revision 1.427

posix_spawn(2): Allocate a new vmspace at process creation time.

This allocates a new vmspace for the process at the time the new
process is created, rather than sharing some other vmspace temporarily.

This eliminates any risk of anything bad happening due to temporary
sharing, since there isn't any sharing.

Resolves a race where:
1. we set up the child to share proc0.p_vmspace at first,
2. another process tries to read the new child's psstrings via
kern.proc_args.<childpid>.argv or similar with the child's
p_reflock held and gets stuck in a uvm fault loop because
proc0.p_vmspace doesn't have the child's psstrings address
(inherited from the parent) mapped,
3. the child is waiting for p_reflock before it can replace its
p_vmspace or psstrings.

By allocating the vmspace up front, with no mappings in it, we avoid
exposing the child in this scenario. Minor possible downside is that
sysctl kern.proc_args.<childpid>.argv might spuriously fail with
EFAULT during this time (rather than fail with EBUSY as it does if
p_reflock is held concurrently) but that's not a particularly big
deal.

Patch and first paragraph of commit message written by chs@; minor
tweaks to comments -- and any mistakes in the analysis -- by me.

PR kern/59037: deadlock in posix_spawn
PR kern/59175: posix_spawn hang, hanging other process too
 1.233.6.1 02-Aug-2025  perseant Sync with HEAD
 1.237 15-Mar-2024  andvar Fix !VMSWAP build:
Added __unused for a few local variables, which are used in the VMSWAP block only.
Adjust !VMSWAP uvm_swap_stats() definition to make it build with compat code.
Copied "int (*uvm_swap_stats50)(...)" definition from uvm_swap to uvm_swapstub
to avoid missing uvm_swap_stats50 reference on linking.

Fixes INSTALL_CPMBR1400, INSTALL_ZYXELKX evbmips kernel configs as a result.

Reviewed by simon and phone in IRC (thanks).
 1.236 19-Sep-2023  ad Don't needlessly bump a couple of fault counters if upgrading the rwlock
failed.
 1.235 01-Sep-2023  andvar s/unnmapped/unmapped/ in comment.
 1.234 13-Aug-2023  chs uvm: prevent TLB invalidation races during COW resolution

When a thread takes a page fault which results in COW resolution,
other threads in the same process can be concurrently accessing that
same mapping on other CPUs. When the faulting thread updates the pmap
entry at the end of COW processing, the resulting TLB invalidations to
other CPUs are not done atomically, so another thread can write to the
new writable page and then a third thread might still read from the
old read-only page, resulting in inconsistent views of the page by the
latter two threads. Fix this by removing the pmap entry entirely for
the original page before we install the new pmap entry for the new
page, so that the new page can only be modified after the old page is
no longer accessible.

This fixes PR 56535 as well as the netbsd versions of problems
described in various bug trackers:

https://bugs.freebsd.org/bugzilla/show_bug.cgi?id=225584
https://reviews.freebsd.org/D14347
https://github.com/golang/go/issues/34988
 1.233 17-Jul-2023  riastradh uvm(9): One rndsource for faults -- not one per CPU.

All relevant state is per-CPU anyway; the only substantive difference
this makes is how many entries appear in `rndctl -l' output and what
they are called -- formerly the somewhat confusing `cpuN', meaning
`page faults on cpuN', and now just `uvmfault'. I don't think
there's any real value in being able to enable or disable measurement
or counting of page faults on one CPU vs others, so although this
could be a minor compatibility change, it's hard to imagine it
matters much.

XXX kernel ABI change in struct cpu_info
 1.232 09-Apr-2023  riastradh uvm(9): KASSERT(A && B) -> KASSERT(A); KASSERT(B)
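A minimal userland sketch of why splitting a compound assertion can help. The commit message does not state its motivation; this assumes the usual one, that separate checks let a failure report name the exact condition that was false. CHECK, first_failed, check_combined and check_split are hypothetical stand-ins, not the kernel's KASSERT.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/*
 * Hypothetical stand-in for KASSERT: instead of panicking, record the
 * source text of the first expression that evaluated to false.
 */
static const char *first_failed;

#define CHECK(e) \
	do { \
		if (!(e) && first_failed == NULL) \
			first_failed = #e; \
	} while (0)

/* One compound check: a failure cannot say which half was false. */
static const char *
check_combined(int a, int b)
{
	first_failed = NULL;
	CHECK(a > 0 && b > 0);
	return first_failed;
}

/* Two separate checks: a failure names the exact condition. */
static const char *
check_split(int a, int b)
{
	first_failed = NULL;
	CHECK(a > 0);
	CHECK(b > 0);
	return first_failed;
}
```

With a = 1 and b = 0, check_combined() reports the whole compound expression, while check_split() reports only the second condition.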
 1.231 26-Oct-2022  riastradh branches: 1.231.2;
sys/kernel.h: New home for extern start_init_exec.
 1.230 03-Jun-2022  dholland typo in comment
 1.229 05-Dec-2021  msaitoh s/recusive/recursive/ in comment.
 1.228 09-Jul-2020  skrll Consistently use UVMHIST(__func__)

Convert UVMHIST_{CALLED,LOG} into UVMHIST_CALLARGS
 1.227 17-May-2020  ad Start trying to reduce cache misses on vm_page during fault processing.

- Make PGO_LOCKED getpages imply PGO_NOBUSY and remove the latter. Mark
pages busy only when there's actually I/O to do.

- When doing COW on a uvm_object, don't mess with neighbouring pages. In
all likelihood they're already entered.

- Don't mess with neighbouring VAs that have existing mappings as replacing
those mappings with same can be quite costly.

- Don't enqueue pages for neighbour faults unless not enqueued already, and
don't activate centre pages unless uvmpdpol says it's useful.

Also:

- Make PGO_LOCKED getpages on UAOs work more like vnodes: do gang lookup in
the radix tree, and don't allocate new pages.

- Fix many assertion failures around faults/loans with tmpfs.
 1.226 15-May-2020  ad Reported-by: syzbot+3e3c7cfa8093f8de047e@syzkaller.appspotmail.com

Comment out an assertion that's now bogus and add a comment.
 1.225 13-Apr-2020  ad uvm_fault_check(): if MADV_SEQUENTIAL, change lower lock type to RW_WRITER
in case many threads are concurrently doing "sequential" access, to avoid
excessive mixing of read/write lock holds.
 1.224 23-Mar-2020  skrll branches: 1.224.2;
Fix UVMHIST build
 1.223 23-Mar-2020  skrll Trailing whitespace
 1.222 22-Mar-2020  ad Process concurrent page faults on individual uvm_objects / vm_amaps in
parallel, where the relevant pages are already in-core. Proposed on
tech-kern.

Temporarily disabled on MP architectures with __HAVE_UNLOCKED_PMAP until
adjustments are made to their pmaps.
 1.221 20-Mar-2020  ad Go back to freeing struct vm_anon one by one. There may have been an
advantage circa ~2008 but there isn't now.
 1.220 20-Mar-2020  ad uvm_fault_upper_lookup(): don't call pmap_extract() and pmap_update() more
often than needed.
 1.219 17-Mar-2020  ad Tweak the March 14th change to make page waits interlocked by pg->interlock.
Remove unneeded changes and only deal with the PQ_WANTED flag, to exclude
possible bugs.
 1.218 14-Mar-2020  ad Make page waits (WANTED vs BUSY) interlocked by pg->interlock. Gets RW
locks out of the equation for sleep/wakeup, and allows observing+waiting
for busy pages when holding only a read lock. Proposed on tech-kern.
 1.217 24-Feb-2020  rin 0x%#x --> %#x for non-external codes.
Also, stop mixing up 0x%x and %#x in single files as far as possible.
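A small userland sketch of the difference between the two format strings, using standard snprintf(3) rather than any kernel printf. The helper names fmt_doubled and fmt_alt are hypothetical. The '#' flag already supplies the "0x" prefix for nonzero values, so a literal "0x" in front of it is printed twice.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Old style: literal "0x" plus the '#' flag, which adds its own prefix. */
static const char *
fmt_doubled(char *buf, size_t size, unsigned int v)
{
	assert(size > 0);
	snprintf(buf, size, "0x%#x", v);
	return buf;
}

/* New style: the '#' flag alone supplies the prefix. */
static const char *
fmt_alt(char *buf, size_t size, unsigned int v)
{
	assert(size > 0);
	snprintf(buf, size, "%#x", v);
	return buf;
}
```

For 255 the old style yields "0x0xff" and the new style yields "0xff". Note that "%#x" prints a bare "0" for zero, since the prefix is only added to nonzero results.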
 1.216 23-Feb-2020  ad UVM locking changes, proposed on tech-kern:

- Change the lock on uvm_object, vm_amap and vm_anon to be a RW lock.
- Break v_interlock and vmobjlock apart. v_interlock remains a mutex.
- Do partial PV list locking in the x86 pmap. Others to follow later.
 1.215 15-Jan-2020  ad Merge from yamt-pagecache (after much testing):

- Reduce unnecessary page scan in putpages esp. when an object has a ton of
pages cached but only a few of them are dirty.

- Reduce the number of pmap operations by tracking page dirtiness more
precisely in uvm layer.
 1.214 31-Dec-2019  ad branches: 1.214.2;
- Add and use wrapper functions that take and acquire page interlocks, and pairs
of page interlocks. Require that the page interlock be held over calls to
uvm_pageactivate(), uvm_pagewire() and similar.

- Solve the concurrency problem with page replacement state. Rather than
updating the global state synchronously, set an intended state on
individual pages (active, inactive, enqueued, dequeued) while holding the
page interlock. After the interlock is released put the pages on a 128
entry per-CPU queue for their state changes to be made real in batch.
This results in a ~400 fold decrease in contention on my test system.
Proposed on tech-kern but modified to use the page interlock rather than
atomics to synchronise as it's much easier to maintain that way, and
cheaper.
 1.213 16-Dec-2019  ad - Extend the per-CPU counters matt@ did to include all of the hot counters
in UVM, excluding uvmexp.free, which needs special treatment and will be
done with a separate commit. Cuts system time for a build by 20-25% on
a 48 CPU machine w/DIAGNOSTIC.

- Avoid 64-bit integer divide on every fault (for rnd_add_uint32).
 1.212 13-Dec-2019  ad Break the global uvm_pageqlock into a per-page identity lock and a private
lock for use of the pagedaemon policy code. Discussed on tech-kern.

PR kern/54209: NetBSD 8 large memory performance extremely low
PR kern/54210: NetBSD-8 processes presumably not exiting
PR kern/54727: writing a large file causes unreasonable system behaviour
 1.211 01-Dec-2019  ad Deactivate pages in batch instead of acquiring uvm_pageqlock repeatedly.
 1.210 01-Dec-2019  martin Add missing <sys/atomic.h> include
 1.209 01-Dec-2019  maxv Use atomic_{load,store}_relaxed() on global counters.
 1.208 10-Nov-2019  chs in uvm_fault_lower_io(), fetch all the map entry values that we need
before we unlock everything.

Reported-by: syzbot+bb6f0092562222b489a3@syzkaller.appspotmail.com
 1.207 05-Aug-2019  chs fix two bugs reported in
https://syzkaller.appspot.com/bug?id=8840dce484094a926e1ec388ffb83acb2fa291c9

- in uvm_fault_check(), if the map entry is wired, handle the fault the same way
that we would handle UVM_FAULT_WIRE. faulting on wired mappings is valid
if the mapped object was truncated and then later grown again.

- in uvm_fault_unwire_locked(), we must hold the locks for the vm_map_entry
while calling pmap_extract() in order to avoid races with the mapped object
being truncated while we are unwiring it.

Reported-by: syzbot+2e0ae2fc35ab7301c7b8@syzkaller.appspotmail.com
 1.206 28-May-2019  msaitoh branches: 1.206.2;
s/recieve/receive/
 1.205 21-Apr-2019  chs If a pager fault method returns ENOMEM but some memory appears to be reclaimable,
wake up the pagedaemon and retry the fault. This fixes the problems with Xorg
being killed with an "out of swap" message due to a transient memory shortage.
 1.204 08-May-2018  christos branches: 1.204.2;
don't store the rssmax in the lwp rusage, it is a per proc property. Instead
utilize an unused field in the vmspace struct to store it. Also conditionalize
on platforms that have pmap statistics available.
 1.203 07-May-2018  christos update maxrss (used to always be 0). Patterned after the OpenBSD changes.
 1.202 20-Nov-2017  chs branches: 1.202.2;
In uvm_fault_upper_enter(), if pmap_enter(PMAP_CANFAIL) fails, assert that
the pmap did not leave around a now-stale pmap mapping for an old page.
If such a pmap mapping still existed after we unlocked the vm_map,
the UVM code would not know later that it would need to lock the
lower layer object while calling the pmap to remove or replace that
stale pmap mapping. See PR 52706 for further details.
 1.201 28-Oct-2017  pgoyette Update the kernhist(9) kernel history code to address issues identified
in PR kern/52639, as well as some general cleaning-up...

(As proposed on tech-kern@ with additional changes and enhancements.)

Details of changes:

* All history arguments are now stored as uintmax_t values[1], both in
the kernel and in the structures used for exporting the history data
to userland via sysctl(9). This avoids problems on some architectures
where passing a 64-bit (or larger) value to printf(3) can cause it to
process the value as multiple arguments. (This can be particularly
problematic when printf()'s format string is not a literal, since in
that case the compiler cannot know how large each argument should be.)

* Update the data structures used for exporting kernel history data to
include a version number as well as the length of history arguments.

* All [2] existing users of kernhist(9) have had their format strings
updated. Each format specifier now includes an explicit length
modifier 'j' to refer to numeric values of the size of uintmax_t.

* All [2] existing users of kernhist(9) have had their format strings
updated to replace uses of "%p" with "%#jx", and the pointer
arguments are now cast to (uintptr_t) before being subsequently cast
to (uintmax_t). This is needed to avoid compiler warnings about
casting "pointer to integer of a different size."

* All [2] existing users of kernhist(9) have had instances of "%s" or
"%c" format strings replaced with numeric formats; several instances
of mis-match between format string and argument list have been fixed.

* vmstat(1) has been modified to handle the new size of arguments in the
history data as exported by sysctl(9).

* vmstat(1) now provides a warning message if the history requested with
the -u option does not exist (previously, this condition was silently
ignored, with only a single blank line being printed).

* vmstat(1) now checks the version and argument length included in the
data exported via sysctl(9) and exits if they do not match the values
with which vmstat was built.

* The kernhist(9) man-page has been updated to note the additional
requirements imposed on the format strings, along with several other
minor changes and enhancements.

[1] It would have been possible to use an explicit length (for example,
uint64_t) for the history arguments. But that would require another
"rototill" of all the users in the future when we add support for an
architecture that supports a larger size. Also, the printf(3) format
specifiers for explicitly-sized values, such as "%"PRIu64, are much
more verbose (and less aesthetically appealing, IMHO) than simply
using "%ju".

[2] I've tried very hard to find "all [the] existing users of kernhist(9)"
but it is possible that I've missed some of them. I would be glad to
update any stragglers that anyone identifies.
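The format-string rules listed above can be sketched in userland C with standard snprintf(3). This is not the kernhist(9) macro interface, and fmt_pointer and fmt_count are hypothetical helper names. It only shows the casting and the 'j' length modifier the commit describes.

```c
#include <assert.h>
#include <stdint.h>
#include <stdio.h>
#include <string.h>

/*
 * Pointer argument: cast to uintptr_t first, then widen to uintmax_t,
 * and print with "%#jx" instead of "%p".
 */
static const char *
fmt_pointer(char *buf, size_t size, const void *p)
{
	assert(size > 0);
	snprintf(buf, size, "%#jx", (uintmax_t)(uintptr_t)p);
	return buf;
}

/* Numeric argument: widen to uintmax_t and print with "%ju". */
static const char *
fmt_count(char *buf, size_t size, unsigned int n)
{
	assert(size > 0);
	snprintf(buf, size, "%ju", (uintmax_t)n);
	return buf;
}
```

Because every argument has the same width, a consumer can walk the argument list without knowing the original type of each value.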
 1.200 09-Jul-2017  christos PR/52384: make uvm_fault_check() return EFAULT not EACCES, like our man pages
(but not OpenGroup which does not document EFAULT for read/write, and only
documents EACCES for sockets) say for read/write.
 1.199 20-Mar-2017  skrll branches: 1.199.6;
Ensure we pass the prot in flags to pmap_enter when creating a wired
mapping
 1.198 19-Mar-2017  riastradh __diagused police
 1.197 22-Jun-2015  matt branches: 1.197.2; 1.197.4;
Use %p, %#lx etc. for pointers and addresses.
 1.196 10-Aug-2014  tls branches: 1.196.4;
Merge tls-earlyentropy branch into HEAD.
 1.195 15-Sep-2013  martin branches: 1.195.2;
Mark a variable as potentially unused
 1.194 19-Feb-2012  rmind branches: 1.194.2; 1.194.4;
Remove VM_MAP_INTRSAFE and related code. Not used since the "kmem changes".
 1.193 02-Feb-2012  tls Entropy-pool implementation move and cleanup.

1) Move core entropy-pool code and source/sink/sample management code
to sys/kern from sys/dev.

2) Remove use of NRND as test for presence of entropy-pool code throughout
source tree.

3) Remove use of RND_ENABLED in device drivers as microoptimization to
avoid expensive operations on disabled entropy sources; make the
rnd_add calls do this directly so all callers benefit.

4) Fix bug in recent rnd_add_data()/rnd_add_uint32() changes that might
have led to slight entropy overestimation for some sources.

5) Add new source types for environmental sensors, power sensors, VM
system events, and skew between clocks, with a sample implementation
for each.

ok releng to go in before the branch due to the difficulty of later
pullup (widespread #ifdef removal and moved files). Tested with release
builds on amd64 and evbarm and live testing on amd64.
 1.192 27-Jan-2012  para extending vmem(9) to be able to allocate resources for its own needs.
simplifying uvm_map handling (no special kernel entries anymore, no relocking)
make malloc(9) a thin wrapper around kmem(9)
(with private interface for interrupt safety reasons)

releng@ acknowledged
 1.191 28-Nov-2011  yamt branches: 1.191.2;
comments
 1.190 06-Aug-2011  rmind branches: 1.190.2;
- Rework uvm_anfree() into uvm_anon_freelst(), which always drops the lock.
- Free anons in uvm_anon_freelst() without lock held.
- Mechanical sync to unused loaning code.
 1.189 05-Jul-2011  yamt reduce the number of atomic ops in common cases. it's exceptional for
anons to remain longer than amap.
 1.188 24-Jun-2011  rmind Fix uvmplock regression - a lock against oneself case in amap_swap_off().
Happens since amap is NULL in uvmfault_anonget(), so uvmfault_unlockall()
keeps anon locked, when it should unlock it.
 1.187 23-Jun-2011  rmind uvmfault_anonget: clean-up, improve some comments, misc.
 1.186 12-Jun-2011  rmind Welcome to 5.99.53! Merge rmind-uvmplock branch:

- Reorganize locking in UVM and provide extra serialisation for pmap(9).
New lock order: [vmpage-owner-lock] -> pmap-lock.

- Simplify locking in some pmap(9) modules by removing P->V locking.

- Use lock object on vmobjlock (and thus vnode_t::v_interlock) to share
the locks amongst UVM objects where necessary (tmpfs, layerfs, unionfs).

- Rewrite and optimise x86 TLB shootdown code, make it simpler and cleaner.
Add TLBSTATS option for x86 to collect statistics about TLB shootdowns.

- Unify /dev/mem et al in MI code and provide required locking (removes
kernel-lock on some ports). Also, avoid cache-aliasing issues.

Thanks to Andrew Doran and Joerg Sonnenberger, as their initial patches
formed the core changes of this branch.
 1.185 21-May-2011  tsutsui branches: 1.185.2;
No need to pass UVM_FLAG_COLORMATCH to uvm_pagealloc()
if no valid vaddr is specified.
 1.184 23-Apr-2011  rmind Replace "malloc" in comments, remove unnecessary header inclusions.
 1.183 08-Apr-2011  yamt - ensure that the promoted page is on the queue even when later pmap_enter
failed.
- don't activate a page twice.
- remove an argument which is used only for an assertion.
- assertions and comments.
 1.182 10-Feb-2011  skrll Spell uvm_fault_lower_neighbor correctly in UVMHIST_FUNC by using
__func__
 1.181 02-Feb-2011  chuck update license clauses on my code to match the new-style BSD licenses.
based on diff that rmind@ sent me.

no functional change with this commit.
 1.180 06-Jan-2011  enami branches: 1.180.2; 1.180.4;
Fix bugs introduced by previous commit; allocated page needs to be bound
with the anon, and uvmfault_anonget may be called with ufi NULL.
 1.179 04-Jan-2011  matt Add better color matching when selecting free pages. KM pages will now be allocated
so that VA and PA have the same color. On a page fault, choose a physical
page that has the same color as the virtual address.

When allocating kernel memory pages, allow the MD to specify a preferred
VM_FREELIST from which to choose pages. For machines with large amounts
of memory (> 4GB), this allows all kernel memory to come from <4GB to reduce the amount
of bounce buffering needed with 32bit DMA devices.
 1.178 20-Dec-2010  matt Move counting of faults, traps, intrs, soft[intr]s, syscalls, and nswtch
from uvmexp to per-cpu cpu_data and move them to 64bits. Remove unneeded
includes of <uvm/uvm_extern.h> and/or <uvm/uvm.h>.
 1.177 17-Dec-2010  yamt cosmetics. no functional changes.
- constify
- wrap long lines
- assertions
- comments
 1.176 15-Dec-2010  pooka Remove duplicate asserts from when uvm_fault_lower1() was merged
into uvm_fault_lower() (the duplicates were there already before,
just in different functions).

reported by Alexander Nasonov on tech-kern
 1.175 22-Jun-2010  rmind Keep the lock around pmap_update() where required. While fixing this
in ubc_fault(), rework logic to "remember" the last object of page and
reduce locking overhead, since in common case pages belong to one and
the same UVM object (but not always, therefore add a comment).

Unlocks before pmap_update(), on removal of mappings, might cause TLB
coherency issues, since on architectures like x86 and mips64 invalidation
IPIs are deferred to pmap_update(). Hence, VA space might be globally
visible before IPIs are sent or while they are still in-flight.

OK ad@.
 1.174 28-May-2010  rmind uvm_fault_{upper,lower}_done: move drop-swap outside the page-queues lock.
Assert for object lock being held (or ref count 0) in uao_set_swslot().
 1.173 24-Feb-2010  uebayasi branches: 1.173.2;
Merge more indirect functions. Some comments.
 1.172 24-Feb-2010  uebayasi uvm_fault_upper_lookup, uvm_fault_upper_neighbor: There is no point to call
pmap_update() without calling pmap_enter().

(Probably calling only once after loop (as done in uvm_fault_lower_lookup())
is enough. If done so, other threads see entered neighbor pages as reflected
a little later.)
 1.171 24-Feb-2010  uebayasi Minor clean up.
 1.170 24-Feb-2010  uebayasi Revert a thinko.
 1.169 24-Feb-2010  uebayasi Slightly clean up uvm_fault() code path after pmap_enter(). Now tasks
needed for page cache are concentrated in own functions (uvm_fault_*_done()).
 1.168 24-Feb-2010  uebayasi Record if "promote" is done in UVMHIST. Do it for "upper" fault too.
 1.167 24-Feb-2010  uebayasi Merge some indirect "lower" fault handlers back. Prompted by rmind@.
 1.166 08-Feb-2010  mlelstv branches: 1.166.2;
pgo_get needs the page array to be initialized.
 1.165 08-Feb-2010  mlelstv Move assertion to make check more clear.
 1.164 07-Feb-2010  mlelstv Make UVMHIST build again.
 1.163 05-Feb-2010  uebayasi Cosmetic. Shorten some long names.
 1.162 05-Feb-2010  uebayasi Fix !DIAGNOSTIC build. Reported by Geoff Wing.
 1.161 04-Feb-2010  uebayasi Reduce diff between upper/lower neighbor handlers.
 1.160 04-Feb-2010  uebayasi Merge "obfuscating layers" for readability. Inline some functions.
Requested by rmind@.
 1.159 04-Feb-2010  uebayasi Move uvm_fault_* static func decls in one place.
 1.158 03-Feb-2010  uebayasi uvm_fault_lower_generic_io: Reduce diff from uvm_loanuobj().
 1.157 03-Feb-2010  uebayasi uvm_fault_lower_generic_io: One missing mutex_exit(vmobjlock). Found while
comparing this function with uvm_loanuobj(). (Part of) these should be
merged.
 1.156 02-Feb-2010  uebayasi uobj->pgops->pgo_get doing PGO_SYNCIO returns a uobjpage whose uobj backpointer
refers to another "uobj" used to call pgo_get. Revert the wrong assertion
I made. My bad.

(This and pgo_get's possible ERESTART return value check are the only 2 behavioral
changes I made.)

Reported by drochner@, thanks.
 1.155 02-Feb-2010  uebayasi Don't pass an unnecessary reference to uvm_loanbreak_anon().

Requested by rmind@.
 1.154 02-Feb-2010  uebayasi Be consistent to decide if PMAP_WIRED or not.
 1.153 02-Feb-2010  uebayasi Move A->K loan break code to uvm_loan.c.
 1.152 02-Feb-2010  uebayasi Indent.
 1.151 02-Feb-2010  uebayasi uvm_fault: Split "neighbor" fault and loan handling into functions.
 1.150 02-Feb-2010  uebayasi Sort struct uvm_faultctx members for better alignment.
 1.149 01-Feb-2010  uebayasi Indent.
 1.148 01-Feb-2010  uebayasi More split.
 1.147 01-Feb-2010  uebayasi Fix build without DIAGNOSTIC.
 1.146 01-Feb-2010  uebayasi uvm_fault: Clarify when to wire what.
 1.145 01-Feb-2010  uebayasi uvm_fault_upper_lookup: This is totally my personal preference, but can't help
adding one goto to reduce one indent.
 1.144 01-Feb-2010  uebayasi uvm_fault:
- Lower fault routines don't care about the vm_anon array found in upper lookup.
Don't pass the pointer down.
- The flag "shadowed" is known when we lookup upper layer. Don't need to
keep in the fault context struct.
 1.143 01-Feb-2010  uebayasi Indent.
 1.142 01-Feb-2010  uebayasi Rewrite uvm_fault() loop using while () rather than goto.
 1.141 01-Feb-2010  uebayasi Split uvm_fault() into 2 more functions, uvm_fault_check() and
uvm_fault_upper_lookup(). Omit unnecessary arguments passed around.
 1.140 01-Feb-2010  uebayasi uvm_fault: Pack variables shared during fault / re-fault into a struct named
uvm_faultctx. Unfortunately ~all of those values are overridden in various
ways. Constification doesn't help much...
 1.139 01-Feb-2010  uebayasi ERESTART is already negative. Give up negating error values to not override
the original values. Pointed out by rmind@, thanks.

In the lower fault case, whether (*pgo_get)() can return ERESTART and whether we
should re-fault for that remains a question. The original code just returned the
error, so keep that behaviour for now. In case (*pgo_get)() really returns
ERESTART, pass EIO to tell the uvm_fault caller that (*pgo_get)() failed.

(As far as I grep callers don't check if the return value is ERESTART or not.
So assuming (*pgo_get)() never returns ERESTART should be a safe bet.)
 1.138 31-Jan-2010  uebayasi Ax uvm_fault_internal() & break it into functions. "Upper" fault and "lower"
fault routines are separated now.
 1.137 31-Jan-2010  uebayasi uvm_fault_internal:

Move local variables around to isolate contexts. Note that remaining variables
are global in that function, and some hold state across re-fault.

Silently clean up the "eoff" mess.

(Superfluous braces will go once things settle down.)
 1.136 31-Jan-2010  uebayasi Indent.
 1.135 31-Jan-2010  uebayasi uvm_fault_internal: In lower fault handling case, put another goto to clarify
that we don't care about lower neighboring pages for the zero-fill object.
 1.134 31-Jan-2010  uebayasi uvm_fault_internal: Skip another long code segment (lower "neighbor" fault)
by a goto.
 1.133 31-Jan-2010  uebayasi uvm_fault_internal: Put a goto label "Case1" as well as "Case2". Clarify
that if the faulting page is shadowed, we don't care about the lower layer at all.
 1.132 31-Jan-2010  uebayasi Correct previous; fix a miscalculation of offset-into-entry in MADV_SEQUENTIAL
case. Pointed out by pooka@.
 1.131 30-Jan-2010  uebayasi Calculate the offset from vm_map_entry's start to vm_page array's start once.
 1.130 24-Jan-2010  uebayasi Clean up an internal flag usage. No functional changes.
 1.129 17-Dec-2009  rmind Replace few USER_TO_UAREA/UAREA_TO_USER uses, reduce sys/user.h inclusions.
 1.128 05-Dec-2009  pooka Convert tsleep(&lbolt) to kpause(). Make ltsleep/mtsleep on lbolt
illegal. I examined all places where lbolt is referenced to make
sure there were no pointer aliases of it passed to tsleep, but put a
KASSERT in m/ltsleep() just to be sure.
 1.127 01-Nov-2009  uebayasi Consistently call amap / uobj layers as upper / lower, because UVM has only
those two layers by design. Approved by Chuck Cranor some time ago.
 1.126 20-Dec-2008  ad Move a couple of calls to pmap_update().
 1.125 04-Jul-2008  ad branches: 1.125.4; 1.125.6;
Update a comment.
 1.124 27-Mar-2008  ad branches: 1.124.4; 1.124.6; 1.124.8;
Make rusage collection per-LWP and collate in the appropriate places.
cloned threads need a little bit more work but the locking needs to
be fixed first.
 1.123 18-Jan-2008  yamt branches: 1.123.6;
push pmap_clear_reference calls into pdpolicy code, where reference bits
actually matter.
 1.122 02-Jan-2008  ad Merge vmlocking2 to head.
 1.121 11-Oct-2007  ad branches: 1.121.4; 1.121.6; 1.121.10;
Remove LOCK_ASSERT(!simple_lock_held(&foo));
 1.120 21-Jul-2007  ad branches: 1.120.4; 1.120.6; 1.120.8; 1.120.10;
Merge unobtrusive locking changes from the vmlocking branch.
 1.119 22-Feb-2007  thorpej branches: 1.119.4; 1.119.12;
TRUE -> true, FALSE -> false
 1.118 21-Feb-2007  thorpej Replace the Mach-derived boolean_t type with the C99 bool type. A
future commit will replace use of TRUE and FALSE with true and false.
 1.117 15-Dec-2006  yamt branches: 1.117.2;
put ->K loaned pages on the page queue, so that page loaning doesn't
disturb pagedaemon/pdpolicy.
 1.116 01-Dec-2006  yamt uvm_fault: fix an assertion. PR/35134 from Christos Zoulas.
it can be triggered by minherit as well.
 1.115 28-Nov-2006  yamt uvm_fault: unwrap a short line.
 1.114 12-Oct-2006  yamt move some knowledge about vnode into uvm_vnode.c.
 1.113 03-Oct-2006  christos Coverity CID 3170,3171: Add KASSERT.
 1.112 15-Sep-2006  yamt branches: 1.112.2;
merge yamt-pdpolicy branch.
- separate page replacement policy from the rest of kernel
- implement an alternative replacement policy
 1.111 11-Apr-2006  yamt branches: 1.111.8;
add assertions.
 1.110 15-Mar-2006  drochner branches: 1.110.2;
-clean up the interface to uvm_fault: the "fault type" didn't serve
any purpose (done by a macro, so we don't save any cycles for now)
-kill vm_fault_t; it is not needed for real faults, and for simulated
faults (wiring) it can be replaced by UVM internal flags
-remove <uvm/uvm_fault.h> from uvm_extern.h again
 1.109 22-Feb-2006  drochner branches: 1.109.2; 1.109.4;
kill the "fault_type" argument to pager's pgo_fault() methods
it is never used
(and using it would constitute an abstraction violation imho)
 1.108 15-Feb-2006  yamt - amap_copy: take a "flags" argument instead of booleans.
- add AMAP_COPY_NOMERGE flag, and use it for uvm_map_extract.
PR/32806 from Julio M. Merino Vidal.
 1.107 31-Jan-2006  yamt branches: 1.107.2; 1.107.4;
handle "strange" filesystems like layered filesystems and tmpfs,
where pgo_get returns pages which don't belong to the uobj.
also fix an XXX in uvm_loananon and lock-unlock mismatch in uvm_loanuobj.

PR/28372, PR/32665 (Alan Barrett).
 1.106 31-Jan-2006  yamt re-apply uvm_fault.c 1.104. fixes will follow.
 1.105 30-Jan-2006  yamt revert uvm_fault.c 1.104 for now. see PR/28372, PR/32665.
 1.104 21-Jan-2006  yamt - uvm_fault: move a common code of 1B and 2B to a new function.
don't attempt to allocate anons with kernel_map locked. PR/32543.
- amap_copy: add an assertion.
 1.103 24-Dec-2005  perry branches: 1.103.2;
Remove leading __ from __(const|inline|signed|volatile) -- it is obsolete.
 1.102 11-Dec-2005  christos merge ktrace-lwp.
 1.101 13-Sep-2005  yamt wrap swap related code by #ifdef VMSWAP. always #define VMSWAP for now.
 1.100 31-Jul-2005  yamt revert "defflag VMSWAP" changes for now.
there seems to be far more people who don't want to edit
their kernel config files than i thought.
 1.99 30-Jul-2005  yamt defflag VMSWAP.
 1.98 23-Jul-2005  yamt update file timestamps for nfsd loaned-read and mmap.
PR/25279. discussed on tech-kern@.
 1.97 22-Jul-2005  yamt uvm_fault: check a correct object in the case of layered filesystems.
fix PR/30811 from Jukka Salmi.
 1.96 17-Jul-2005  yamt ensure that vnodes with dirty pages are always on syncer's queue.

- genfs_putpages: wait for i/o completion of PG_RELEASED/PG_PAGEOUT pages by
setting "wasclean" false when encountering them.
suggested by Stephan Uphoff in PR/24596 (1).

- genfs_putpages: write protect pages when cleaning out, if
we're going to take the vnode off the syncer's queue.
uvm_fault: don't write-map pages unless its vnode is already on
the syncer's queue.

fix PR/24596 (3) but in a different way from the suggested fix.
(to keep our current behaviour, ie. not to require explicit msync.
discussed on tech-kern@.)

- genfs_putpages: don't mistakenly take a vnode off the queue
by introducing a generation number in genfs_node.
genfs_getpages: increment the generation number.
suggested by Stephan Uphoff in PR/24596 (2).

- add some assertions.
 1.95 27-Jun-2005  thorpej branches: 1.95.2;
Use ANSI function decls.
 1.94 11-May-2005  yamt allocate anons on-demand, rather than reserving static amount of
them on boot/swapon.
 1.93 27-Apr-2005  yamt uvmfault_anonget: check uvm_reclaimable() where appropriate.
 1.92 12-Apr-2005  yamt fix unreasonably frequent "killed: out of swap" on systems which have
little or no swap.
- even on a severe swap shortage, if we have some amount of file-backed pages,
don't bother to kill processes.
- if all pages in queue will be likely reactivated, just give up
page type balancing rather than spinning unnecessarily.
 1.91 28-Feb-2005  chs branches: 1.91.2;
use TRUE and FALSE instead of 1 and 0 for boolean_t.
 1.90 07-Feb-2005  yamt uvm_fault: fix integer overflow so that MADV_SEQUENTIAL
can work on large files.
 1.89 01-Jan-2005  yamt branches: 1.89.2; 1.89.4;
uvm_fault: pass NULL pap to pmap_extract where we don't need paddr.
 1.88 05-May-2004  yamt fix an amap_wirerange deadlock problem by re-introducing
PG_RELEASED for anon pages. PR/23171 from Christian Limpach.
for details, see discussion filed in the PR database.

uvm_anon_release: a new function to free anon-owned PG_RELEASED page.
uvm_anfree: we can't wait for the page here because the caller might hold
amap lock. instead, just mark the page as PG_RELEASED.
whoever unbusies the page should check PG_RELEASED.
uvm_aio_aiodone: uvm_anon_release() instead of uvm_page_unbusy()
if appropriate.
uvmfault_anonget: check PG_RELEASED.
 1.87 24-Mar-2004  junyoung branches: 1.87.2;
Nuke __P().
 1.86 02-Mar-2004  yamt uvm_fault: check loan_count of neighborhood object page properly.

PR/24595 from Stephan Uphoff.
 1.85 10-Feb-2004  dbj s/fauling/faulting/
 1.84 11-Aug-2003  pk Make sure to call uvm_swap_free() and uvm_swap_markbad() with valid (i.e.
positive) slot numbers.
 1.83 11-Aug-2003  pk Introduce uvm_swapisfull(), which computes the available swap space by
taking into account swap devices that are in the process of being removed.
 1.82 03-May-2003  yamt branches: 1.82.2;
use uvm_loanbreak in uvm_fault.
 1.81 09-Feb-2003  pk uvm_fault: case 1B: lock page queue before calling uvm_pageactivate().
 1.80 18-Jan-2003  thorpej Merge the nathanw_sa branch.
 1.79 30-Oct-2002  yamt change "uoff" to voff_t from vaddr_t as it's offset within uvm object.

fix PR/18855.
 1.78 02-Sep-2002  thorpej When breaking a loan due to a page fault, check to see if the other
kind of reference-holder (anon or object) is referencing the page. If
not, then the page must be removed from the pageq's.

Reviewed by Chuck Silvers.
 1.77 29-Aug-2002  chs be sure that the page we allocate to break a loan is put on a paging queue.
fixes PR 18037.
 1.76 25-Mar-2002  chs branches: 1.76.2; 1.76.4;
when processing PG_RDONLY, mask off VM_PROT_WRITE instead of hard-wiring
VM_PROT_READ (since we might have VM_PROT_EXEC too). this fixes problems
running binaries out of NFS on macppc. yet another fix courtesy of enami.
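
A minimal sketch of the masking described in 1.76, assuming illustrative flag values and a hypothetical helper name (prot_for_rdonly_page); it is not the code from uvm_fault.c. Clearing only the write bit keeps execute permission, which hard-wiring VM_PROT_READ would drop.

```c
#include <assert.h>

/* Illustrative values; the real definitions live in the kernel headers. */
typedef int vm_prot_t;
#define VM_PROT_READ    0x01
#define VM_PROT_WRITE   0x02
#define VM_PROT_EXECUTE 0x04

/* Hypothetical helper: protection to enter for a page marked PG_RDONLY. */
static vm_prot_t
prot_for_rdonly_page(vm_prot_t enter_prot)
{
	/* Clear only the write bit; read and execute survive. */
	return enter_prot & ~VM_PROT_WRITE;
}
```

With this form a read/write/execute request comes back as read/execute rather than read-only.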
 1.75 09-Mar-2002  chs a vm_prot_t is a bit-mask, fix an assertion which was treating one
more like an enumerated type.
 1.74 02-Jan-2002  chs in uvm_fault_unwire_locked(), if we find that a pmap entry is missing,
just skip that page. this situation can arise legitimately when a file
with a wired mapping is truncated so that a wired page is no longer
part of the file.
 1.73 01-Jan-2002  chs redo part of the last commit.
 1.72 31-Dec-2001  chs introduce a new UVM fault type, VM_FAULT_WIREMAX. this is different
from VM_FAULT_WIRE in that when the pages being wired are faulted in,
the simulated fault is at the maximum protection allowed for the mapping
instead of the current protection. use this in uvm_map_pageable{,_all}()
to fix the problem where writing via ptrace() to shared libraries that
are also mapped with wired mappings in another process causes a
diagnostic panic when the wired mapping is removed.

this is a really obscure problem so it deserves some more explanation.
ptrace() writing to another process ends up down in uvm_map_extract(),
which for MAP_PRIVATE mappings (such as shared libraries) will cause
the amap to be copied or created. then the amap is made shared
(ie. the AMAP_SHARED flag is set) between the kernel and the ptrace()d
process so that the kernel can modify pages in the amap and have the
ptrace()d process see the changes. then when the page being modified
is actually faulted on, the object page (from the shared library vnode)
is copied to a new anon page and inserted into the shared amap.
to make all the processes sharing the amap actually see the new anon
page instead of the vnode page that was there before, we need to
invalidate all the pmap-level mappings of the vnode page in the pmaps
of the processes sharing the amap, but we don't have a good way of
doing this. the amap doesn't keep track of the vm_maps which map it.
so all we can do at this point is to remove all the mappings of the
page with pmap_page_protect(), but this has the unfortunate side-effect
of removing wired mappings as well. removing wired mappings with
pmap_page_protect() is a legitimate operation, it can happen when a file
with a wired mapping is truncated. so the pmap has no way of knowing
whether a request to remove a wired mapping is normal or when it's due to
this weird situation. so the pmap has to remove the wired mapping.
the process being ptrace()d goes away and life continues. then,
much later when we go to unwire or remove the wired vm_map mapping,
we discover that the pmap mapping has been removed when it should
still be there, and we panic.

so where did we go wrong? the problem is that we don't have any way
to update just the pmap mappings that need to be updated in this
scenario. we could invent a mechanism to do this, but that is much
more complicated than this change and it doesn't seem like the right
way to go in the long run either.

the real underlying problem here is that wired pmap mappings just
aren't a good concept. one of the original properties of the pmap
design was supposed to be that all the information in the pmap could
be thrown away at any time and the VM system could regenerate it all
through fault processing, but wired pmap mappings don't allow that.
a better design for UVM would not require wired pmap mappings,
and Chuck C. and I are talking about this, but it won't be done
anytime soon, so this change will do for now.

this change has the effect of causing MAP_PRIVATE mappings to be
copied to anonymous memory when they are mlock()d, so that uvm_fault()
doesn't need to copy these pages later when called from ptrace(), thus
avoiding the call to pmap_page_protect() and the panic that results
from this when the mlock()d region is unlocked or freed. note that
this change doesn't help the case where the wired mapping is MAP_SHARED.

discussed at great length with Chuck Cranor.
fixes PRs 10363, 12554, 12604, 13041, 13487, 14580 and 14853.
 1.71 10-Nov-2001  lukem add RCSIDs, and in some cases, slightly cleanup #include order
 1.70 03-Oct-2001  chs branches: 1.70.2;
skip the MADV_SEQUENTIAL processing if we refault. fixes PR 14060.
 1.69 15-Sep-2001  chs a whole bunch of changes to improve performance and robustness under load:

- remove special treatment of pager_map mappings in pmaps. this is
required now, since I've removed the globals that expose the address range.
pager_map now uses pmap_kenter_pa() instead of pmap_enter(), so there's
no longer any need to special-case it.
- eliminate struct uvm_vnode by moving its fields into struct vnode.
- rewrite the pageout path. the pager is now responsible for handling the
high-level requests instead of only getting control after a bunch of work
has already been done on its behalf. this will allow us to UBCify LFS,
which needs tighter control over its pages than other filesystems do.
writing a page to disk no longer requires making it read-only, which
allows us to write wired pages without causing all kinds of havoc.
- use a new PG_PAGEOUT flag to indicate that a page should be freed
on behalf of the pagedaemon when it's unlocked. this flag is very similar
to PG_RELEASED, but unlike PG_RELEASED, PG_PAGEOUT can be cleared if the
pageout fails due to eg. an indirect-block buffer being locked.
this allows us to remove the "version" field from struct vm_page,
and together with shrinking "loan_count" from 32 bits to 16,
struct vm_page is now 4 bytes smaller.
- no longer use PG_RELEASED for swap-backed pages. if the page is busy
because it's being paged out, we can't release the swap slot to be
reallocated until that write is complete, but unlike with vnodes we
don't keep a count of in-progress writes so there's no good way to
know when the write is done. instead, when we need to free a busy
swap-backed page, just sleep until we can get it busy ourselves.
- implement a fast-path for extending writes which allows us to avoid
zeroing new pages. this substantially reduces cpu usage.
- encapsulate the data used by the genfs code in a struct genfs_node,
which must be the first element of the filesystem-specific vnode data
for filesystems which use genfs_{get,put}pages().
- eliminate many of the UVM pagerops, since they aren't needed anymore
now that the pager "put" operation is a higher-level operation.
- enhance the genfs code to allow NFS to use the genfs_{get,put}pages
instead of a modified copy.
- clean up struct vnode by removing all the fields that used to be used by
the vfs_cluster.c code (which we don't use anymore with UBC).
- remove kmem_object and mb_object since they were useless.
instead of allocating pages to these objects, we now just allocate
pages with no object. such pages are mapped in the kernel until they
are freed, so we can use the mapping to find the page to free it.
this allows us to remove splvm() protection in several places.

The sum of all these changes improves write throughput on my
decstation 5000/200 to within 1% of the rate of NetBSD 1.5
and reduces the elapsed time for "make release" of a NetBSD 1.5
source tree on my 128MB pc to 10% less than a 1.5 kernel took.
 1.68 10-Sep-2001  chris Update pmap_update to now take the updated pmap as an argument.
This will allow improvements to the pmaps so that they can more easily
defer expensive operations, eg tlb/cache flush, until the last possible
moment.

Currently this is a no-op on most platforms, so they should see no difference.

Reviewed by Jason.
 1.67 26-Jun-2001  thorpej branches: 1.67.2; 1.67.4;
Reduce some complexity in the fault path -- Rather than maintaining
an spl-protected "interrupt safe map" list, simply require that callers
of uvm_fault() never call us in interrupt context (MD code must make
the assertion), and check for interrupt-safe maps in uvmfault_lookup()
before we lock the map.
 1.66 26-Jun-2001  thorpej Note that uvm_fault() must NEVER EVER EVER be called in interrupt
context.
 1.65 14-Jun-2001  chs work around an overflow problem in uvm_fault_wire().
from Eduardo Horvath and Simon Burge.
 1.64 02-Jun-2001  chs replace vm_map{,_entry}_t with struct vm_map{,_entry} *.
 1.63 25-May-2001  chs remove trailing whitespace.
 1.62 25-Apr-2001  thorpej Add a comment describing a problem.
 1.61 24-Apr-2001  thorpej Sprinkle pmap_update() calls after calls to:
- pmap_enter()
- pmap_remove()
- pmap_protect()
- pmap_kenter_pa()
- pmap_kremove()
as described in pmap(9).

These calls are relatively conservative. It may be possible to
optimize these a little more.
 1.60 01-Apr-2001  chs undo the part of a previous commit which turned a check for faulting
on an "intrsafe" map into a KASSERT. this situation can be caused by
an application accessing /dev/kmem.
 1.59 17-Mar-2001  chs return the real error from pgo_fault().
 1.58 15-Mar-2001  chs eliminate the KERN_* error codes in favor of the traditional E* codes.
the mapping is:

KERN_SUCCESS 0
KERN_INVALID_ADDRESS EFAULT
KERN_PROTECTION_FAILURE EACCES
KERN_NO_SPACE ENOMEM
KERN_INVALID_ARGUMENT EINVAL
KERN_FAILURE various, mostly turn into KASSERTs
KERN_RESOURCE_SHORTAGE ENOMEM
KERN_NOT_RECEIVER <unused>
KERN_NO_ACCESS <unused>
KERN_PAGES_LOCKED <unused>
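
The table in 1.58 can be sketched as a translation function. This is only an illustration: the enum (old_kern_code) and the function name are hypothetical, and the actual commit replaced the codes at each call site rather than adding a converter. KERN_FAILURE is left out because the log says it mostly became KASSERTs.

```c
#include <assert.h>
#include <errno.h>

/* Hypothetical stand-ins for the retired Mach-style codes. */
enum old_kern_code {
	OLD_KERN_SUCCESS,
	OLD_KERN_INVALID_ADDRESS,
	OLD_KERN_PROTECTION_FAILURE,
	OLD_KERN_NO_SPACE,
	OLD_KERN_INVALID_ARGUMENT,
	OLD_KERN_RESOURCE_SHORTAGE
};

/* Translate an old code to the E* value listed in the log entry. */
static int
kern_code_to_errno(enum old_kern_code code)
{
	switch (code) {
	case OLD_KERN_SUCCESS:            return 0;
	case OLD_KERN_INVALID_ADDRESS:    return EFAULT;
	case OLD_KERN_PROTECTION_FAILURE: return EACCES;
	case OLD_KERN_NO_SPACE:           return ENOMEM;
	case OLD_KERN_INVALID_ARGUMENT:   return EINVAL;
	case OLD_KERN_RESOURCE_SHORTAGE:  return ENOMEM;
	}
	return EINVAL;	/* not reached for the codes listed above */
}
```

Note that two distinct old codes collapse onto ENOMEM, so the translation is not reversible.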
 1.57 10-Mar-2001  chs eliminate the VM_PAGER_* error codes in favor of the traditional E* codes.
the mapping is:

VM_PAGER_OK 0
VM_PAGER_BAD <unused>
VM_PAGER_FAIL <unused>
VM_PAGER_PEND 0 (see below)
VM_PAGER_ERROR EIO
VM_PAGER_AGAIN EAGAIN
VM_PAGER_UNLOCK EBUSY
VM_PAGER_REFAULT ERESTART

for async i/o requests, it used to be possible for the request to
be converted to sync, and the pager would return VM_PAGER_OK or VM_PAGER_PEND
to indicate whether the caller should perform post-i/o cleanup.
this is no longer allowed; pagers must now return 0 to indicate that
the async i/o was successfully started, and the caller never needs to
worry about doing the post-i/o cleanup.
 1.56 18-Feb-2001  chs branches: 1.56.2;
clean up DIAGNOSTIC checks, use KASSERT().
 1.55 28-Jan-2001  thorpej Page scanner improvements, behavior is actually a bit more like
Mach VM's now. Specific changes:
- Pages now need not have all of their mappings removed before being
put on the inactive list. They only need to have the "referenced"
attribute cleared. This makes putting pages onto the inactive list
much more efficient. In order to eliminate redundant clearings of
"referenced", callers of uvm_pagedeactivate() must now do this
themselves.
- When checking the "modified" attribute for a page (for clearing
PG_CLEAN), make sure to only do it if PG_CLEAN is currently set on
the page (saves a potentially expensive pmap operation).
- When scanning the inactive list, if a page is referenced, reactivate
it (this part was actually added in uvm_pdaemon.c,v 1.27). This
works properly now that pages on the inactive list are allowed to
have mappings.
- When scanning the inactive list and considering a page for freeing,
remove all mappings, and then check the "modified" attribute if the
page is marked PG_CLEAN.
- When scanning the active list, if the page was referenced since its
last sweep by the scanner, don't deactivate it. (This part was
actually added in uvm_pdaemon.c,v 1.28.)

These changes greatly improve interactive performance during
moderate to high memory and I/O load.
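
A rough sketch of the inactive-list decision described in 1.55, using a toy page structure. All names here (toy_page, scan_inactive, the SCAN_* actions) are hypothetical, and the split into "free" versus "page out" is an assumption about what happens after the checks; the log only describes the order of the checks.

```c
#include <assert.h>
#include <stdbool.h>

enum scan_action { SCAN_REACTIVATE, SCAN_FREE, SCAN_PAGEOUT };

struct toy_page {
	bool referenced;	/* pmap "referenced" attribute */
	bool modified;		/* pmap "modified" attribute */
	bool pg_clean;		/* stand-in for the PG_CLEAN flag */
};

/* Decide what to do with one page found on the inactive list. */
static enum scan_action
scan_inactive(struct toy_page *pg)
{
	/* A referenced page is still in use: put it back on the active list. */
	if (pg->referenced)
		return SCAN_REACTIVATE;

	/*
	 * The mappings would be removed here (not modelled).  The modified
	 * attribute is consulted only while PG_CLEAN is still set, which
	 * avoids a pmap query for pages already known to be dirty.
	 */
	if (pg->pg_clean && pg->modified)
		pg->pg_clean = false;

	return pg->pg_clean ? SCAN_FREE : SCAN_PAGEOUT;
}
```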
 1.54 23-Jan-2001  thorpej Change uvm_analloc() to return a locked anon, update all callers,
and fix an anon locking protocol error in uvm_loanzero().
 1.53 23-Jan-2001  thorpej Sprinkle some assertions:
amap_free(): Assert that the amap is locked.
amap_share_protect(): Assert that the amap is locked.
amap_wipeout(): Assert that the amap is locked.
uvm_anfree(): Assert that the anon has a reference count of 0 and is
not locked.
uvm_anon_lockloanpg(): Assert that the anon is locked.
anon_pagein(): Assert that the anon is locked.
uvmfault_anonget(): Assert that the anon is locked.
uvm_pagealloc_strat(): Assert that the uobj or the anon is locked

And fix the problems these have uncovered:
amap_cow_now(): Lock the new anon after allocating it, and unref and
unlock it (rather than lock!) before freeing it in case
of an error condition. This should fix a problem reported
by Dan Carosone using cdrecord on an i386 MP kernel.
uvm_fault(): Case1B -- Lock the new anon after allocating it, and unlock
it later when we unlock the old anon.
Case2 -- Lock the new anon after allocating it, and unlock
it later by passing it to uvmfault_unlockall() (we set anon
to NULL if we're not doing a promote fault).
 1.52 27-Nov-2000  chs Initial integration of the Unified Buffer Cache project.
 1.51 06-Aug-2000  thorpej Update a comment in uvmfault_anonget() to reflect reality, and
make uvm_fault() handle uvmfault_anonget() failure properly (i.e.
don't unlock a lock that's already unlocked).
 1.50 27-Jun-2000  mrg remove include of <vm/vm.h>
 1.49 26-Jun-2000  mrg remove/move more mach vm header files:

<vm/pglist.h> -> <uvm/uvm_pglist.h>
<vm/vm_inherit.h> -> <uvm/uvm_inherit.h>
<vm/vm_kern.h> -> into <uvm/uvm_extern.h>
<vm/vm_object.h> -> nothing
<vm/vm_pager.h> -> into <uvm/uvm_pager.h>

also includes a bunch of <vm/vm_page.h> include removals (due to redundancy
with <vm/vm.h>), and a scattering of other similar headers.
 1.48 10-Apr-2000  thorpej branches: 1.48.4;
Use UVM_PGA_ZERO in the promote-zero-fault case of uvm_fault().
 1.47 11-Jan-2000  chs add support for ``swapctl -d'' (removing swap space).
improve handling of i/o errors in swap space.

reviewed by: Chuck Cranor
 1.46 13-Nov-1999  thorpej Change the pmap_enter() API slightly; pmap_enter() now returns an error
value (KERN_SUCCESS or KERN_RESOURCE_SHORTAGE) indicating if it succeeded
or failed. Change the `wired' and `access_type' arguments to a single
`flags' argument, which includes the access type, and flags:

PMAP_WIRED the old `wired' boolean
PMAP_CANFAIL pmap_enter() is allowed to fail

If PMAP_CANFAIL is not specified, the pmap should behave as it always
has in the face of a drastic resource shortage: fall over dead.

Change the fault handler to deal with failure (which indicates resource
shortage) by unlocking everything, waiting for the pagedaemon to free
more memory, then retrying the fault.
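
The retry behaviour described in 1.46 might look roughly like the following, with mocks in place of the real pmap and pagedaemon. The flag values, the mock_* names, and the use of 0 and ENOMEM in place of KERN_SUCCESS and KERN_RESOURCE_SHORTAGE are all assumptions made for the sketch.

```c
#include <assert.h>
#include <errno.h>

/* Illustrative flag values, not the kernel's. */
#define PMAP_WIRED   0x10
#define PMAP_CANFAIL 0x20

static int mock_shortages;	/* how many more calls hit a shortage */
static int mock_waits;		/* how many times the fault path waited */

/* Mock pmap_enter(): 0 on success, ENOMEM on a resource shortage. */
static int
mock_pmap_enter(int flags)
{
	if (mock_shortages > 0) {
		/* Without PMAP_CANFAIL the pmap would fall over dead here. */
		assert(flags & PMAP_CANFAIL);
		mock_shortages--;
		return ENOMEM;
	}
	return 0;
}

/* Stand-in for unlocking everything and waiting for the pagedaemon. */
static void
mock_wait_for_memory(void)
{
	mock_waits++;
}

/* Fault path: keep retrying the mapping until the pmap accepts it. */
static int
mock_fault(int flags)
{
	while (mock_pmap_enter(flags | PMAP_CANFAIL) != 0)
		mock_wait_for_memory();
	return 0;
}
```

The point of the single flags argument is that the caller opts in to failure; callers that do not pass PMAP_CANFAIL keep the old all-or-nothing behaviour.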
 1.45 12-Sep-1999  chs branches: 1.45.2; 1.45.4; 1.45.8;
eliminate the PMAP_NEW option by making it required for all ports.
ports which previously had no support for PMAP_NEW now implement
the pmap_k* interfaces as wrappers around the non-k versions.
 1.44 22-Jul-1999  thorpej Garbage collect thread_sleep()/thread_wakeup() left over from the old
Mach VM code. Also nuke iprintf(), which was no longer used anywhere.

Add proclist locking where appropriate.
 1.43 19-Jul-1999  cgd make sure 'wide' fault handling is actually done only once per fault.
('narrow' was mistakenly set to FALSE instead of TRUE.) Committed after
discussion with chuq.
 1.42 11-Jul-1999  thorpej Back out the change I made yesterday. It seems to cause some trouble
for some folks.
 1.41 10-Jul-1999  thorpej Simplify uvm_fault_unwire_locked() a little.
 1.40 08-Jul-1999  thorpej Change the pmap_extract() interface to:
boolean_t pmap_extract(pmap_t, vaddr_t, paddr_t *);
This makes it possible for the pmap to map physical address 0.
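
To see why the boolean return matters, here is a toy one-entry pmap whose only mapping points at physical address 0. The structure and function names (toy_pmap, toy_pmap_extract) are hypothetical; only the shape of the interface follows the log entry. Accepting a NULL pap mirrors the usage mentioned in 1.89.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

typedef unsigned long paddr_t;
typedef unsigned long vaddr_t;

/* Toy pmap holding at most one translation. */
struct toy_pmap {
	bool    valid;
	vaddr_t va;
	paddr_t pa;
};

/* Returns true if va is mapped; the address comes back through pap. */
static bool
toy_pmap_extract(const struct toy_pmap *pm, vaddr_t va, paddr_t *pap)
{
	if (!pm->valid || pm->va != va)
		return false;
	if (pap != NULL)	/* callers that only test for a mapping pass NULL */
		*pap = pm->pa;
	return true;
}
```

With an interface that returned the physical address directly, a result of 0 could not be told apart from "no mapping"; here the two cases are separate.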
 1.39 17-Jun-1999  thorpej pmap_change_wiring() -> pmap_unwire().
 1.38 17-Jun-1999  thorpej Remove pmap_pageable(); no pmap implements it, and it is not really useful,
because pmap_enter()/pmap_change_wiring() (soon to be pmap_unwire())
communicate the information in greater detail.
 1.37 16-Jun-1999  thorpej When unwiring a range in uvm_fault_unwire_locked(), don't call
pmap_change_wiring(...,FALSE) unless the map entry claims the address
is unwired. This fixes the following scenario, as described on
tech-kern@netbsd.org on Wed 6/16/1999 12:25:23:

- User mlock(2)'s a buffer, to guarantee it will never become
non-resident while he is using it.

- User then does physio to that buffer. Physio calls uvm_vslock()
to lock down the pages and ensure that page faults do not happen
while the I/O is in progress (possibly in interrupt context).

- Physio does the I/O.

- Physio calls uvm_vsunlock(). This calls uvm_fault_unwire().

>>> HERE IS WHERE THE PROBLEM OCCURS <<<

uvm_fault_unwire() calls pmap_change_wiring(..., FALSE),
which now gives the pmap free rein to recycle the mapping
information for that page, which is illegal; the mapping is
still wired (due to the mlock(2)), but now access of the
page could cause a non-protection page fault (disallowed).

NOTE: This could eventually lead to a panic when the user
subsequently munlock(2)'s the buffer and the mapping info
has been recycled for use by another mapping!
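
The scenario in 1.37 can be modelled with a wiring count on the map entry. This is a simplified sketch with hypothetical names (toy_entry, toy_wire, toy_unwire); in particular, folding the count update and the pmap call into one function is an assumption made to keep the example short.

```c
#include <assert.h>
#include <stdbool.h>

struct toy_entry {
	int  wired_count;	/* outstanding wirings recorded in the map entry */
	bool pmap_wired;	/* whether the pmap-level mapping is wired */
};

/* mlock(2) and uvm_vslock() would both end up here. */
static void
toy_wire(struct toy_entry *e)
{
	e->wired_count++;
	e->pmap_wired = true;
}

/* munlock(2) and uvm_vsunlock() would both end up here. */
static void
toy_unwire(struct toy_entry *e)
{
	assert(e->wired_count > 0);
	e->wired_count--;
	/* Drop the pmap wiring only once the entry says nothing is wired. */
	if (e->wired_count == 0)
		e->pmap_wired = false;
}
```

In the mlock-then-physio sequence, the unwire at the end of physio leaves the count at one, so the pmap mapping stays wired until the later munlock.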
 1.36 16-Jun-1999  thorpej * Rename uvm_fault_unwire() to uvm_fault_unwire_locked(), and require that
the map be at least read-locked to call this function. This requirement
will be taken advantage of in a future commit.
* Write a uvm_fault_unwire() wrapper which read-locks the map and calls
uvm_fault_unwire_locked().
* Update the comments describing the locking constraints of uvm_fault_wire()
and uvm_fault_unwire().
 1.35 16-Jun-1999  thorpej Remove an incorrect-and-no-longer-relevant comment.
 1.34 16-Jun-1999  thorpej Add a macro to test if a map entry is wired.
 1.33 04-Jun-1999  thorpej Keep interrupt-safe maps on an additional queue. In uvm_fault(), if we're
looking up a kernel address, check to see if the address is on this
"interrupt-safe" list. If so, return failure immediately. This prevents
a locking screw if a page fault is taken on an interrupt-safe map in or
out of interrupt context.
 1.32 02-Jun-1999  thorpej A page fault on a non-pageable map is always fatal.
 1.31 28-May-1999  thorpej Make uvm_fault_unwire() take a vm_map_t, rather than a pmap_t, for
consistency. Use this opportunity for checking for intrsafe map use
in this routine (which is illegal).
 1.30 26-May-1999  thorpej Pass an access_type to uvm_fault_wire(), which it forwards on to
uvm_fault().
 1.29 19-May-1999  chs when wiring swap-backed pages, clear the PG_CLEAN flag before
releasing any swap resources. if we don't do this, we can
end up with a clean, swap-backed page, which is illegal.
tracked down by Bill Sommerfeld, fixes PR 7578.
 1.28 11-Apr-1999  chs add a `flags' argument to uvm_pagealloc_strat().
define a flag UVM_PGA_USERESERVE to allow non-kernel object
allocations to use pages from the reserve.
use the new flag for allocations in pmap modules.
 1.27 29-Mar-1999  mycroft branches: 1.27.2;
Duuuh. Back and front pages should have an access_type of 0, since we don't
know they're going to be used. What was I thinking??
 1.26 28-Mar-1999  mycroft Reduce the access_type for copy-on-write pages in the front and back regions.
 1.25 28-Mar-1999  mycroft Fix a case I missed in the previous.
 1.24 28-Mar-1999  mycroft Only turn off VM_PROT_WRITE for COW pages; not VM_PROT_EXECUTE.
 1.23 26-Mar-1999  mycroft Add a new `access type' argument to pmap_enter(). This indicates what type of
memory access a mapping was caused by. This is passed through from uvm_fault()
and udv_fault(), and in most other cases is 0.
The pmap module may use this to preset R/M information. On MMUs which require
R/M emulation, the implementation may preset the bits and avoid taking another
fault. On MMUs which keep R/M information in hardware, the implementation may
preset its cached bits to speed up the next call to pmap_is_modified() or
pmap_is_referenced().
 1.22 26-Mar-1999  chs add uvmexp.swpgonly and use it to detect out-of-swap conditions.
 1.21 25-Mar-1999  mrg remove now >1 year old pre-release message.
 1.20 31-Jan-1999  mrg 80 cols.
 1.19 24-Jan-1999  chuck cleanup/reorg:
- break anon related functions out of uvm_amap.c and put them in their own
file (uvm_anon.c). includes breaking up uvm_anon_init into an amap and
an anon init function
- ensure that only functions within the amap module access amap structure
fields (add macros to amap api as needed)
 1.18 20-Nov-1998  chuck update outdated an_swslot comments
 1.17 07-Nov-1998  mrg branches: 1.17.2;
minor KNF nits
 1.16 04-Nov-1998  chs be consistent with locking of amaps and anons when freeing them.
 1.15 18-Oct-1998  chs shift by PAGE_SHIFT instead of multiplying or dividing by PAGE_SIZE.
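
For reference, the conversion in 1.15 amounts to the following, assuming 4 KB pages for illustration (the real PAGE_SHIFT is machine-dependent) and hypothetical helper names.

```c
#include <assert.h>

/* Illustrative: 4 KB pages. */
#define PAGE_SHIFT 12
#define PAGE_SIZE  (1UL << PAGE_SHIFT)

/* Same result as npages * PAGE_SIZE. */
static unsigned long
pages_to_bytes(unsigned long npages)
{
	return npages << PAGE_SHIFT;
}

/* Same result as nbytes / PAGE_SIZE (truncating). */
static unsigned long
bytes_to_pages(unsigned long nbytes)
{
	return nbytes >> PAGE_SHIFT;
}
```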
 1.14 16-Oct-1998  tv Check for gcc the Right way when quashing -Wuninitialized goop.
 1.13 11-Oct-1998  chuck remove unused share map code from UVM:
- simplify uvm_faultinfo in uvm_fault.h (parent map tracking no longer needed)
- adjust locking and lookup functions in uvm_fault_i.h to reflect the above
- replace ufi.rvaddr with ufi.orig_rvaddr in uvm_fault.c since rvaddr is
no longer needed.
- no need to worry about share map translations in uvm_fault(). simplify.
 1.12 13-Aug-1998  eeh Merge paddr_t changes into the main branch.
 1.11 02-Jun-1998  mark branches: 1.11.2;
Use the sparc's GCC lossage fix for the arm32 port as well. Problem appears
to be a compiler bug resulting in a 'variable possibly used uninitialised'
warning when optimisation is used.
 1.10 05-May-1998  kleink Remove inclusions of syscall (and syscall argument) related header files;
we don't need them here.
 1.9 26-Mar-1998  chuck update per-process rusage fault counters (ru_majflt/ru_minflt) under UVM
 1.8 22-Mar-1998  chuck remove tmpwire arg from uvm_pagewire() -- it isn't needed anymore.
noted by chuck s.
 1.7 09-Mar-1998  mrg KNF.
 1.6 10-Feb-1998  mrg - add defopt's for UVM, UVMHIST and PMAP_NEW.
- remove unnecessary UVMHIST_DECL's.
 1.5 07-Feb-1998  mrg implement counters for pages paged in/out
 1.4 07-Feb-1998  mrg restore rcsids
 1.3 07-Feb-1998  chs don't try to relock amap if there isn't one.
 1.2 06-Feb-1998  thorpej RCS ID police.
 1.1 05-Feb-1998  mrg branches: 1.1.1;
Initial revision
 1.1.1.1 05-Feb-1998  mrg initial import of the new virtual memory system, UVM, into -current.

UVM was written by chuck cranor <chuck@maria.wustl.edu>, with some
minor portions derived from the old Mach code. i provided some help
getting swap and paging working, and other bug fixes/ideas. chuck
silvers <chuq@chuq.com> also provided some other fixes.

this is the UVM kernel code portion.


this will be KNF'd shortly. :-)
 1.11.2.1 30-Jul-1998  eeh Split vm_offset_t and vm_size_t into paddr_t, psize_t, vaddr_t, and vsize_t.
 1.17.2.3 02-Jun-1999  chs honor the new PG_RDONLY flag.
 1.17.2.2 25-Feb-1999  chs remove the hacky splhigh() around the pgo_fault() call.
thread_wakeup() -> wakeup().
use SLOCK_{,UN}LOCKED.
 1.17.2.1 09-Nov-1998  chs initial snapshot. lots left to do.
 1.27.2.2 18-Jun-1999  perry pullup 1.28->1.29 (chuq): fixes loss of process data under heavy paging bug
 1.27.2.1 16-Apr-1999  chs branches: 1.27.2.1.2; 1.27.2.1.4;
pull up 1.27 -> 1.28:
add a `flags' argument to uvm_pagealloc_strat().
define a flag UVM_PGA_USERESERVE to allow non-kernel object
allocations to use pages from the reserve.
use the new flag for allocations in pmap modules.
 1.27.2.1.4.1 30-Nov-1999  itojun bring in latest KAME (as of 19991130, KAME/NetBSD141) into kame branch
just for reference purposes.
This commit includes 1.4 -> 1.4.1 sync for kame branch.

The branch does not compile at all (due to the lack of ALTQ and some other
source code). Please do not try to modify the branch, this is just for
reference purposes.

synchronization to latest KAME will take place on HEAD branch soon.
 1.27.2.1.2.5 02-Aug-1999  thorpej Update from trunk.
 1.27.2.1.2.4 02-Aug-1999  thorpej Update from trunk.
 1.27.2.1.2.3 04-Jul-1999  chs add PGO_SYNCIO to the flags to pgo_fault() and pgo_get() (unlocked).
this just makes things work out better in the handlers.
 1.27.2.1.2.2 21-Jun-1999  thorpej Sync w/ -current.
 1.27.2.1.2.1 07-Jun-1999  chs merge everything from chs-ubc branch.
 1.45.8.1 27-Dec-1999  wrstuden Pull up to last week's -current.
 1.45.4.1 15-Nov-1999  fvdl Sync with -current
 1.45.2.6 21-Apr-2001  bouyer Sync with HEAD
 1.45.2.5 27-Mar-2001  bouyer Sync with HEAD.
 1.45.2.4 12-Mar-2001  bouyer Sync with HEAD.
 1.45.2.3 11-Feb-2001  bouyer Sync with HEAD.
 1.45.2.2 08-Dec-2000  bouyer Sync with HEAD.
 1.45.2.1 20-Nov-2000  bouyer Update thorpej_scsipi to -current as of a month ago
 1.48.4.2 16-Jun-2001  he Pull up revision 1.65 (via patch, requested by chuck):
Work around overflow problem in uvm_fault_wire().
 1.48.4.1 06-Aug-2000  thorpej Pull up rev. 1.51:
Update a comment in uvmfault_anonget() to reflect reality, and
make uvm_fault() handle uvmfault_anonget() failure properly (i.e.
don't unlock a lock that's already unlocked).
 1.56.2.13 30-Oct-2002  thorpej Sync with HEAD.
 1.56.2.12 17-Sep-2002  nathanw Catch up to -current.
 1.56.2.11 12-Jul-2002  nathanw No longer need to pull in lwp.h; proc.h pulls it in for us.
 1.56.2.10 24-Jun-2002  nathanw Curproc->curlwp renaming.

Change uses of "curproc->l_proc" back to "curproc", which is more like the
original use. Bare uses of "curproc" are now "curlwp".

"curproc" is now #defined in proc.h as ((curlwp) ? (curlwp)->l_proc : NULL)
so that it is always safe to reference curproc (*de*referencing curproc
is another story, but that's always been true).
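
A small sketch of why that definition makes curproc safe to reference. The structures are cut down to one field each, and curlwp is modelled as a plain global here, which is an assumption made for the example rather than how the kernel provides it.

```c
#include <assert.h>
#include <stddef.h>

struct proc { int p_pid; };
struct lwp  { struct proc *l_proc; };

/* Stand-in for the kernel's notion of the current LWP. */
static struct lwp *curlwp;

/* Evaluates to NULL instead of faulting when there is no current LWP. */
#define curproc ((curlwp) ? (curlwp)->l_proc : NULL)
```

Referencing curproc is always safe under this definition; dereferencing it still requires a NULL check.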
 1.56.2.9 01-Apr-2002  nathanw Catch up to -current.
(CVS: It's not just a program. It's an adventure!)
 1.56.2.8 08-Jan-2002  nathanw Catch up to -current.
 1.56.2.7 14-Nov-2001  nathanw Catch up to -current.
 1.56.2.6 08-Oct-2001  nathanw Catch up to -current.
 1.56.2.5 21-Sep-2001  nathanw Catch up to -current.
 1.56.2.4 24-Aug-2001  nathanw Catch up with -current.
 1.56.2.3 21-Jun-2001  nathanw Catch up to -current.
 1.56.2.2 09-Apr-2001  nathanw Catch up with -current.
 1.56.2.1 05-Mar-2001  nathanw Initial commit of scheduler activations and lightweight process support.
 1.67.4.2 11-Oct-2001  fvdl Catch up with -current. Fix some bogons in the sparc64 kbd/ms
attach code. cd18xx conversion provided by mrg.
 1.67.4.1 01-Oct-2001  fvdl Catch up with -current.
 1.67.2.5 06-Sep-2002  jdolecek sync kqueue branch with HEAD
 1.67.2.4 23-Jun-2002  jdolecek catch up with -current on kqueue branch
 1.67.2.3 16-Mar-2002  jdolecek Catch up with -current.
 1.67.2.2 10-Jan-2002  thorpej Sync kqueue branch with -current.
 1.67.2.1 13-Sep-2001  thorpej Update the kqueue branch to HEAD.
 1.70.2.1 12-Nov-2001  thorpej Sync the thorpej-mips-cache branch with -current.
 1.76.4.3 10-Dec-2002  jmc Pull up revisions 1.78-1.79 (requested by thorpej in ticket #952)
change uoff to voff_t from vaddr_t as it's offset within uvm object.
fix PR/18855.
 1.76.4.2 30-Nov-2002  he Pull up revision 1.78 (requested by thorpej in ticket #759):
When breaking a loan due to a page fault, check to see if
the other kind of reference-holder (anon or object) is
referencing the page. If not, the page must be removed
from the paging queue.
 1.76.4.1 30-Nov-2002  he Pull up revision 1.77 (requested by chs in ticket #770):
Be sure that the page we allocate to break a loan is put
on a paging queue. Fixes PR#18037.
 1.76.2.1 31-Aug-2002  gehenna catch up with -current.
 1.82.2.7 10-Nov-2005  skrll Sync with HEAD. Here we go again...
 1.82.2.6 04-Mar-2005  skrll Sync with HEAD.

Hi Perry!
 1.82.2.5 09-Feb-2005  skrll Sync with HEAD.
 1.82.2.4 17-Jan-2005  skrll Sync with HEAD.
 1.82.2.3 21-Sep-2004  skrll Fix the sync with head I botched.
 1.82.2.2 18-Sep-2004  skrll Sync with HEAD.
 1.82.2.1 03-Aug-2004  skrll Sync with HEAD
 1.87.2.1 10-May-2004  tron branches: 1.87.2.1.2;
Pull up revision 1.88 (requested by yamt in ticket #271):
fix an amap_wirerange deadlock problem by re-introducing
PG_RELEASED for anon pages. PR/23171 from Christian Limpach.
for details, see discussion filed in the PR database.
uvm_anon_release: a new function to free anon-owned PG_RELEASED page.
uvm_anfree: we can't wait for the page here because the caller might hold
amap lock. instead, just mark the page as PG_RELEASED.
whoever unbusies the page should check PG_RELEASED.
uvm_aio_aiodone: uvm_anon_release() instead of uvm_page_unbusy()
if appropriate.
uvmfault_anonget: check PG_RELEASED.
 1.87.2.1.2.1 11-May-2005  riz Pull up revision 1.90 (requested by dbj in ticket #1409):
uvm_fault: fix integer overflow so that MADV_SEQUENTIAL
can work on large files.
 1.89.4.2 19-Mar-2005  yamt sync with head. xen and whitespace. xen part is not finished.
 1.89.4.1 12-Feb-2005  yamt sync with head.
 1.89.2.1 29-Apr-2005  kent sync with -current
 1.91.2.1 24-Aug-2005  riz Pull up following revision(s) (requested by yamt in ticket #688):
sys/miscfs/genfs/genfs_vnops.c: revision 1.98 via patch
sys/ufs/ffs/ffs_vfsops.c: revision 1.165
sys/ufs/lfs/lfs_extern.h: revision 1.69
sys/fs/filecorefs/filecore_vfsops.c: revision 1.20
sys/nfs/nfs_node.c: revision 1.80
sys/fs/smbfs/smbfs_node.c: revision 1.24
sys/fs/cd9660/cd9660_vfsops.c: revision 1.24
sys/fs/msdosfs/msdosfs_denode.c: revision 1.8
sys/miscfs/genfs/genfs_node.h: revision 1.6
sys/ufs/lfs/lfs_vfsops.c: revision 1.183
sys/ufs/ext2fs/ext2fs_vfsops.c: revision 1.86
sys/fs/adosfs/advfsops.c: revision 1.23
sys/fs/ntfs/ntfs_vfsops.c: revision 1.31
- constify genfs_ops.
- use member designators.

sys/miscfs/genfs/genfs_vnops.c: revision 1.99 via patch
genfs_getpages: don't forget to put the vnode onto the syncer's work queue
even in the case of PGO_LOCKED.

sys/uvm/uvm_bio.c: revision 1.40
sys/uvm/uvm_pager.h: revision 1.29
sys/miscfs/genfs/genfs_vnops.c: revision 1.100 via patch
sys/ufs/ufs/ufs_inode.c: revision 1.50
- introduce PGO_NOBLOCKALLOC and use it for ubc mapping
to prevent unnecessary block allocations in the case that
page size > block size.
- ufs_balloc_range: use VM_PROT_WRITE+PGO_NOBLOCKALLOC rather than
VM_PROT_READ.

sys/uvm/uvm_fault.c: revision 1.96
sys/miscfs/genfs/genfs_vnops.c: revision 1.101 via patch
sys/uvm/uvm_object.h: revision 1.19
sys/miscfs/genfs/genfs_node.h: revision 1.7
ensure that vnodes with dirty pages are always on syncer's queue.
- genfs_putpages: wait for i/o completion of PG_RELEASED/PG_PAGEOUT pages by
setting "wasclean" false when encountering them.
suggested by Stephan Uphoff in PR/24596 (1).
- genfs_putpages: write protect pages when cleaning out, if
we're going to take the vnode off the syncer's queue.
uvm_fault: don't write-map pages unless its vnode is already on
the syncer's queue.
fix PR/24596 (3) but in a different way from the suggested fix.
(to keep our current behaviour, ie. not to require explicit msync.
discussed on tech-kern@.)
- genfs_putpages: don't mistakenly take a vnode off the queue
by introducing a generation number in genfs_node.
genfs_getpages: increment the generation number.
suggested by Stephan Uphoff in PR/24596 (2).
- add some assertions.

sys/miscfs/genfs/genfs_vnops.c: revision 1.102 via patch
genfs_putpages: don't bother to clean the vnode unless VONWORKLST.

sys/ufs/ffs/ffs_vnops.c: revision 1.71
ffs_full_fsync: because VBLK/VCHR can be mmap'ed,
do VOP_PUTPAGES for them as well.

sys/uvm/uvm_fault.c: revision 1.97
uvm_fault: check a correct object in the case of layered filesystems.
fix PR/30811 from Jukka Salmi.

sys/uvm/uvm_object.h: revision 1.20
sys/ufs/ffs/ffs_vfsops.c: revision 1.167
sys/uvm/uvm_bio.c: revision 1.41
sys/ufs/ufs/ufs_vnops.c: revision 1.129
sys/uvm/uvm_mmap.c: revision 1.92
sys/uvm/uvm_fault.c: revision 1.98
sys/kern/vfs_subr.c: revision 1.252
sys/fs/msdosfs/denode.h: revision 1.5
sys/miscfs/genfs/genfs_vnops.c: revision 1.103 via patch
sys/fs/msdosfs/msdosfs_denode.c: revision 1.9
sys/sys/vnode.h: revision 1.141
sys/ufs/ufs/ufs_inode.c: revision 1.51
sys/ufs/ufs/ufs_extern.h: revision 1.45 via patch
sys/miscfs/genfs/genfs_node.h: revision 1.8
sys/ufs/lfs/lfs_vfsops.c: revision 1.184
sys/uvm/uvm_pager.h: revision 1.30
sys/ufs/ext2fs/ext2fs_vfsops.c: revision 1.87
update file timestamps for nfsd loaned-read and mmap.
PR/25279. discussed on tech-kern@.

sys/miscfs/genfs/genfs_vnops.c: revision 1.104 via patch
don't write-protect wired pages. pointed out by Chuck Silvers.
for now, leave a vnode on the syncer's queue, as suggested by him.

sys/ufs/ffs/ffs_vnops.c: revision 1.72
revert VCHR part of ffs_vnops.c 1.71.
as VCHR uses the device pager, no point to call VOP_PUTPAGES here.
pointed out by Chuck Silvers.
 1.95.2.6 21-Jan-2008  yamt sync with head
 1.95.2.5 27-Oct-2007  yamt sync with head.
 1.95.2.4 03-Sep-2007  yamt sync with head.
 1.95.2.3 26-Feb-2007  yamt sync with head.
 1.95.2.2 30-Dec-2006  yamt sync with head.
 1.95.2.1 21-Jun-2006  yamt sync with head.
 1.103.2.3 01-Mar-2006  yamt sync with head.
 1.103.2.2 18-Feb-2006  yamt sync with head.
 1.103.2.1 01-Feb-2006  yamt sync with head.
 1.107.4.1 22-Apr-2006  simonb Sync with head.
 1.107.2.1 09-Sep-2006  rpaulo sync with head
 1.109.4.1 19-Apr-2006  elad oops - *really* sync to head this time.
 1.109.2.3 11-Apr-2006  yamt sync with head
 1.109.2.2 01-Apr-2006  yamt sync with head.
 1.109.2.1 05-Mar-2006  yamt separate page replacement policy from the rest of kernel.
 1.110.2.1 24-May-2006  tron Merge 2006-05-24 NetBSD-current into the "peter-altq" branch.
 1.111.8.2 12-Jan-2007  ad Sync with head.
 1.111.8.1 18-Nov-2006  ad Sync with head.
 1.112.2.3 18-Dec-2006  yamt sync with head.
 1.112.2.2 10-Dec-2006  yamt sync with head.
 1.112.2.1 22-Oct-2006  yamt sync with head
 1.117.2.1 27-Feb-2007  yamt - sync with head.
- move sched_changepri back to kern_synch.c as it doesn't know PPQ anymore.
 1.119.12.1 15-Aug-2007  skrll Sync with HEAD.
 1.119.4.1 13-Mar-2007  ad Pull in the initial set of changes for the vmlocking branch.
 1.120.10.2 21-Jul-2007  ad Merge unobtrusive locking changes from the vmlocking branch.
 1.120.10.1 21-Jul-2007  ad file uvm_fault.c was added on branch matt-mips64 on 2007-07-21 19:21:55 +0000
 1.120.8.1 14-Oct-2007  yamt sync with head.
 1.120.6.3 23-Mar-2008  matt sync with HEAD
 1.120.6.2 09-Jan-2008  matt sync with HEAD
 1.120.6.1 06-Nov-2007  matt sync with HEAD
 1.120.4.1 26-Oct-2007  joerg Sync with HEAD.

Follow the merge of pmap.c on i386 and amd64 and move
pmap_init_tmp_pgtbl into arch/x86/x86/pmap.c. Modify the ACPI wakeup
code to restore CR4 before jumping back into kernel space as the large
page option might cover that.
 1.121.10.2 19-Jan-2008  bouyer Sync with HEAD
 1.121.10.1 02-Jan-2008  bouyer Sync with HEAD
 1.121.6.1 04-Dec-2007  ad Pull the vmlocking changes into a new branch.
 1.121.4.1 18-Feb-2008  mjf Sync with HEAD.
 1.123.6.3 17-Jan-2009  mjf Sync with HEAD.
 1.123.6.2 28-Sep-2008  mjf Sync with HEAD.
 1.123.6.1 03-Apr-2008  mjf Sync with HEAD.
 1.124.8.1 18-Jul-2008  simonb Sync with head.
 1.124.6.1 18-Sep-2008  wrstuden Sync with wrstuden-revivesa-base-2.
 1.124.4.3 11-Aug-2010  yamt sync with head.
 1.124.4.2 11-Mar-2010  yamt sync with head
 1.124.4.1 04-May-2009  yamt sync with head.
 1.125.6.2 21-Nov-2010  riz Pull up following revision(s) (requested by rmind in ticket #1421):
sys/uvm/uvm_bio.c: revision 1.70
sys/uvm/uvm_map.c: revision 1.292
sys/uvm/uvm_pager.c: revision 1.98
sys/uvm/uvm_fault.c: revision 1.175
sys/uvm/uvm_bio.c: revision 1.69
ubc_fault: split-off code part handling a single page into ubc_fault_page().
Keep the lock around pmap_update() where required. While fixing this
in ubc_fault(), rework logic to "remember" the last object of page and
reduce locking overhead, since in common case pages belong to one and
the same UVM object (but not always, therefore add a comment).
Unlocks before pmap_update(), on removal of mappings, might cause TLB
coherency issues, since on architectures like x86 and mips64 invalidation
IPIs are deferred to pmap_update(). Hence, VA space might be globally
visible before IPIs are sent or while they are still in-flight.
OK ad@.
 1.125.6.1 02-Feb-2009  snj branches: 1.125.6.1.4;
Pull up following revision(s) (requested by ad in ticket #354):
sys/uvm/uvm_fault.c: revision 1.126
sys/uvm/uvm_map.c: revision 1.268
Move a couple of calls to pmap_update().
 1.125.6.1.4.6 12-Apr-2012  matt Apply colormask to get a valid color.
 1.125.6.1.4.5 12-Apr-2012  matt Separate object-less anon pages out of the active list if there is no swap
device. Make uvm_reclaimable and uvm.*estimatable understand colors and
kmem allocations.
 1.125.6.1.4.4 29-Feb-2012  matt Improve UVM_PAGE_TRKOWN.
Add more asserts to uvm_page.
 1.125.6.1.4.3 03-Jun-2011  matt Restore $NetBSD$
 1.125.6.1.4.2 25-May-2011  matt Make uvm_map recognize UVM_FLAG_COLORMATCH which tells uvm_map that the
'align' argument specifies the starting color of the KVA range to be returned.

When calling uvm_km_alloc with UVM_KMF_VAONLY, also specify the starting
color of the kva range returned (UVM_KMF_COLORMATCH) and pass those to
uvm_map.

In uvm_pglistalloc, make sure the pages being returned have sequentially
advancing colors (so they can be mapped in a contiguous address range).
Add a few missing UVM_FLAG_COLORMATCH flags to uvm_pagealloc calls.

Make the socket and pipe loan color-safe.

Make the mips pmap enforce strict page color (color(VA) == color(PA)).
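
The color rule in the entries above (apply a colormask, require color(VA) == color(PA)) can be sketched as follows. This is an illustration only: the 4 KiB page size, the power-of-two color count, and the names page_color() and color_matches() are assumptions, not the kernel's definitions.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define SKETCH_PAGE_SHIFT 12u	/* assumed 4 KiB pages */

/* Color of the page containing addr; ncolors must be a power of two
 * so that (ncolors - 1) can serve as the colormask. */
static unsigned
page_color(uint64_t addr, unsigned ncolors)
{
	assert(ncolors != 0 && (ncolors & (ncolors - 1)) == 0);
	unsigned colormask = ncolors - 1;
	return (unsigned)(addr >> SKETCH_PAGE_SHIFT) & colormask;
}

/* Strict coloring: a mapping is acceptable only if the virtual and
 * physical page land on the same color. */
static bool
color_matches(uint64_t va, uint64_t pa, unsigned ncolors)
{
	return page_color(va, ncolors) == page_color(pa, ncolors);
}
```

Masking keeps the result inside the valid range of colors, and consecutive pages advance through the colors in sequence, which is why a contiguous virtual range needs physical pages with sequentially advancing colors.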
 1.125.6.1.4.1 26-Jan-2010  matt Pass hints to uvm_pagealloc* to get it to use the right page color rather
than guess the right page color.
 1.125.4.1 19-Jan-2009  skrll Sync with HEAD.
 1.166.2.27 21-Nov-2010  uebayasi Rename PGO_ZERO as PGO_HOLE, and s/uvm_page_zeropage/uvm_page_holepage/.
 1.166.2.26 21-Nov-2010  uebayasi UVMHIST log for XIP hole COW.
 1.166.2.25 21-Nov-2010  uebayasi Resurrect PGO_ZERO support.

When vnode pager encounters hole pages in XIP'ed vnodes, it fills
page slots with PGO_ZERO and returns them back to the caller (fault
handler). Fault handlers are responsible to check page slots and
redirect PGO_ZERO to the single "zero page" allocated by calling
uvm_page_zeropage_alloc(9).

The zero page is a wired, read-only (PG_RDONLY) page. It is shared
by multiple vnodes and has no single owner.

XIP'ed vnodes are supposed to be "stable" during I/O (unlocked).
Because XIP'ed mounts are always read-only, there's no chance to
change mappings of XIP'ed vnodes and their XIP'ed pages. Thus the
cached uobj is reused after pgo_get() for PGO_ZERO.

(Do we need a new concept of "read-only UVM object"?)
 1.166.2.24 19-Nov-2010  uebayasi Make XIP genfs_getpages_xip() return pages in I/O path, preparing
merge into the generic genfs_getpages().
 1.166.2.23 04-Nov-2010  uebayasi Split physical device segment pages from "managed" to "managed
device". Cache that information as a flag PG_DEVICE so that callers
don't need to walk physsegs every time.

Remove PQ_FIXED, which means that page daemon doesn't need to know
device segment pages at all. But still fault handlers need to know
them.

I think this is what I can do best now.
 1.166.2.22 17-Aug-2010  uebayasi Sync with HEAD.
 1.166.2.21 12-Aug-2010  uebayasi Fix a #if/#ifdef misuse.
 1.166.2.20 22-Jul-2010  uebayasi s/PG_XIP/PQ_FIXED/, meaning that the fault handler sees XIP pages as
"fixed", and doesn't pass them to paging activity.

("XIP" is a vnode specific knowledge. It was wrong that the fault
handler had to know such a special thing.)
 1.166.2.19 15-Jul-2010  uebayasi Rename PG_DIRECT to PG_XIP. PG_XIP is marked to XIP vnode pages.
 1.166.2.18 14-Jul-2010  uebayasi One more XIP code reduction.
 1.166.2.17 13-Jul-2010  uebayasi Reduce more diffs from the original.
 1.166.2.16 12-Jul-2010  uebayasi Reduce more diff by backing out XIP page specific code. Allow XIP pages
to be loaned.
 1.166.2.15 12-Jul-2010  uebayasi Now XIP pages have vm_page, adjust some code and reduce diff to the
original code.
 1.166.2.14 09-Jul-2010  uebayasi Whitespace.
 1.166.2.13 09-Jul-2010  uebayasi Mark XIP pages as PG_CLEAN and/or PG_BUSY when appropriate. Protect
vnode lock when vm_page::flags is manipulated.
 1.166.2.12 08-Jul-2010  uebayasi Mark XIP pages as PG_RDONLY.
 1.166.2.11 08-Jul-2010  uebayasi Whitespace.
 1.166.2.10 07-Jul-2010  uebayasi Clean up; merge options DIRECT_PAGE into options XIP.
 1.166.2.9 07-Jul-2010  uebayasi To simplify things, revert global vm_page_md hash and allocate struct
vm_page [] for XIP physical segments.
 1.166.2.8 09-Jun-2010  uebayasi Fix build with DIAGNOSTIC.
 1.166.2.7 31-May-2010  uebayasi Re-define the definition of "device page"; device pages are pages of
device memory. Pages which don't have vm_page (== can't be used for
generic use), but whose PV are tracked, are called "direct pages" from
now.
 1.166.2.6 28-Feb-2010  uebayasi Put comments why device pages skip some code paths. Don't skip accounting
for "neighbor" device pages.
 1.166.2.5 24-Feb-2010  uebayasi Sync with HEAD.
 1.166.2.4 23-Feb-2010  uebayasi uvm_fault_lower_promote: One more missing part for device pages to by-pass
page cache handling. When a page in a uobj is promoted, its content is copied
to another owned by the newly allocated anon. The old page cache is then
disposed. Of course we don't need to dispose device pages in such a case,
so skip it.

Don't forget opt_device_page.h.

Count lower fault correctly.
 1.166.2.3 12-Feb-2010  uebayasi Teach device page handling to the "lower" fault handler. Skip all the paging
activities, no loaning, no wired count. Only compile tested so far.
 1.166.2.2 12-Feb-2010  uebayasi uvmfault_promote: For promotion from a "lower" page, pass the belonging struct
uvm_object * from callers, because device page struct vm_page * doesn't have
a back-pointer to the uvm_object.
 1.166.2.1 08-Feb-2010  uebayasi file uvm_fault.c was added on branch uebayasi-xip on 2010-02-12 16:06:50 +0000
 1.173.2.9 31-May-2011  rmind sync with head
 1.173.2.8 21-May-2011  rmind uvm_fault_lower_promote: fix assert (move a bit up, where logic applies).
 1.173.2.7 19-May-2011  rmind Implement sharing of vnode_t::v_interlock amongst vnodes:
- Lock is shared amongst UVM objects using uvm_obj_setlock() or getnewvnode().
- Adjust vnode cache to handle unsharing, add VI_LOCKSHARE flag for that.
- Use sharing in tmpfs and layerfs for underlying object.
- Simplify locking in ubc_fault().
- Sprinkle some asserts.

Discussed with ad@.
 1.173.2.6 21-Apr-2011  rmind sync with head
 1.173.2.5 05-Mar-2011  rmind sync with head
 1.173.2.4 03-Jul-2010  rmind sync with head
 1.173.2.3 30-May-2010  rmind sync with head
 1.173.2.2 17-Mar-2010  rmind Reorganise UVM locking to protect P->V state and serialise pmap(9)
operations on the same page(s) by always locking their owner. Hence
lock order: "vmpage"-lock -> pmap-lock.

Patch, proposed on tech-kern@, from Andrew Doran.
 1.173.2.1 16-Mar-2010  rmind Change struct uvm_object::vmobjlock to be dynamically allocated with
mutex_obj_alloc(). It allows us to share the locks among UVM objects.
 1.180.4.2 17-Feb-2011  bouyer Sync with HEAD
 1.180.4.1 08-Feb-2011  bouyer Sync with HEAD
 1.180.2.1 06-Jun-2011  jruoho Sync with HEAD.
 1.185.2.1 23-Jun-2011  cherry Catchup with rmind-uvmplock merge.
 1.190.2.6 22-May-2014  yamt sync with head.

for a reference, the tree before this commit was tagged
as yamt-pagecache-tag8.

this commit was split into small chunks to avoid
a limitation of cvs. ("Protocol error: too many arguments")
 1.190.2.5 17-Apr-2012  yamt sync with head
 1.190.2.4 28-Dec-2011  yamt - assertions
- __unused
 1.190.2.3 26-Dec-2011  yamt - use O->A loan to serve read(2). based on a patch from Chuck Silvers
- associated O->A loan fixes.
 1.190.2.2 14-Nov-2011  yamt assertions
 1.190.2.1 02-Nov-2011  yamt page cache related changes

- maintain object pages in radix tree rather than rb tree.
- reduce unnecessary page scan in putpages. esp. when an object has a ton of
pages cached but only a few of them are dirty.
- reduce the number of pmap operations by tracking page dirtiness more
precisely in uvm layer.
- fix nfs commit range tracking.
- fix nfs write clustering. XXX hack
 1.191.2.2 24-Feb-2012  mrg sync to -current.
 1.191.2.1 18-Feb-2012  mrg merge to -current.
 1.194.4.1 18-May-2014  rmind sync with head
 1.194.2.2 03-Dec-2017  jdolecek update from HEAD
 1.194.2.1 20-Aug-2014  tls Rebase to HEAD as of a few days ago.
 1.195.2.1 07-Apr-2014  tls Be a little more clear and consistent about harvesting entropy from devices:

1) deprecate RND_FLAG_NO_ESTIMATE

2) define RND_FLAG_COLLECT_TIME, RND_FLAG_COLLECT_VALUE

3) define RND_FLAG_ESTIMATE_TIME, RND_FLAG_ESTIMATE_VALUE

4) define RND_FLAG_DEFAULT: RND_FLAG_COLLECT_TIME|
RND_FLAG_COLLECT_VALUE|RND_FLAG_ESTIMATE_TIME

5) Make entropy harvesting from environmental sensors a little more generic
and remove it from individual sensor drivers.

6) Remove individual open-coded delta-estimators for values from a few
places in the tree (uvm, environmental drivers).

7) 0 -> RND_FLAG_DEFAULT, actually gather entropy from various drivers
that had stubbed out code, other minor cleanups.
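
The flag composition in item 4 can be sketched as a bitmask. The bit values and the SK_RND_ names below are placeholders chosen for illustration; they are not the kernel's RND_FLAG_ definitions.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Placeholder bit values for illustration only; not the kernel's. */
#define SK_RND_COLLECT_TIME	UINT32_C(0x1)
#define SK_RND_COLLECT_VALUE	UINT32_C(0x2)
#define SK_RND_ESTIMATE_TIME	UINT32_C(0x4)
#define SK_RND_ESTIMATE_VALUE	UINT32_C(0x8)

/* Default set as described in item 4: collect both time and value,
 * but estimate entropy from timing only. */
#define SK_RND_DEFAULT \
	(SK_RND_COLLECT_TIME | SK_RND_COLLECT_VALUE | SK_RND_ESTIMATE_TIME)

/* True if the given single flag is set in flags. */
static bool
sk_rnd_has(uint32_t flags, uint32_t flag)
{
	assert(flag != 0);
	return (flags & flag) != 0;
}
```

Under these assumptions a source registered with the default set has its sampled values collected but not counted toward the entropy estimate.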
 1.196.4.2 28-Aug-2017  skrll Sync with HEAD
 1.196.4.1 22-Sep-2015  skrll Sync with HEAD
 1.197.4.1 21-Apr-2017  bouyer Sync with HEAD
 1.197.2.2 26-Apr-2017  pgoyette Sync with HEAD
 1.197.2.1 20-Mar-2017  pgoyette Sync with HEAD
 1.199.6.4 22-Apr-2019  martin Pull up following revision(s) (requested by chs in ticket #1236):

sys/uvm/uvm_fault.c: revision 1.205

If a pager fault method returns ENOMEM but some memory appears to be reclaimable,
wake up the pagedaemon and retry the fault. This fixes the problems with Xorg
being killed with an "out of swap" message due to a transient memory shortage.
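
The retry policy described above can be sketched as a loop. This is a single-threaded toy model, not the code from revision 1.205: struct sketch_vm and the sketch_ functions are hypothetical, and the real page daemon runs asynchronously rather than freeing a page on the spot.

```c
#include <assert.h>
#include <errno.h>

/* Hypothetical state standing in for the pager and the page daemon. */
struct sketch_vm {
	int free_pages;		/* pages available right now */
	int reclaimable_pages;	/* pages the daemon could still free */
	int wakeups;		/* how often the daemon was woken */
};

/* Pager fault method: needs one free page, else reports ENOMEM. */
static int
sketch_pager_fault(struct sketch_vm *vm)
{
	assert(vm->free_pages >= 0);
	if (vm->free_pages == 0)
		return ENOMEM;
	vm->free_pages--;
	return 0;
}

/* Waking the daemon turns one reclaimable page into a free page. */
static void
sketch_wake_pagedaemon(struct sketch_vm *vm)
{
	vm->wakeups++;
	if (vm->reclaimable_pages > 0) {
		vm->reclaimable_pages--;
		vm->free_pages++;
	}
}

/* Retry on ENOMEM only while something still looks reclaimable. */
static int
sketch_fault(struct sketch_vm *vm)
{
	for (;;) {
		int error = sketch_pager_fault(vm);
		if (error != ENOMEM)
			return error;
		if (vm->reclaimable_pages == 0)
			return ENOMEM;	/* genuinely out of memory */
		sketch_wake_pagedaemon(vm);
	}
}
```

The point of the policy is that a transient shortage ends in a successful retry, while ENOMEM is still returned once nothing reclaimable remains.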
 1.199.6.3 27-Feb-2018  martin Pull up following revision(s) (requested by mrg in ticket #593):
sys/dev/marvell/mvxpsec.c: revision 1.2
sys/arch/m68k/m68k/pmap_motorola.c: revision 1.70
sys/opencrypto/crypto.c: revision 1.102
sys/arch/sparc64/sparc64/pmap.c: revision 1.308
sys/ufs/chfs/chfs_malloc.c: revision 1.5
sys/arch/powerpc/oea/pmap.c: revision 1.95
sys/sys/pool.h: revision 1.80,1.82
sys/kern/subr_pool.c: revision 1.209-1.216,1.219-1.220
sys/arch/alpha/alpha/pmap.c: revision 1.262
sys/kern/uipc_mbuf.c: revision 1.173
sys/uvm/uvm_fault.c: revision 1.202
sys/sys/mbuf.h: revision 1.172
sys/kern/subr_extent.c: revision 1.86
sys/arch/x86/x86/pmap.c: revision 1.266 (via patch)
sys/dev/dtv/dtv_scatter.c: revision 1.4

Allow only one pending call to a pool's backing allocator at a time.
Candidate fix for problems with hanging after kva fragmentation related
to PR kern/45718.

Proposed on tech-kern:
https://mail-index.NetBSD.org/tech-kern/2017/10/23/msg022472.html
Tested by bouyer@ on i386.

This makes one small change to the semantics of pool_prime and
pool_setlowat: they may fail with EWOULDBLOCK instead of ENOMEM, if
there is a pending call to the backing allocator in another thread but
we are not actually out of memory. That is unlikely because nearly
always these are used during initialization, when the pool is not in
use.

Define the new flag too for previous commit.

pool_grow can now fail even when sleeping is ok. Catch this case in pool_get
and retry.

Assert that pool_get failure happens only with PR_NOWAIT.
This would have caught the mistake I made last week leading to null
pointer dereferences all over the place, a mistake which I evidently
poorly scheduled alongside maxv's change to the panic message on x86
for null pointer dereferences.

Since pr_lock is now used to wait for two things (PR_GROWING and
PR_WANTED) we need to loop for the condition we wanted.
make the KASSERTMSG/panic strings consistent as '%s: [%s], __func__, wchan'
Handle the ERESTART case from pool_grow()

don't pass 0 to the pool flags
Guess pool_cache_get(pc, 0) means PR_WAITOK here.
Earlier on in the same context we use kmem_alloc(sz, KM_SLEEP).

use PR_WAITOK everywhere.
use PR_NOWAIT.

Don't use 0 for PR_NOWAIT

use PR_NOWAIT instead of 0

panic ex nihilo -- PR_NOWAITing for zerot

Add assertions that either PR_WAITOK or PR_NOWAIT are set.
- fix an assert; we can reach there if we are nowait or limitfail.
- when priming the pool and failing with ERESTART, don't decrement the number
of pages; this avoids the issue of returning an ERESTART when we get to 0,
and is more correct.
- simplify the pool_grow code, and don't wakeup things if we ENOMEM.

In pmap_enter_ma(), only try to allocate pves if we might need them,
and even if that fails, only fail the operation if we later discover
that we really do need them. This implements the requirement that
pmap_enter(PMAP_CANFAIL) must not fail when replacing an existing
mapping with the first mapping of a new page, which is an unintended
consequence of the changes from the rmind-uvmplock branch in 2011.

The problem arises when pmap_enter(PMAP_CANFAIL) is used to replace an existing
pmap mapping with a mapping of a different page (eg. to resolve a copy-on-write).
If that fails and leaves the old pmap entry in place, then UVM won't hold
the right locks when it eventually retries. This entanglement of the UVM and
pmap locking was done in rmind-uvmplock in order to improve performance,
but it also means that the UVM state and pmap state need to be kept in sync
more than they did before. It would be possible to handle this in the UVM code
instead of in the pmap code, but these pmap changes improve the handling of
low memory situations in general, and handling this in UVM would be clunky,
so this seemed like the better way to go.

This somewhat indirectly fixes PR 52706, as well as the failing assertion
about "uvm_page_locked_p(old_pg)". (but only on x86, various other platforms
will need their own changes to handle this issue.)
In uvm_fault_upper_enter(), if pmap_enter(PMAP_CANFAIL) fails, assert that
the pmap did not leave around a now-stale pmap mapping for an old page.
If such a pmap mapping still existed after we unlocked the vm_map,
the UVM code would not know later that it would need to lock the
lower layer object while calling the pmap to remove or replace that
stale pmap mapping. See PR 52706 for further details.
hopefully work around the irregular "fork fails in init" problem.
if a pool is growing, and the grower is PR_NOWAIT, mark this.
if another caller wants to grow the pool and is also PR_NOWAIT,
busy-wait for the original caller, which should either succeed
or hard-fail fairly quickly.

implement the busy-wait by unlocking and relocking this pool's
mutex and returning ERESTART. other methods (such as having
the caller do this) were significantly more code and this hack
is fairly localised.
ok chs@ riastradh@

Don't release the lock in the PR_NOWAIT allocation. Move flags setting
after acquiring the mutex. (from Tobias Nygren)
apply the change from arch/x86/x86/pmap.c rev. 1.266 commitid vZRjvmxG7YTHLOfA:

In pmap_enter_ma(), only try to allocate pves if we might need them,
and even if that fails, only fail the operation if we later discover
that we really do need them. If we are replacing an existing mapping,
reuse the pv structure where possible.

This implements the requirement that pmap_enter(PMAP_CANFAIL) must not fail
when replacing an existing mapping with the first mapping of a new page,
which is an unintended consequence of the changes from the rmind-uvmplock
branch in 2011.

The problem arises when pmap_enter(PMAP_CANFAIL) is used to replace an existing
pmap mapping with a mapping of a different page (eg. to resolve a copy-on-write).
If that fails and leaves the old pmap entry in place, then UVM won't hold
the right locks when it eventually retries. This entanglement of the UVM and
pmap locking was done in rmind-uvmplock in order to improve performance,
but it also means that the UVM state and pmap state need to be kept in sync
more than they did before. It would be possible to handle this in the UVM code
instead of in the pmap code, but these pmap changes improve the handling of
low memory situations in general, and handling this in UVM would be clunky,
so this seemed like the better way to go.

This somewhat indirectly fixes PR 52706 on the remaining platforms where
this problem existed.
 1.199.6.2 02-Nov-2017  snj Pull up following revision(s) (requested by pgoyette in ticket #335):
share/man/man9/kernhist.9: 1.5-1.8
sys/arch/acorn26/acorn26/pmap.c: 1.39
sys/arch/arm/arm32/fault.c: 1.105 via patch
sys/arch/arm/arm32/pmap.c: 1.350, 1.359
sys/arch/arm/broadcom/bcm2835_bsc.c: 1.7
sys/arch/arm/omap/if_cpsw.c: 1.20
sys/arch/arm/omap/tiotg.c: 1.7
sys/arch/evbarm/conf/RPI2_INSTALL: 1.3
sys/dev/ic/sl811hs.c: 1.98
sys/dev/usb/ehci.c: 1.256
sys/dev/usb/if_axe.c: 1.83
sys/dev/usb/motg.c: 1.18
sys/dev/usb/ohci.c: 1.274
sys/dev/usb/ucom.c: 1.119
sys/dev/usb/uhci.c: 1.277
sys/dev/usb/uhub.c: 1.137
sys/dev/usb/umass.c: 1.160-1.162
sys/dev/usb/umass_quirks.c: 1.100
sys/dev/usb/umass_scsipi.c: 1.55
sys/dev/usb/usb.c: 1.168
sys/dev/usb/usb_mem.c: 1.70
sys/dev/usb/usb_subr.c: 1.221
sys/dev/usb/usbdi.c: 1.175
sys/dev/usb/usbdi_util.c: 1.67-1.70
sys/dev/usb/usbroothub.c: 1.3
sys/dev/usb/xhci.c: 1.75
sys/external/bsd/drm2/dist/drm/i915/i915_gem.c: 1.34
sys/kern/kern_history.c: 1.15
sys/kern/kern_xxx.c: 1.74
sys/kern/vfs_bio.c: 1.275-1.276
sys/miscfs/genfs/genfs_io.c: 1.71
sys/sys/kernhist.h: 1.21
sys/ufs/ffs/ffs_balloc.c: 1.63
sys/ufs/lfs/lfs_vfsops.c: 1.361
sys/ufs/lfs/ulfs_inode.c: 1.21
sys/ufs/lfs/ulfs_vnops.c: 1.52
sys/ufs/ufs/ufs_inode.c: 1.102
sys/ufs/ufs/ufs_vnops.c: 1.239
sys/uvm/pmap/pmap.c: 1.37-1.39
sys/uvm/pmap/pmap_tlb.c: 1.22
sys/uvm/uvm_amap.c: 1.108
sys/uvm/uvm_anon.c: 1.64
sys/uvm/uvm_aobj.c: 1.126
sys/uvm/uvm_bio.c: 1.91
sys/uvm/uvm_device.c: 1.66
sys/uvm/uvm_fault.c: 1.201
sys/uvm/uvm_km.c: 1.144
sys/uvm/uvm_loan.c: 1.85
sys/uvm/uvm_map.c: 1.353
sys/uvm/uvm_page.c: 1.194
sys/uvm/uvm_pager.c: 1.111
sys/uvm/uvm_pdaemon.c: 1.109
sys/uvm/uvm_swap.c: 1.175
sys/uvm/uvm_vnode.c: 1.103
usr.bin/vmstat/vmstat.c: 1.219
Reorder to test for null before null deref in debug code
--
Reorder to test for null before null deref in debug code
--
KNF
--
No need for '\n' in UVMHIST_LOG
--
normalise a BIOHIST log message
--
Update the kernhist(9) kernel history code to address issues identified
in PR kern/52639, as well as some general cleaning-up...
(As proposed on tech-kern@ with additional changes and enhancements.)
Details of changes:
* All history arguments are now stored as uintmax_t values[1], both in
the kernel and in the structures used for exporting the history data
to userland via sysctl(9). This avoids problems on some architectures
where passing a 64-bit (or larger) value to printf(3) can cause it to
process the value as multiple arguments. (This can be particularly
problematic when printf()'s format string is not a literal, since in
that case the compiler cannot know how large each argument should be.)
* Update the data structures used for exporting kernel history data to
include a version number as well as the length of history arguments.
* All [2] existing users of kernhist(9) have had their format strings
updated. Each format specifier now includes an explicit length
modifier 'j' to refer to numeric values of the size of uintmax_t.
* All [2] existing users of kernhist(9) have had their format strings
updated to replace uses of "%p" with "%#jx", and the pointer
arguments are now cast to (uintptr_t) before being subsequently cast
to (uintmax_t). This is needed to avoid compiler warnings about
casting "pointer to integer of a different size."
* All [2] existing users of kernhist(9) have had instances of "%s" or
"%c" format strings replaced with numeric formats; several instances
of mis-match between format string and argument list have been fixed.
* vmstat(1) has been modified to handle the new size of arguments in the
history data as exported by sysctl(9).
* vmstat(1) now provides a warning message if the history requested with
the -u option does not exist (previously, this condition was silently
ignored, with only a single blank line being printed).
* vmstat(1) now checks the version and argument length included in the
data exported via sysctl(9) and exits if they do not match the values
with which vmstat was built.
* The kernhist(9) man-page has been updated to note the additional
requirements imposed on the format strings, along with several other
minor changes and enhancements.
[1] It would have been possible to use an explicit length (for example,
uint64_t) for the history arguments. But that would require another
"rototill" of all the users in the future when we add support for an
architecture that supports a larger size. Also, the printf(3) format
specifiers for explicitly-sized values, such as "%"PRIu64, are much
more verbose (and less aesthetically appealing, IMHO) than simply
using "%ju".
[2] I've tried very hard to find "all [the] existing users of kernhist(9)"
but it is possible that I've missed some of them. I would be glad to
update any stragglers that anyone identifies.
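
The format-string rules listed above (the 'j' length modifier, "%#jx" in place of "%p", and the double cast through uintptr_t) can be illustrated in userland C. This sketch uses snprintf(3) rather than the kernhist(9) macros, and format_hist() is a hypothetical helper, not part of the kernel.

```c
#include <assert.h>
#include <stdint.h>
#include <stdio.h>

/* Format a pointer and a counter the way the updated history calls
 * do: every argument is widened to uintmax_t, pointers go through
 * uintptr_t first, and the format uses the 'j' length modifier. */
static int
format_hist(char *buf, size_t len, const void *ptr, unsigned long count)
{
	assert(buf != NULL && len > 0);
	return snprintf(buf, len, "obj=%#jx count=%ju",
	    (uintmax_t)(uintptr_t)ptr, (uintmax_t)count);
}
```

Because every argument has the same width, the formatter never has to guess how many words an argument occupies, which is the problem the first bullet describes.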
--
For some reason this single kernel seems to have outgrown its declared
size as a result of the kernhist(9) changes. Bump the size.
XXX The amount of increase may be excessive - anyone with more detailed
XXX knowledge please feel free to further adjust the value appropriately.
--
Missed one cast of pointer --> uintptr_t in previous kernhist(9) commit
--
And yet another one. :(
--
Use correct mark-up for NetBSD version.
--
More improvements in grammar and readability.
--
Remove a stray '"' (obvious typo) and add a couple of casts that are
probably needed.
--
And replace an instance of "%p" conversion with "%#jx"
--
Whitespace fix. Give Bl tag table a width. Fix Xr.
 1.199.6.1 24-Jul-2017  snj Pull up following revision(s) (requested by kamil in ticket #120):
sys/uvm/uvm_fault.c: revision 1.200
tests/lib/libc/sys/t_write.c: revision 1.4-1.6
PR/52384: make uvm_fault_check() return EFAULT not EACCES, like our man pages
(but not OpenGroup which does not document EFAULT for read/write, and only
documents EACCES for sockets) say for read/write.
--
check for EFAULT on reads and writes to memory with no permission.
--
add munmap
#define for const.
--
add another missing munmap (Kamil)
 1.202.2.1 21-May-2018  pgoyette Sync with HEAD
 1.204.2.4 21-Apr-2020  martin Sync with HEAD
 1.204.2.3 13-Apr-2020  martin Mostly merge changes from HEAD up to 20200411
 1.204.2.2 08-Apr-2020  martin Merge changes from current as of 20200406
 1.204.2.1 10-Jun-2019  christos Sync with HEAD
 1.206.2.3 15-Aug-2023  martin Pull up following revision(s) (requested by chs in ticket #1714):

sys/uvm/uvm_fault.c: revision 1.234

uvm: prevent TLB invalidation races during COW resolution

When a thread takes a page fault which results in COW resolution,
other threads in the same process can be concurrently accessing that
same mapping on other CPUs. When the faulting thread updates the pmap
entry at the end of COW processing, the resulting TLB invalidations to
other CPUs are not done atomically, so another thread can write to the
new writable page and then a third thread might still read from the
old read-only page, resulting in inconsistent views of the page by the
latter two threads. Fix this by removing the pmap entry entirely for
the original page before we install the new pmap entry for the new
page, so that the new page can only be modified after the old page is
no longer accessible.
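
The ordering argument above can be checked with a toy model. This is a deliberately simplified sketch, not the uvm_fault.c change: the three-CPU model, the one-at-a-time shootdown, and all names are assumptions made for illustration.

```c
#include <assert.h>
#include <stdbool.h>

#define NCPU 3
enum { NONE = 0, OLD_PAGE = 1, NEW_PAGE = 2 };

struct model {
	int pte;	/* what the page table currently maps */
	int tlb[NCPU];	/* what each CPU's TLB has cached */
};

/* True if the new page is mapped while some TLB still maps the old
 * page, i.e. two CPUs could see different pages at one address. */
static bool
inconsistent(const struct model *m)
{
	if (m->pte != NEW_PAGE)
		return false;
	for (int i = 0; i < NCPU; i++)
		if (m->tlb[i] == OLD_PAGE)
			return true;
	return false;
}

/* Invalidate TLBs one CPU at a time, noting any inconsistent window. */
static bool
shootdown(struct model *m)
{
	bool bad = inconsistent(m);
	for (int i = 0; i < NCPU; i++) {
		m->tlb[i] = NONE;
		bad = bad || inconsistent(m);
	}
	return bad;
}

/* Old behaviour: swap the mapping, then invalidate. */
static bool
replace_in_place(struct model *m)
{
	m->pte = NEW_PAGE;
	return shootdown(m);
}

/* Fixed behaviour: remove, invalidate, and only then install. */
static bool
remove_then_enter(struct model *m)
{
	m->pte = NONE;
	bool bad = shootdown(m);
	m->pte = NEW_PAGE;
	assert(!inconsistent(m));	/* no TLB can still hold the old page */
	return bad;
}
```

In this model replace_in_place() always passes through a window where the new page is live while a stale entry for the old page remains, and remove_then_enter() never does.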

This fixes PR 56535 as well as the netbsd versions of problems
described in various bug trackers:
https://bugs.freebsd.org/bugzilla/show_bug.cgi?id=225584
https://reviews.freebsd.org/D14347
https://github.com/golang/go/issues/34988
 1.206.2.2 08-Mar-2020  martin Pull up following revision(s) (requested by chs in ticket #764):

sys/uvm/uvm_fault.c: revision 1.207

fix two bugs reported in
https://syzkaller.appspot.com/bug?id=8840dce484094a926e1ec388ffb83acb2fa291c9

- in uvm_fault_check(), if the map entry is wired, handle the fault the same way
that we would handle UVM_FAULT_WIRE. faulting on wired mappings is valid
if the mapped object was truncated and then later grown again.

- in uvm_fault_unwire_locked(), we must hold the locks for the vm_map_entry
while calling pmap_extract() in order to avoid races with the mapped object
being truncated while we are unwiring it.
 1.206.2.1 11-Nov-2019  martin Pull up following revision(s) (requested by chs in ticket #414):

sys/uvm/uvm_fault.c: revision 1.208

in uvm_fault_lower_io(), fetch all the map entry values that we need
before we unlock everything.
 1.214.2.2 29-Feb-2020  ad Sync with head.
 1.214.2.1 17-Jan-2020  ad Sync with head.
 1.224.2.1 20-Apr-2020  bouyer Sync with HEAD
 1.231.2.1 15-Aug-2023  martin Pull up following revision(s) (requested by chs in ticket #327):

sys/uvm/uvm_fault.c: revision 1.234

uvm: prevent TLB invalidation races during COW resolution

When a thread takes a page fault which results in COW resolution,
other threads in the same process can be concurrently accessing that
same mapping on other CPUs. When the faulting thread updates the pmap
entry at the end of COW processing, the resulting TLB invalidations to
other CPUs are not done atomically, so another thread can write to the
new writable page and then a third thread might still read from the
old read-only page, resulting in inconsistent views of the page by the
latter two threads. Fix this by removing the pmap entry entirely for
the original page before we install the new pmap entry for the new
page, so that the new page can only be modified after the old page is
no longer accessible.

This fixes PR 56535 as well as the netbsd versions of problems
described in various bug trackers:
https://bugs.freebsd.org/bugzilla/show_bug.cgi?id=225584
https://reviews.freebsd.org/D14347
https://github.com/golang/go/issues/34988
 1.20 02-Feb-2011  chuck update license clauses on my code to match the new-style BSD licenses.
based on diff that rmind@ sent me.

no functional change with this commit.
 1.19 15-Mar-2006  drochner branches: 1.19.88; 1.19.94; 1.19.96;
-clean up the interface to uvm_fault: the "fault type" didn't serve
any purpose (done by a macro, so we don't save any cycles for now)
-kill vm_fault_t; it is not needed for real faults, and for simulated
faults (wiring) it can be replaced by UVM internal flags
-remove <uvm/uvm_fault.h> from uvm_extern.h again
 1.18 11-Dec-2005  christos branches: 1.18.4; 1.18.6; 1.18.8; 1.18.10;
merge ktrace-lwp.
 1.17 24-Mar-2004  junyoung branches: 1.17.16;
Nuke __P().
 1.16 31-Dec-2001  chs branches: 1.16.16;
introduce a new UVM fault type, VM_FAULT_WIREMAX. this is different
from VM_FAULT_WIRE in that when the pages being wired are faulted in,
the simulated fault is at the maximum protection allowed for the mapping
instead of the current protection. use this in uvm_map_pageable{,_all}()
to fix the problem where writing via ptrace() to shared libraries that
are also mapped with wired mappings in another process causes a
diagnostic panic when the wired mapping is removed.

this is a really obscure problem so it deserves some more explanation.
ptrace() writing to another process ends up down in uvm_map_extract(),
which for MAP_PRIVATE mappings (such as shared libraries) will cause
the amap to be copied or created. then the amap is made shared
(ie. the AMAP_SHARED flag is set) between the kernel and the ptrace()d
process so that the kernel can modify pages in the amap and have the
ptrace()d process see the changes. then when the page being modified
is actually faulted on, the object page (from the shared library vnode)
is copied to a new anon page and inserted into the shared amap.
to make all the processes sharing the amap actually see the new anon
page instead of the vnode page that was there before, we need to
invalidate all the pmap-level mappings of the vnode page in the pmaps
of the processes sharing the amap, but we don't have a good way of
doing this. the amap doesn't keep track of the vm_maps which map it.
so all we can do at this point is to remove all the mappings of the
page with pmap_page_protect(), but this has the unfortunate side-effect
of removing wired mappings as well. removing wired mappings with
pmap_page_protect() is a legitimate operation, it can happen when a file
with a wired mapping is truncated. so the pmap has no way of knowing
whether a request to remove a wired mapping is normal or when it's due to
this weird situation. so the pmap has to remove the wired mapping.
the process being ptrace()d goes away and life continues. then,
much later when we go to unwire or remove the wired vm_map mapping,
we discover that the pmap mapping has been removed when it should
still be there, and we panic.

so where did we go wrong? the problem is that we don't have any way
to update just the pmap mappings that need to be updated in this
scenario. we could invent a mechanism to do this, but that is much
more complicated than this change and it doesn't seem like the right
way to go in the long run either.

the real underlying problem here is that wired pmap mappings just
aren't a good concept. one of the original properties of the pmap
design was supposed to be that all the information in the pmap could
be thrown away at any time and the VM system could regenerate it all
through fault processing, but wired pmap mappings don't allow that.
a better design for UVM would not require wired pmap mappings,
and Chuck C. and I are talking about this, but it won't be done
anytime soon, so this change will do for now.

this change has the effect of causing MAP_PRIVATE mappings to be
copied to anonymous memory when they are mlock()d, so that uvm_fault()
doesn't need to copy these pages later when called from ptrace(), thus
avoiding the call to pmap_page_protect() and the panic that results
from this when the mlock()d region is unlocked or freed. note that
this change doesn't help the case where the wired mapping is MAP_SHARED.

discussed at great length with Chuck Cranor.
fixes PRs 10363, 12554, 12604, 13041, 13487, 14580 and 14853.
 1.15 02-Jun-2001  chs branches: 1.15.2;
replace vm_map{,_entry}_t with struct vm_map{,_entry} *.
 1.14 26-Jun-2000  mrg branches: 1.14.2;
remove/move more mach vm header files:

<vm/pglist.h> -> <uvm/uvm_pglist.h>
<vm/vm_inherit.h> -> <uvm/uvm_inherit.h>
<vm/vm_kern.h> -> into <uvm/uvm_extern.h>
<vm/vm_object.h> -> nothing
<vm/vm_pager.h> -> into <uvm/uvm_pager.h>

also includes a bunch of <vm/vm_page.h> include removals (due to redundancy
with <vm/vm.h>), and a scattering of other similar headers.
 1.13 21-Jun-1999  thorpej branches: 1.13.2;
Protect prototypes, certain macros, and inlines from userland.
 1.12 16-Jun-1999  thorpej * Rename uvm_fault_unwire() to uvm_fault_unwire_locked(), and require that
the map be at least read-locked to call this function. This requirement
will be taken advantage of in a future commit.
* Write a uvm_fault_unwire() wrapper which read-locks the map and calls
uvm_fault_unwire_locked().
* Update the comments describing the locking constraints of uvm_fault_wire()
and uvm_fault_unwire().
 1.11 04-Jun-1999  thorpej Keep interrupt-safe maps on an additional queue. In uvm_fault(), if we're
looking up a kernel address, check to see if the address is on this
"interrupt-safe" list. If so, return failure immediately. This prevents
a locking screw if a page fault is taken on an interrupt-safe map in or
out of interrupt context.
 1.10 28-May-1999  thorpej Make uvm_fault_unwire() take a vm_map_t, rather than a pmap_t, for
consistency. Use this opportunity for checking for intrsafe map use
in this routine (which is illegal).
 1.9 26-May-1999  thorpej Pass an access_type to uvm_fault_wire(), which it forwards on to
uvm_fault().
 1.8 25-Mar-1999  mrg branches: 1.8.4;
remove now >1 year old pre-release message.
 1.7 11-Oct-1998  chuck remove unused share map code from UVM:
- simplify uvm_faultinfo in uvm_fault.h (parent map tracking no longer needed)
- adjust locking and lookup functions in uvm_fault_i.h to reflect the above
- replace ufi.rvaddr with ufi.orig_rvaddr in uvm_fault.c since rvaddr is
no longer needed.
- no need to worry about share map translations in uvm_fault(). simplify.
 1.6 13-Aug-1998  eeh Merge paddr_t changes into the main branch.
 1.5 09-Mar-1998  mrg branches: 1.5.2;
KNF.
 1.4 10-Feb-1998  perry add/cleanup multiple inclusion protection.
 1.3 07-Feb-1998  mrg restore rcsids
 1.2 06-Feb-1998  thorpej RCS ID police.
 1.1 05-Feb-1998  mrg branches: 1.1.1;
Initial revision
 1.1.1.1 05-Feb-1998  mrg initial import of the new virtual memory system, UVM, into -current.

UVM was written by chuck cranor <chuck@maria.wustl.edu>, with some
minor portions derived from the old Mach code. i provided some help
getting swap and paging working, and other bug fixes/ideas. chuck
silvers <chuq@chuq.com> also provided some other fixes.

this is the UVM kernel code portion.


this will be KNF'd shortly. :-)
 1.5.2.1 30-Jul-1998  eeh Split vm_offset_t and vm_size_t into paddr_t, psize_t, vaddr_t, and vsize_t.
 1.8.4.2 01-Jul-1999  thorpej Sync w/ -current.
 1.8.4.1 21-Jun-1999  thorpej Sync w/ -current.
 1.13.2.1 20-Nov-2000  bouyer Update thorpej_scsipi to -current as of a month ago
 1.14.2.2 08-Jan-2002  nathanw Catch up to -current.
 1.14.2.1 21-Jun-2001  nathanw Catch up to -current.
 1.15.2.1 10-Jan-2002  thorpej Sync kqueue branch with -current.
 1.16.16.3 21-Sep-2004  skrll Fix the sync with head I botched.
 1.16.16.2 18-Sep-2004  skrll Sync with HEAD.
 1.16.16.1 03-Aug-2004  skrll Sync with HEAD
 1.17.16.1 21-Jun-2006  yamt sync with head.
 1.18.10.1 19-Apr-2006  elad oops - *really* sync to head this time.
 1.18.8.1 01-Apr-2006  yamt sync with head.
 1.18.6.1 22-Apr-2006  simonb Sync with head.
 1.18.4.1 09-Sep-2006  rpaulo sync with head
 1.19.96.1 08-Feb-2011  bouyer Sync with HEAD
 1.19.94.1 06-Jun-2011  jruoho Sync with HEAD.
 1.19.88.1 05-Mar-2011  rmind sync with head
 1.33 23-Feb-2020  ad UVM locking changes, proposed on tech-kern:

- Change the lock on uvm_object, vm_amap and vm_anon to be a RW lock.
- Break v_interlock and vmobjlock apart. v_interlock remains a mutex.
- Do partial PV list locking in the x86 pmap. Others to follow later.
 1.32 16-Dec-2019  ad branches: 1.32.2;
- Extend the per-CPU counters matt@ did to include all of the hot counters
in UVM, excluding uvmexp.free, which needs special treatment and will be
done with a separate commit. Cuts system time for a build by 20-25% on
a 48 CPU machine w/DIAGNOSTIC.

- Avoid 64-bit integer divide on every fault (for rnd_add_uint32).
 1.31 08-May-2018  christos branches: 1.31.2;
don't store the rssmax in the lwp rusage, it is a per proc property. Instead
utilize an unused field in the vmspace struct to store it. Also conditionalize
on platforms that have pmap statistics available.
 1.30 07-May-2018  christos update maxrss (used to always be 0). Patterned after the OpenBSD changes.
 1.29 19-Apr-2018  christos s/static inline/static __inline/g for consistency.
 1.28 19-Feb-2012  rmind branches: 1.28.38;
Remove VM_MAP_INTRSAFE and related code. Not used since the "kmem changes".
 1.27 12-Jun-2011  rmind branches: 1.27.2; 1.27.6;
Welcome to 5.99.53! Merge rmind-uvmplock branch:

- Reorganize locking in UVM and provide extra serialisation for pmap(9).
New lock order: [vmpage-owner-lock] -> pmap-lock.

- Simplify locking in some pmap(9) modules by removing P->V locking.

- Use lock object on vmobjlock (and thus vnode_t::v_interlock) to share
the locks amongst UVM objects where necessary (tmpfs, layerfs, unionfs).

- Rewrite and optimise x86 TLB shootdown code, make it simpler and cleaner.
Add TLBSTATS option for x86 to collect statistics about TLB shootdowns.

- Unify /dev/mem et al in MI code and provide required locking (removes
kernel-lock on some ports). Also, avoid cache-aliasing issues.

Thanks to Andrew Doran and Joerg Sonnenberger, as their initial patches
formed the core changes of this branch.
 1.26 02-Feb-2011  chuck branches: 1.26.2;
update license clauses on my code to match the new-style BSD licenses.
based on diff that rmind@ sent me.

no functional change with this commit.
 1.25 06-Feb-2010  uebayasi branches: 1.25.4; 1.25.6; 1.25.8;
__inline -> inline
 1.24 02-Jan-2008  ad branches: 1.24.10;
Merge vmlocking2 to head.
 1.23 22-Feb-2007  thorpej branches: 1.23.4; 1.23.18; 1.23.24; 1.23.26; 1.23.30;
TRUE -> true, FALSE -> false
 1.22 21-Feb-2007  thorpej Replace the Mach-derived boolean_t type with the C99 bool type. A
future commit will replace use of TRUE and FALSE with true and false.
 1.21 16-Feb-2006  perry branches: 1.21.20;
Change "inline" back to "__inline" in .h files -- C99 is still too
new, and some apps compile things in C89 mode. C89 keywords stay.

As per core@.
 1.20 24-Dec-2005  perry branches: 1.20.2; 1.20.4; 1.20.6;
Remove leading __ from __(const|inline|signed|volatile) -- it is obsolete.
 1.19 11-Dec-2005  christos merge ktrace-lwp.
 1.18 27-Jun-2005  thorpej branches: 1.18.2;
Use ANSI function decls.
 1.17 24-Mar-2004  junyoung Nuke __P().
 1.16 02-Nov-2002  perry branches: 1.16.6;
/*CONSTCOND*/
 1.15 15-Sep-2001  chs a whole bunch of changes to improve performance and robustness under load:

- remove special treatment of pager_map mappings in pmaps. this is
required now, since I've removed the globals that expose the address range.
pager_map now uses pmap_kenter_pa() instead of pmap_enter(), so there's
no longer any need to special-case it.
- eliminate struct uvm_vnode by moving its fields into struct vnode.
- rewrite the pageout path. the pager is now responsible for handling the
high-level requests instead of only getting control after a bunch of work
has already been done on its behalf. this will allow us to UBCify LFS,
which needs tighter control over its pages than other filesystems do.
writing a page to disk no longer requires making it read-only, which
allows us to write wired pages without causing all kinds of havoc.
- use a new PG_PAGEOUT flag to indicate that a page should be freed
on behalf of the pagedaemon when it's unlocked. this flag is very similar
to PG_RELEASED, but unlike PG_RELEASED, PG_PAGEOUT can be cleared if the
pageout fails due to eg. an indirect-block buffer being locked.
this allows us to remove the "version" field from struct vm_page,
and together with shrinking "loan_count" from 32 bits to 16,
struct vm_page is now 4 bytes smaller.
- no longer use PG_RELEASED for swap-backed pages. if the page is busy
because it's being paged out, we can't release the swap slot to be
reallocated until that write is complete, but unlike with vnodes we
don't keep a count of in-progress writes so there's no good way to
know when the write is done. instead, when we need to free a busy
swap-backed page, just sleep until we can get it busy ourselves.
- implement a fast-path for extending writes which allows us to avoid
zeroing new pages. this substantially reduces cpu usage.
- encapsulate the data used by the genfs code in a struct genfs_node,
which must be the first element of the filesystem-specific vnode data
for filesystems which use genfs_{get,put}pages().
- eliminate many of the UVM pagerops, since they aren't needed anymore
now that the pager "put" operation is a higher-level operation.
- enhance the genfs code to allow NFS to use the genfs_{get,put}pages
instead of a modified copy.
- clean up struct vnode by removing all the fields that used to be used by
the vfs_cluster.c code (which we don't use anymore with UBC).
- remove kmem_object and mb_object since they were useless.
instead of allocating pages to these objects, we now just allocate
pages with no object. such pages are mapped in the kernel until they
are freed, so we can use the mapping to find the page to free it.
this allows us to remove splvm() protection in several places.

The sum of all these changes improves write throughput on my
decstation 5000/200 to within 1% of the rate of NetBSD 1.5
and reduces the elapsed time for "make release" of a NetBSD 1.5
source tree on my 128MB pc to 10% less than a 1.5 kernel took.
 1.14 26-Jun-2001  thorpej branches: 1.14.2; 1.14.4;
Reduce some complexity in the fault path -- Rather than maintaining
an spl-protected "interrupt safe map" list, simply require that callers
of uvm_fault() never call us in interrupt context (MD code must make
the assertion), and check for interrupt-safe maps in uvmfault_lookup()
before we lock the map.
 1.13 02-Jun-2001  chs replace vm_map{,_entry}_t with struct vm_map{,_entry} *.
 1.12 25-May-2001  chs remove trailing whitespace.
 1.11 26-Jun-2000  mrg branches: 1.11.2;
remove/move more mach vm header files:

<vm/pglist.h> -> <uvm/uvm_pglist.h>
<vm/vm_inherit.h> -> <uvm/uvm_inherit.h>
<vm/vm_kern.h> -> into <uvm/uvm_extern.h>
<vm/vm_object.h> -> nothing
<vm/vm_pager.h> -> into <uvm/uvm_pager.h>

also includes a bunch of <vm/vm_page.h> include removals (due to redundancy
with <vm/vm.h>), and a scattering of other similar headers.
 1.10 11-Jan-2000  chs add support for ``swapctl -d'' (removing swap space).
improve handling of i/o errors in swap space.

reviewed by: Chuck Cranor
 1.9 04-Jun-1999  thorpej branches: 1.9.2;
Keep interrupt-safe maps on an additional queue. In uvm_fault(), if we're
looking up a kernel address, check to see if the address is on this
"interrupt-safe" list. If so, return failure immediately. This prevents
a locking screw if a page fault is taken on an interrupt-safe map in or
out of interrupt context.
 1.8 25-Mar-1999  mrg branches: 1.8.4;
remove now >1 year old pre-release message.
 1.7 24-Jan-1999  chuck cleanup/reorg:
- break anon related functions out of uvm_amap.c and put them in their own
file (uvm_anon.c). includes breaking up uvm_anon_init into an amap and an
anon init function
- ensure that only functions within the amap module access amap structure
fields (add macros to amap api as needed)
 1.6 11-Oct-1998  chuck remove unused share map code from UVM:
- simplify uvm_faultinfo in uvm_fault.h (parent map tracking no longer needed)
- adjust locking and lookup functions in uvm_fault_i.h to reflect the above
- replace ufi.rvaddr with ufi.orig_rvaddr in uvm_fault.c since rvaddr is
no longer needed.
- no need to worry about share map translations in uvm_fault(). simplify.
 1.5 09-Mar-1998  mrg KNF.
 1.4 10-Feb-1998  perry add/cleanup multiple inclusion protection.
 1.3 07-Feb-1998  mrg restore rcsids
 1.2 06-Feb-1998  thorpej RCS ID police.
 1.1 05-Feb-1998  mrg branches: 1.1.1;
Initial revision
 1.1.1.1 05-Feb-1998  mrg initial import of the new virtual memory system, UVM, into -current.

UVM was written by chuck cranor <chuck@maria.wustl.edu>, with some
minor portions derived from the old Mach code. i provided some help
getting swap and paging working, and other bug fixes/ideas. chuck
silvers <chuq@chuq.com> also provided some other fixes.

this is the UVM kernel code portion.


this will be KNF'd shortly. :-)
 1.8.4.1 21-Jun-1999  thorpej Sync w/ -current.
 1.9.2.1 20-Nov-2000  bouyer Update thorpej_scsipi to -current as of a month ago
 1.11.2.4 11-Nov-2002  nathanw Catch up to -current
 1.11.2.3 21-Sep-2001  nathanw Catch up to -current.
 1.11.2.2 24-Aug-2001  nathanw Catch up with -current.
 1.11.2.1 21-Jun-2001  nathanw Catch up to -current.
 1.14.4.1 01-Oct-2001  fvdl Catch up with -current.
 1.14.2.1 10-Jan-2002  thorpej Sync kqueue branch with -current.
 1.16.6.4 10-Nov-2005  skrll Sync with HEAD. Here we go again...
 1.16.6.3 21-Sep-2004  skrll Fix the sync with head I botched.
 1.16.6.2 18-Sep-2004  skrll Sync with HEAD.
 1.16.6.1 03-Aug-2004  skrll Sync with HEAD
 1.18.2.2 21-Jan-2008  yamt sync with head
 1.18.2.1 26-Feb-2007  yamt sync with head.
 1.20.6.1 22-Apr-2006  simonb Sync with head.
 1.20.4.1 09-Sep-2006  rpaulo sync with head
 1.20.2.1 18-Feb-2006  yamt sync with head.
 1.21.20.1 27-Feb-2007  yamt - sync with head.
- move sched_changepri back to kern_synch.c as it doesn't know PPQ anymore.
 1.23.30.1 02-Jan-2008  bouyer Sync with HEAD
 1.23.26.1 04-Dec-2007  ad Pull the vmlocking changes into a new branch.
 1.23.24.1 18-Feb-2008  mjf Sync with HEAD.
 1.23.18.1 09-Jan-2008  matt sync with HEAD
 1.23.4.1 13-Mar-2007  ad Pull in the initial set of changes for the vmlocking branch.
 1.24.10.1 11-Mar-2010  yamt sync with head
 1.25.8.1 08-Feb-2011  bouyer Sync with HEAD
 1.25.6.1 06-Jun-2011  jruoho Sync with HEAD.
 1.25.4.3 05-Mar-2011  rmind sync with head
 1.25.4.2 17-Mar-2010  rmind Reorganise UVM locking to protect P->V state and serialise pmap(9)
operations on the same page(s) by always locking their owner. Hence
lock order: "vmpage"-lock -> pmap-lock.

Patch, proposed on tech-kern@, from Andrew Doran.
 1.25.4.1 16-Mar-2010  rmind Change struct uvm_object::vmobjlock to be dynamically allocated with
mutex_obj_alloc(). It allows us to share the locks among UVM objects.
 1.26.2.1 23-Jun-2011  cherry Catchup with rmind-uvmplock merge.
 1.27.6.1 24-Feb-2012  mrg sync to -current.
 1.27.2.1 17-Apr-2012  yamt sync with head
 1.28.38.2 21-May-2018  pgoyette Sync with HEAD
 1.28.38.1 22-Apr-2018  pgoyette Sync with HEAD
 1.31.2.1 08-Apr-2020  martin Merge changes from current as of 20200406
 1.32.2.1 29-Feb-2020  ad Sync with head.
 1.182 04-Oct-2023  ad Remove unneeded test of ci->ci_want_resched.
 1.181 14-Jun-2020  ad Remove PG_ZERO. It worked brilliantly on x86 machines from the mid-90s but
having spent an age experimenting with it over the last 6 months on various
machines and with different use cases it's always either break-even or a
slight net loss for me.
 1.180 11-Jun-2020  ad uvm_availmem(): give it a boolean argument to specify whether a recent
cached value will do, or if the very latest total must be fetched. It can
be called thousands of times a second and fetching the totals impacts not
only the calling LWP but other CPUs doing unrelated activity in the VM
system.
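
A minimal sketch of the "cached value or latest total" choice described above. It is not the uvm_availmem() implementation: the struct, the four-CPU array and every sketch_* name are hypothetical, and summing per-CPU counters is only an assumption about why a fresh total is expensive.

```c
#include <stdbool.h>
#include <stddef.h>

#define SKETCH_NCPU 4	/* hypothetical CPU count */

/* Hypothetical counters: one free-page count per CPU plus a cached sum. */
struct sketch_counters {
	long per_cpu_free[SKETCH_NCPU];
	long cached_total;
};

/*
 * Return the number of available pages.  With cached == true the last
 * computed total is returned without touching the per-CPU counters;
 * with cached == false the counters are summed and the cache refreshed.
 */
static long
sketch_availmem(struct sketch_counters *c, bool cached)
{
	long total = 0;

	if (cached)
		return c->cached_total;

	for (size_t i = 0; i < SKETCH_NCPU; i++)
		total += c->per_cpu_free[i];
	c->cached_total = total;
	return total;
}
```

A caller that only needs a rough figure would pass true and accept a slightly stale answer; a caller making an allocation decision would pass false.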
 1.179 22-May-2020  ad Remove the ubc_direct hack.
 1.178 23-Apr-2020  ad Enable ubc_direct by default, but only on systems with no more than 2 CPUs
for now.
 1.177 05-Mar-2020  rin branches: 1.177.2;
Part of PR kern/54994:

Memory allocated in the fast path of uarea_poolpage_alloc() is
a page itself. Therefore, it is obviously page-aligned.

Pointed out by skrll.
 1.176 12-Jan-2020  ad l->l_emap_gen isn't used any more.
 1.175 31-Dec-2019  ad branches: 1.175.2;
- Add and use wrapper functions that take and acquire page interlocks, and pairs
of page interlocks. Require that the page interlock be held over calls to
uvm_pageactivate(), uvm_pagewire() and similar.

- Solve the concurrency problem with page replacement state. Rather than
updating the global state synchronously, set an intended state on
individual pages (active, inactive, enqueued, dequeued) while holding the
page interlock. After the interlock is released put the pages on a 128
entry per-CPU queue for their state changes to be made real in batch.
This results in a ~400 fold decrease in contention on my test system.
Proposed on tech-kern but modified to use the page interlock rather than
atomics to synchronise as it's much easier to maintain that way, and
cheaper.
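
The batching idea in the second item can be sketched roughly as follows. This is an illustration only, not the committed code: all sketch_* names are hypothetical, locking (the page interlock and the global lock taken during a flush) is omitted, and flushing when the queue fills is an assumed policy. Only the 128-entry queue length comes from the message.

```c
#include <stddef.h>

#define SKETCH_BATCH 128	/* queue length named in the commit message */

enum sketch_intent {
	SK_NONE, SK_ACTIVATE, SK_DEACTIVATE, SK_ENQUEUE, SK_DEQUEUE
};

struct sketch_page {
	enum sketch_intent intent;	/* would be set under the page interlock */
	enum sketch_intent applied;	/* global state, changed only in batch */
};

/* Hypothetical per-CPU queue of pages with pending state changes. */
struct sketch_cpu_queue {
	struct sketch_page *pages[SKETCH_BATCH];
	size_t count;
	unsigned long flushes;
};

/* Make every queued intent real in one pass (one global lock hold). */
static void
sketch_flush(struct sketch_cpu_queue *q)
{
	for (size_t i = 0; i < q->count; i++)
		q->pages[i]->applied = q->pages[i]->intent;
	q->count = 0;
	q->flushes++;
}

/* Record the intended state and defer the global update. */
static void
sketch_set_intent(struct sketch_cpu_queue *q, struct sketch_page *pg,
    enum sketch_intent intent)
{
	pg->intent = intent;
	q->pages[q->count++] = pg;
	if (q->count == SKETCH_BATCH)
		sketch_flush(q);
}
```

The point of the design is that the shared state is touched once per batch rather than once per page, which is where the reduction in contention would come from.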
 1.174 31-Dec-2019  ad Rename uvm_free() -> uvm_availmem().
 1.173 27-Dec-2019  ad Redo the page allocator to perform better, especially on multi-core and
multi-socket systems. Proposed on tech-kern. While here:

- add rudimentary NUMA support - needs more work.
- remove now unused "listq" from vm_page.
 1.172 21-Dec-2019  ad uvmexp.free -> uvm_free()
 1.171 16-Dec-2019  ad - Extend the per-CPU counters matt@ did to include all of the hot counters
in UVM, excluding uvmexp.free, which needs special treatment and will be
done with a separate commit. Cuts system time for a build by 20-25% on
a 48 CPU machine w/DIAGNOSTIC.

- Avoid 64-bit integer divide on every fault (for rnd_add_uint32).
 1.170 21-Nov-2019  ad Use lwp_changepri().
 1.169 14-Nov-2019  maxv Don't include "opt_kasan.h" when there's already <sys/asan.h> included.
 1.168 08-May-2019  chs uvm_pagealloc() uses UVM_PGA_* flags, not UVM_KMF_* flags,
and it is always nowait. fix uarea_poolpage_alloc() to not use
flags from the wrong collection for calling uvm_pagealloc()
and to wait itself if a page is not immediately available.
 1.167 07-Apr-2019  maxv Provide a code argument in kasan_mark(), and give a code to each caller.
Five codes used: GenericRedZone, MallocRedZone, KmemRedZone, PoolRedZone,
and PoolUseAfterFree.

This can greatly help debugging complex memory corruptions.
 1.166 23-Dec-2018  maxv Simplify the KASAN API, use only kasan_mark() and explain briefly. The
alloc/free naming was too confusing.
 1.165 04-Nov-2018  mlelstv PMAP_MAP_POOLPAGE must not fail. Trigger assertion here instead of
panic later from failing PR_WAITOK memory allocations.
 1.164 22-Aug-2018  maxv Add support for monitoring the stack with kASan. This allows us to detect
illegal memory accesses occurring there.

The compiler inlines a piece of code in each function that adds redzones
around the local variables and poisons them. The illegal accesses are then
detected using the usual kASan machinery.

The stack size is doubled, from 4 pages to 8 pages.

Several boot functions are marked with the __noasan flag, to prevent the
compiler from adding redzones in them (because we haven't yet initialized
kASan). The kasan_early_init function is called early at boot time to
quickly create the shadow for the current stack; after this is done, we
don't need __noasan anymore in the boot path.

We pass -fasan-shadow-offset=0xDFFF900000000000, because the compiler
wants to do
shad = shadow-offset + (addr >> 3)
and we do, in kasan_addr_to_shad
shad = KASAN_SHADOW_START + ((addr - CANONICAL_BASE) >> 3)
hence
shad = KASAN_SHADOW_START + (addr >> 3) - (CANONICAL_BASE >> 3)
= [KASAN_SHADOW_START - (CANONICAL_BASE >> 3)] + (addr >> 3)
implies
shadow-offset = KASAN_SHADOW_START - (CANONICAL_BASE >> 3)
= 0xFFFF800000000000 - (0xFFFF800000000000 >> 3)
= 0xDFFF900000000000
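
The derivation above can be checked with a few lines of C. This is a sketch for verification only: the two constants are the values quoted in the message, the function names are made up for illustration, and the arithmetic relies on 64-bit unsigned wrap-around and on CANONICAL_BASE being a multiple of 8.

```c
#include <stdint.h>

/* Values quoted in the commit message. */
#define KASAN_SHADOW_START	0xFFFF800000000000ULL
#define CANONICAL_BASE		0xFFFF800000000000ULL

/* shadow-offset = KASAN_SHADOW_START - (CANONICAL_BASE >> 3) */
static uint64_t
sketch_shadow_offset(void)
{
	return KASAN_SHADOW_START - (CANONICAL_BASE >> 3);
}

/* The form the message gives for kasan_addr_to_shad. */
static uint64_t
sketch_shad_kernel(uint64_t addr)
{
	return KASAN_SHADOW_START + ((addr - CANONICAL_BASE) >> 3);
}

/* The form the compiler wants: shadow-offset + (addr >> 3). */
static uint64_t
sketch_shad_compiler(uint64_t addr)
{
	return sketch_shadow_offset() + (addr >> 3);
}
```

Both forms agree for any address at or above CANONICAL_BASE, because subtracting a multiple of 8 leaves the low three bits of addr unchanged before the shift.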

In UVM, we add a kasan_free (that is not preceded by a kasan_alloc). We
don't add poisoned redzones ourselves, but all the functions we execute
do, so we need to manually clear the poison before freeing the stack.

With the help of Kamil for the makefile stuff.
 1.163 22-May-2016  maxv branches: 1.163.16; 1.163.18;
Revert my previous change. I missed an entry on NXR.
 1.162 21-May-2016  maxv USPACE and USPACE_ALIGN are constants. Use a #if instead. Probably saves
some instructions.
 1.161 27-Nov-2014  uebayasi branches: 1.161.2;
Consistently use kpreempt_*() outside scheduler path.
 1.160 01-Sep-2012  matt branches: 1.160.2;
Add a __HAVE_CPU_UAREA_IDLELWP hook so that the MD code can allocate
special UAREAs for idle lwp's.
 1.159 08-Apr-2012  martin Rework posix_spawn locking and memory management:
- always provide a vmspace for the new proc, initially borrowing from proc0
(this part fixes PR 46286)
- increase parallelism between parent and child if arguments allow this,
avoiding a potential deadlock on exec_lock
- add a new flag for userland to request old (lockstepped) behaviour for
better error reporting
- adapt test cases to the previous two and add a new variant to test the
diagnostics flag
- fix a few memory (and lock) leaks
- provide netbsd32 compat
 1.158 06-Apr-2012  chs fix uarea_system_poolpage_free() to handle freeing a uarea
that was not allocated by cpu_uarea_alloc() (ie. on platforms
where cpu_uarea_alloc() failing is not fatal).
fixes PR 46284.
 1.157 20-Feb-2012  martin Solve previous fix (for early posix_spawn children exiting on error)
differently.
 1.156 12-Feb-2012  martin branches: 1.156.2;
In uvm_proc_exit bail out early if we have no vmspace yet (as it happens
for failing posix_spawn child processes).
Fixes PR kern/45991.
 1.155 11-Feb-2012  martin Add a posix_spawn syscall, as discussed on tech-kern.
Based on the summer of code project by Charles Zhang, heavily reworked
later by me - all bugs are likely mine.
Ok: core, releng.
 1.154 01-Feb-2012  para allocate uareas and buffers from kernel_map again
add code to drain pools if kmem_arena runs out of space
 1.153 27-Jan-2012  para extending vmem(9) to be able to allocate resources for its own needs.
simplifying uvm_map handling (no special kernel entries anymore no relocking)
make malloc(9) a thin wrapper around kmem(9)
(with private interface for interrupt safety reasons)

releng@ acknowledged
 1.152 23-Nov-2011  matt branches: 1.152.2;
When allocating a page for a kernel stack and PMAP_ALLOC_POOLPAGE is
defined, use it. (allows a MIPS N32 kernel to boot when there is memory
outside of KSEG0).
 1.151 02-Jul-2011  matt branches: 1.151.2;
Allow the MD code to decide to panic if cpu_uarea_alloc would return NULL.
If NULL is returned, just allocate the standard way.
 1.150 12-Jun-2011  rmind Welcome to 5.99.53! Merge rmind-uvmplock branch:

- Reorganize locking in UVM and provide extra serialisation for pmap(9).
New lock order: [vmpage-owner-lock] -> pmap-lock.

- Simplify locking in some pmap(9) modules by removing P->V locking.

- Use lock object on vmobjlock (and thus vnode_t::v_interlock) to share
the locks amongst UVM objects where necessary (tmpfs, layerfs, unionfs).

- Rewrite and optimise x86 TLB shootdown code, make it simpler and cleaner.
Add TLBSTATS option for x86 to collect statistics about TLB shootdowns.

- Unify /dev/mem et al in MI code and provide required locking (removes
kernel-lock on some ports). Also, avoid cache-aliasing issues.

Thanks to Andrew Doran and Joerg Sonnenberger, as their initial patches
formed the core changes of this branch.
 1.149 18-Feb-2011  drochner branches: 1.149.2;
make this build w/o HAVE_CPU_UAREA_ROUTINES
 1.148 17-Feb-2011  matt Add support for cpu-specific uarea allocation routines. Allows different
allocation for user and system lwps. MIPS will use this to map uareas of
system lwp used direct-mapped addresses (to reduce the overhead of
switching to kernel threads). ibm4xx could use to map uareas via direct
mapped addresses and avoid the problem of having the kernel stack not in
the TLB.
 1.147 02-Feb-2011  chuck update license clauses on my code to match the new-style BSD licenses.
based on second diff that rmind@ sent me.

no functional change with this commit.
 1.146 14-Jan-2011  rmind branches: 1.146.2; 1.146.4;
Retire struct user, remove sys/user.h inclusions. Note sys/user.h header
as obsolete. Remove USER_TO_UAREA/UAREA_TO_USER macros.

Various #include fixes and review by matt@.
 1.145 16-Apr-2010  rmind - Merge sched_pstats() and uvm_meter()/uvm_loadav(). Avoids double loop
through all LWPs and duplicate locking overhead.

- Move sched_pstats() from soft-interrupt context to process 0 main loop.
Avoids blocking effect on real-time threads. Mostly fixes PR/38792.

Note: it might be worth moving the loop above PRI_PGDAEMON. Also,
sched_pstats() might be cleaned-up slightly.
 1.144 25-Feb-2010  jym branches: 1.144.2;
Change RSS (resident set size) limit. Instead of setting it arbitrarily
to the total free memory available to the system, use the smallest value
between VM_MAXUSER_ADDRESS and total free memory (having a RSS limit
bigger than VM_MAXUSER_ADDRESS has no real meaning).

Fix a possible int overflow when ptoa(uvmexp.free) is bigger than 4GB
with a 32 bits vaddr_t.
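
A rough sketch of the clamp and of the overflow it avoids. The names, the 4KB page shift and the 32-bit sketch_vaddr_t are assumptions for illustration; the real code uses ptoa(), uvmexp.free and VM_MAXUSER_ADDRESS, whose values are machine-dependent.

```c
#include <stdint.h>

#define SKETCH_PAGE_SHIFT 12		/* assumed 4KB pages */

typedef uint32_t sketch_vaddr_t;	/* models a 32-bit vaddr_t */

/*
 * RSS limit as the smaller of free memory and the top of user space.
 * The page count is widened to 64 bits before the shift, so more than
 * 4GB of free memory cannot wrap around.
 */
static sketch_vaddr_t
sketch_rss_limit(uint64_t free_pages, sketch_vaddr_t maxuser)
{
	uint64_t free_bytes = free_pages << SKETCH_PAGE_SHIFT;

	return free_bytes < maxuser ? (sketch_vaddr_t)free_bytes : maxuser;
}

/* The overflow-prone form: the shift is carried out in 32 bits. */
static sketch_vaddr_t
sketch_rss_limit_narrow(uint32_t free_pages)
{
	return (sketch_vaddr_t)(free_pages << SKETCH_PAGE_SHIFT);
}
```

With 8GB free (0x200000 pages of 4KB) the narrow form wraps to 0, while the widened form is clamped to the user address limit.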

Reviewed by bouyer@.

See also http://mail-index.netbsd.org/tech-kern/2010/02/24/msg007395.html
 1.143 17-Dec-2009  rmind branches: 1.143.2;
Replace few USER_TO_UAREA/UAREA_TO_USER uses, reduce sys/user.h inclusions.
 1.142 21-Nov-2009  rmind Add uvm_lwp_getuarea() and uvm_lwp_setuarea(). OK matt@.
 1.141 21-Oct-2009  rmind Remove uarea swap-out functionality:

- Addresses the issue described in PR/38828.
- Some simplification in threading and sleepq subsystems.
- Eliminates pmap_collect() and, as a side note, allows pmap optimisations.
- Eliminates XS_CTL_DATA_ONSTACK in scsipi code.
- Avoids few scans on LWP list and thus potentially long holds of proc_lock.
- Cuts ~1.5k lines of code. Reduces amd64 kernel size by ~4k.
- Removes __SWAP_BROKEN cases.

Tested on x86, mips, acorn32 (thanks <mpumford>) and partly tested on
acorn26 (thanks to <bjh21>).

Discussed on <tech-kern>, reviewed by <ad>.
 1.140 10-Aug-2009  matt Revert change to printf. (why can't __func__ concat with other string?)
 1.139 09-Aug-2009  matt Only swapout uareas if VMSWAP_UAREA is defined (which it should be by default).
If it's not defined and PMAP_MAP_POOLPAGE is defined and USPACE == PAGE_SIZE,
then allocate/map USPACE via uvm_pagealloc/PMAP_MAP_POOLPAGE.

On platforms like MIPS with 16KB pages, this means that uareas (and hence lwp
kernel stacks) will always be accessible since they will be in KSEG0.
 1.138 28-Jun-2009  rmind Ephemeral mapping (emap) implementation. Concept is based on the idea that
activity of other threads will perform the TLB flush for the processes using
emap as a side effect. To track that, global and per-CPU generation numbers
are used. This idea was suggested by Andrew Doran; various improvements to
it by me. Notes:

- For now, zero-copy on pipe is not yet enabled.
- TCP socket code would likely need more work.
- Additional UVM loaning improvements are needed.

Proposed on <tech-kern>, silence there.
Quickly reviewed by <ad>.
 1.137 16-Apr-2009  rmind Avoid few #ifdef KSTACK_CHECK_MAGIC.
 1.136 29-Mar-2009  mrg - add new RLIMIT_AS (aka RLIMIT_VMEM) resource that limits the total
address space available to processes. this limit exists in most other
modern unix variants, and like most of them, our defaults are unlimited.
remove the old mmap / rlimit.datasize hack.

- adds the VMCMD_STACK flag to all the stack-creation vmcmd callers.
it is currently unused, but was added a few years ago.

- add a pair of new process size values to kinfo_proc2{}. one is the
total size of the process memory map, and the other is the total size
adjusted for unused stack space (since most processes have a lot of
this...)

- patch sh, and csh to notice RLIMIT_AS. (in some cases, the alias
RLIMIT_VMEM was already present and used if available.)

- patch ps, top and systat to notice the new k_vm_vsize member of
kinfo_proc2{}.

- update irix, svr4, svr4_32, linux and osf1 emulations to support
this information. (freebsd could be done, but that it's best left
as part of the full-update of compat/freebsd.)


this addresses PR 7897. it also gives correct memory usage values,
which have never been entirely correct (since mmap), and have been
very incorrect since jemalloc() was enabled.

tested on i386 and sparc64, build tested on several other platforms.

thanks to many folks for feedback and testing but most especially
chuq and yamt for critical suggestions that led to this patch not
having a special ugliness i wasn't happy with anyway :-)
 1.135 31-Jan-2009  yamt branches: 1.135.2;
uvm_swapin: uncomment an assertion which is now ok.
 1.134 19-Nov-2008  ad Make the emulations, exec formats, coredump, NFS, and the NFS server
into modules. By and large this commit:

- shuffles header files and ifdefs
- splits code out where necessary to be modular
- adds module glue for each of the components
- adds/replaces hooks for things that can be installed at runtime
 1.133 25-Jun-2008  ad branches: 1.133.2; 1.133.4; 1.133.6;
Don't swap kernel stacks of realtime threads.
 1.132 16-Jun-2008  ad uvm_swapout: try to lock the vm_map before calling pmap_collect.
 1.131 09-Jun-2008  ad branches: 1.131.2;
swappable: invert previous so we check for SACTIVE or SSTOP.
 1.130 09-Jun-2008  ad swappable: return false if l->l_proc->p_stat == SDYING.
 1.129 09-Jun-2008  ad uvm_proc_exit: use macros to disable preemption.
 1.128 04-Jun-2008  ad - vm_page: put listq, pageq into a union alongside a LIST_ENTRY, so we can
use both types of list.

- Make page coloring and idle zero state per-CPU.

- Maintain per-CPU page freelists. When freeing, put pages onto the local
CPU's lists and the global lists. When allocating, prefer to take pages
from the local CPU. If none are available take from the global list as
done now. Proposed on tech-kern@.
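
The allocation policy described above (free onto both the local and global lists, allocate from the local list first) might look roughly like the sketch below. All type and function names are hypothetical, locking is omitted, and the real vm_page layout differs.

```c
#include <stddef.h>

/* Node of a circular doubly linked list (hypothetical helper type). */
struct link_sketch { struct link_sketch *prev, *next; };

struct page_sketch {
	struct link_sketch cpu_link;	/* linkage on one CPU's free list */
	struct link_sketch global_link;	/* linkage on the global free list */
	int id;
};

static void
list_init_sketch(struct link_sketch *head)
{
	head->prev = head->next = head;
}

static void
list_insert_head_sketch(struct link_sketch *head, struct link_sketch *n)
{
	n->next = head->next;
	n->prev = head;
	head->next->prev = n;
	head->next = n;
}

static void
list_remove_sketch(struct link_sketch *n)
{
	n->prev->next = n->next;
	n->next->prev = n->prev;
}

/* Freeing puts the page on the local CPU's list and on the global list. */
static void
page_free_sketch(struct link_sketch *cpu_head, struct link_sketch *global_head,
    struct page_sketch *pg)
{
	list_insert_head_sketch(cpu_head, &pg->cpu_link);
	list_insert_head_sketch(global_head, &pg->global_link);
}

/* Allocation prefers the local CPU's list, then falls back to the global one. */
static struct page_sketch *
page_alloc_sketch(struct link_sketch *cpu_head, struct link_sketch *global_head)
{
	struct page_sketch *pg;

	if (cpu_head->next != cpu_head)
		pg = (struct page_sketch *)((char *)cpu_head->next -
		    offsetof(struct page_sketch, cpu_link));
	else if (global_head->next != global_head)
		pg = (struct page_sketch *)((char *)global_head->next -
		    offsetof(struct page_sketch, global_link));
	else
		return NULL;

	/* The page sits on two lists, so it must leave both. */
	list_remove_sketch(&pg->cpu_link);
	list_remove_sketch(&pg->global_link);
	return pg;
}
```

Keeping each page on two lists is what makes the global fallback possible: a page freed on another CPU is still reachable through the global list.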
 1.127 31-May-2008  ad PR kern/38812 race between lwp_exit_switchaway and exit1/coredump

Move the LWP RUNNING and TIMEINTR flags into the thread-private flag word.
 1.126 27-Apr-2008  ad branches: 1.126.2; 1.126.4;
Disable preemption while swapping pmap.
 1.125 24-Apr-2008  ad Network protocol interrupts can now block on locks, so merge the globals
proclist_mutex and proclist_lock into a single adaptive mutex (proc_lock).
Implications:

- Inspecting process state requires thread context, so signals can no longer
be sent from a hardware interrupt handler. Signal activity must be
deferred to a soft interrupt or kthread.

- As the proc state locking is simplified, it's now safe to take exit()
and wait() out from under kernel_lock.

- The system spends less time at IPL_SCHED, and there is less lock activity.
 1.124 11-Apr-2008  yamt branches: 1.124.2;
fix the order of printf arguments.
 1.123 11-Apr-2008  christos - use uarea_swapin, rather than duplicating the code.
- use __func__ where appropriate.
 1.122 29-Mar-2008  christos make this compile
 1.121 29-Mar-2008  dholland Fix broken build. hi skrll :-)
 1.120 29-Mar-2008  skrll Fix unused variable when DEBUG isn't defined.
 1.119 27-Mar-2008  ad Make rusage collection per-LWP and collate in the appropriate places.
cloned threads need a little bit more work but the locking needs to
be fixed first.
 1.118 29-Feb-2008  yamt update comment
 1.117 08-Feb-2008  yamt branches: 1.117.2; 1.117.6;
uvm_uarea_init: fix compilation where PAGE_SIZE is not a constant. (sparc)
reported by Tom Spindler.
 1.116 07-Feb-2008  yamt uvm_uarea_init: make #if about PR_NOALIGN clearer and add a comment
to explain it.
 1.115 28-Jan-2008  yamt remove a special allocator for uareas, which is no longer necessary.
use pool_cache instead.
 1.114 02-Jan-2008  ad Merge vmlocking2 to head.
 1.113 06-Nov-2007  ad branches: 1.113.2; 1.113.6;
Merge scheduler changes from the vmlocking branch. All discussed on
tech-kern:

- Invert priority space so that zero is the lowest priority. Rearrange
number and type of priority levels into bands. Add new bands like
'kernel real time'.
- Ignore the priority level passed to tsleep. Compute priority for
sleep dynamically.
- For SCHED_4BSD, make priority adjustment per-LWP, not per-process.
 1.112 21-Sep-2007  ad branches: 1.112.4; 1.112.6;
uvm_swapin: disable the swaplock assertion. uvm_lwp_hold() can't take
the lock yet.
 1.111 18-Aug-2007  ad branches: 1.111.2;
Include sys/cpu.h for CPU_INFO_FOREACH.
 1.110 18-Aug-2007  ad Fix error in previous.
 1.109 18-Aug-2007  ad Make the uarea cache per-CPU and drain in batches of 4.
 1.108 14-Jul-2007  ad branches: 1.108.2; 1.108.6;
Revert unintentionally committed change.
 1.107 09-Jul-2007  ad Merge some of the less invasive changes from the vmlocking branch:

- kthread, callout, devsw API changes
- select()/poll() improvements
- miscellaneous MT safety improvements
 1.106 17-May-2007  yamt merge yamt-idlelwp branch. asked by core@. some ports still needs work.

from doc/BRANCHES:

idle lwp, and some changes depending on it.

1. separate context switching and thread scheduling.
(cf. gmcgarry_ctxsw)
2. implement idle lwp.
3. clean up related MD/MI interfaces.
4. make scheduler(s) modular.
 1.105 24-Mar-2007  rmind Export uvm_uarea_free() to the rest.
Make things compile again.
 1.104 04-Mar-2007  christos branches: 1.104.2; 1.104.4; 1.104.6;
Kill caddr_t; there will be some MI fallout, but it will be fixed shortly.
 1.103 22-Feb-2007  thorpej TRUE -> true, FALSE -> false
 1.102 21-Feb-2007  thorpej Replace the Mach-derived boolean_t type with the C99 bool type. A
future commit will replace use of TRUE and FALSE with true and false.
 1.101 19-Feb-2007  ad uvm_kick_scheduler(): do nothing until the swap subsystem is initialized.
 1.100 17-Feb-2007  pavel Change the process/lwp flags seen by userland via sysctl back to the
P_*/L_* naming convention, and rename the in-kernel flags to avoid
conflict. (P_ -> PK_, L_ -> LW_ ). Add back the (now unused) LSDEAD
constant.

Restores source compatibility with pre-newlock2 tools like ps or top.

Reviewed by Andrew Doran.
 1.99 15-Feb-2007  ad branches: 1.99.2;
Add uvm_kick_scheduler() (MP safe) to replace wakeup(&proc0).
 1.98 09-Feb-2007  ad Merge newlock2 to head.
 1.97 05-Oct-2006  chs add support for O_DIRECT (I/O directly to application memory,
bypassing any kernel caching for file data).
 1.96 29-Aug-2006  matt branches: 1.96.2; 1.96.4;
Make PTRACE and COREDUMP optional. Make the default (status quo) by putting
them in conf/std.
 1.95 13-Jun-2006  yamt uvm_swapin: process -> lwp in a comment.
 1.94 22-May-2006  yamt introduce macros, UAREA_TO_USER and USER_TO_UAREA,
to convert uarea VA into a pointer to struct user and vice versa,
so that MD code can change the layout in uarea.
 1.93 15-Mar-2006  drochner branches: 1.93.2; 1.93.4;
-clean up the interface to uvm_fault: the "fault type" didn't serve
any purpose (done by a macro, so we don't save any cycles for now)
-kill vm_fault_t; it is not needed for real faults, and for simulated
faults (wiring) it can be replaced by UVM internal flags
-remove <uvm/uvm_fault.h> from uvm_extern.h again
 1.92 24-Dec-2005  perry branches: 1.92.4; 1.92.6; 1.92.8; 1.92.10;
__inline__ -> inline
 1.91 11-Dec-2005  christos merge ktrace-lwp.
 1.90 24-Oct-2005  chs remove the assertion in uvm_swapout_threads() about LSONPROC lwps
not running on the same CPU as the swapper. l_stat is protected by
sched_lock, which isn't held here, so we can race with that lwp
starting to run and see its l_cpu not updated yet, as in PR 31870.
we check l_stat again in uvm_swapout() while holding sched_lock,
so the race itself is harmless.
 1.89 27-Jun-2005  thorpej branches: 1.89.2; 1.89.4;
Use ANSI function decls.
 1.88 10-Jun-2005  matt Rework the coredump code to have no explicit knowledge of how coredump
i/o is done. Instead, pass an opaque cookie which is then passed to a
new routine, coredump_write, which does the actual i/o. This allows the
method of doing i/o to change without affecting any future MD code.
Also, make netbsd32_core.c [re]use core_netbsd.c (in a similar manner that
core_elf64.c uses core_elf32.c) and eliminate that code duplication.
cpu_coredump{,32} is now called twice, first with a NULL iocookie to fill
the core structure and a second to actually write md parts of the coredump.
All i/o is no longer random access and is suitable for shipping over a stream.
 1.87 07-Jun-2005  matt Make sure state.end has a valid initial value.
 1.86 02-Jun-2005  matt When writing coredumps, don't write zero uninstantiated demand-zero pages.
Also, with ELF core dumps, trim trailing zeroes from sections. These two
changes can shrink coredumps by over 50% in size.
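
The trailing-zero trimming mentioned above can be illustrated with a small helper. This is only a sketch with a hypothetical function name. It does not model how the kernel detects uninstantiated demand-zero pages, which are skipped without scanning their contents.

```c
#include <stddef.h>

/*
 * Return the length of buf once trailing zero bytes are dropped.
 * A section written with this length omits the zero tail.
 */
static size_t
trim_trailing_zeroes_sketch(const unsigned char *buf, size_t len)
{
	while (len > 0 && buf[len - 1] == 0)
		len--;
	return len;
}
```

Zero bytes in the middle of the buffer are kept; only the tail is trimmed.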
 1.85 06-May-2005  nathanw uvm_coredump_walkmap(): Set UVM_COREDUMP_NODUMP on regions whose
protection does not include VM_PROT_READ, so that the core dumping
doesn't error out with EFAULT when trying to write that region.

Addresses PR kern/30143; approach suggested by chs@.
 1.84 01-Apr-2005  yamt merge yamt-km branch.
- don't use managed mappings/backing objects for wired memory allocations.
save some resources like pv_entry. also fix (most of) PR/27030.
- simplify kernel memory management API.
- simplify pmap bootstrap of some ports.
- some related cleanups.
 1.83 08-Feb-2005  yamt branches: 1.83.4;
update a comment; malloc doesn't use uvm_kernacc anymore.
 1.82 21-Jan-2005  chs branches: 1.82.2;
reduce the size of user coredump files by not dumping regions of
the address space that have never been touched (such as much of the
virtual space allocated for pthread stacks).
 1.81 12-May-2004  yamt branches: 1.81.4;
add assertions.
 1.80 02-May-2004  pk Make uvm_uarea_free an inline function.
 1.79 04-Apr-2004  pk Use maxdmap and maxsmap instead of MAXDSIZ and MAXSSIZ.
 1.78 24-Mar-2004  junyoung branches: 1.78.2; 1.78.4; 1.78.6;
- Nuke __P().
- Drop trailing spaces.
 1.77 09-Feb-2004  yamt - borrow vmspace0 in uvm_proc_exit instead of uvmspace_free.
the latter is not an appropriate place to do so and it broke vfork.
- deactivate pmap before calling cpu_exit() to keep a balance of
pmap_activate/deactivate.
 1.76 16-Jan-2004  yamt uvm_coredump_walkmap: use UVM_OBJ_IS_DEVICE macro.
 1.75 04-Jan-2004  jdolecek Rearrange process exit path to avoid need to free resources from different
process context ('reaper').

From within the exiting process context:
* deactivate pmap and free vmspace while we can still block
* introduce MD cpu_lwp_free() - this cleans all MD-specific context (such
as FPU state), and is the last potentially blocking operation;
all of cpu_wait(), and most of cpu_exit(), is now folded into cpu_lwp_free()
* process is now immediately marked as zombie and made available for pickup
by parent; the remaining last lwp continues the exit as fully detached
* MI (rather than MD) code bumps uvmexp.swtch, cpu_exit() is now same
for both 'process' and 'lwp' exit

uvm_lwp_exit() is modified to never block; the u-area memory is now
always just linked to the list of available u-areas. Introduce (blocking)
uvm_uarea_drain(), which is called to release the excessive u-area memory;
this is called by parent within wait4(), or by pagedaemon on memory shortage.
uvm_uarea_free() is now private function within uvm_glue.c.

MD process/lwp exit code now always calls lwp_exit2() immediately after
switching away from the exiting lwp.

g/c now unneeded routines and variables, including the reaper kernel thread
 1.74 30-Dec-2003  pk Replace the traditional buffer memory management -- based on fixed per buffer
virtual memory reservation and a private pool of memory pages -- by a scheme
based on memory pools.

This allows better utilization of memory because buffers can now be allocated
with a granularity finer than the system's native page size (useful for
filesystems with e.g. 1k or 2k fragment sizes). It also avoids fragmentation
of virtual to physical memory mappings (due to the former fixed virtual
address reservation) resulting in better utilization of MMU resources on some
platforms. Finally, the scheme is more flexible by allowing run-time decisions
on the amount of memory to be used for buffers.

On the other hand, the effectiveness of the LRU queue for buffer recycling
may be somewhat reduced compared to the traditional method since, due to the
nature of the pool based memory allocation, the actual least recently used
buffer may release its memory to a pool different from the one needed by a
newly allocated buffer. However, this effect will kick in only if the
system is under memory pressure.
 1.73 13-Nov-2003  chs eliminate uvm_useracc() in favor of checking the return value of
copyin() or copyout().

uvm_useracc() tells us whether the mapping permissions allow access to
the desired part of an address space, and many callers assume that
this is the same as knowing whether an attempt to access that part of
the address space will succeed. however, access to user space can
fail for reasons other than insufficient permission, most notably that
paging in any non-resident data can fail due to i/o errors. most of
the callers of uvm_useracc() make the above incorrect assumption. the
rest are all misguided optimizations, which optimize for the case
where an operation will fail. we'd rather optimize for operations
succeeding, in which case we should just attempt the access and handle
failures due to insufficient permissions the same way we handle i/o
errors. since there appear to be no good uses of uvm_useracc(), we'll
just remove it.
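
The pattern argued for above, attempting the access and handling the error instead of checking permissions first, can be sketched in user space as follows. copyin_sketch() is a hypothetical stand-in that only models failure for a NULL source; it is not the kernel's copyin().

```c
#include <errno.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical stand-in for copyin(): returns 0 or an errno value. */
static int
copyin_sketch(const void *uaddr, void *kaddr, size_t len)
{
	if (uaddr == NULL)
		return EFAULT;	/* models any failed user access */
	memcpy(kaddr, uaddr, len);
	return 0;
}

/*
 * Fetch an int argument with no advance permission check: just attempt
 * the copy and propagate whatever error it reports.
 */
static int
fetch_arg_sketch(const void *uaddr, int *out)
{
	int val, error;

	error = copyin_sketch(uaddr, &val, sizeof(val));
	if (error != 0)
		return error;
	*out = val;
	return 0;
}
```

The caller has a single error path, so a permission failure and an i/o failure during page-in are handled the same way.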
 1.72 03-Nov-2003  yamt revert rev.1.70 as it was not needed.
uvm_map_lookup_entry() should handle addresses out of the map.
 1.71 02-Nov-2003  jdolecek kill unneeded SYSVSHM includes
use ANSI C function definition for uvm_lwp_exit()
 1.70 01-Nov-2003  yamt don't try to lookup addresses out of the map in uvm_coredump_walkmap().
 1.69 24-Oct-2003  cl simplify tests:
The case where l_stat == LSONPROC and l_cpu == curcpu cannot happen
because the pagedaemon is the LWP on curcpu and the pagedaemon is a
kernel thread and the code is only used by the pagedaemon.

See also updated patch in PR kern/23095, which I meant to check in
originally.
 1.68 19-Oct-2003  cl don't uvm_swapout LWPs which are LSONPROC on another cpu.

uvm_swapout_threads will swapout LWPs which are running on another CPU:
- uvm_swapout_threads considers LWPs running on another CPU for swapout
if their l_swtime is high
- uvm_swapout_threads considers LWPs on the runqueue for swapout if their
l_swtime is high but these LWPs might be running by the time uvm_swapout
is called

symptoms of failure: panic in setrunqueue

fixes PR kern/23095
 1.67 13-Oct-2003  scw In uvm_lwp_fork(), check if PMAP_UAREA() is defined and if so, invoke it
with the KVA of the newly-wired uarea.

This is useful on some architectures (e.g. xscale) where the uarea mapping
can be tweaked to use the mini-data cache instead of the main cache.
 1.66 29-Jun-2003  fvdl branches: 1.66.2;
Back out the lwp/ktrace changes. They contained a lot of collateral damage,
and need to be examined and discussed more.
 1.65 28-Jun-2003  darrenr Pass lwp pointers throughout the kernel, as required, so that the lwpid can
be inserted into ktrace records. The general change has been to replace
"struct proc *" with "struct lwp *" in various function prototypes, pass
the lwp through and use l_proc to get the process pointer when needed.

Bump the kernel rev up to 1.6V
 1.64 14-Feb-2003  atatat Rework the way in which the map is traversed when dumping core. Now
we read-lock the map and call uvm_map_lookup_entry() instead of simply
walking from the header to the next and to the next, etc.

Dumping from sparsely populated amaps could cause faults that would
result in amaps being split, which (in turn) resulted in the core
dumping routines dumping some regions of memory twice. This makes the
core file too large, the headers not match, gdb not work properly,
and so on.

Addresses PR 19260.
 1.63 22-Jan-2003  yamt make KSTACK_CHECK_* compile after sa merge.
 1.62 18-Jan-2003  thorpej Merge the nathanw_sa branch.
 1.61 17-Nov-2002  chs change uvm_uarea_alloc() to indicate whether the returned uarea is already
backed by physical pages (ie. because it reused a previously-freed one),
so that we can skip a bunch of useless work in that case.
this fixes the underlying problem behind PR 18543, and also speeds up fork()
quite a bit (eg. 7% on my pc, 1% on my ultra2) when we get a cache hit.
 1.60 22-Sep-2002  chs encapsulate knowledge of uarea allocation in some new functions.
 1.59 02-Jul-2002  yamt add KSTACK_CHECK_MAGIC. discussed on tech-kern.
 1.58 15-May-2002  matt branches: 1.58.2;
When core dumping a process, don't dump maps backed up by the device pager.
(move the pagerops externs to uvm_object.h and out the C files).
 1.57 31-Dec-2001  chs introduce a new UVM fault type, VM_FAULT_WIREMAX. this is different
from VM_FAULT_WIRE in that when the pages being wired are faulted in,
the simulated fault is at the maximum protection allowed for the mapping
instead of the current protection. use this in uvm_map_pageable{,_all}()
to fix the problem where writing via ptrace() to shared libraries that
are also mapped with wired mappings in another process causes a
diagnostic panic when the wired mapping is removed.

this is a really obscure problem so it deserves some more explanation.
ptrace() writing to another process ends up down in uvm_map_extract(),
which for MAP_PRIVATE mappings (such as shared libraries) will cause
the amap to be copied or created. then the amap is made shared
(ie. the AMAP_SHARED flag is set) between the kernel and the ptrace()d
process so that the kernel can modify pages in the amap and have the
ptrace()d process see the changes. then when the page being modified
is actually faulted on, the object page (from the shared library vnode)
is copied to a new anon page and inserted into the shared amap.
to make all the processes sharing the amap actually see the new anon
page instead of the vnode page that was there before, we need to
invalidate all the pmap-level mappings of the vnode page in the pmaps
of the processes sharing the amap, but we don't have a good way of
doing this. the amap doesn't keep track of the vm_maps which map it.
so all we can do at this point is to remove all the mappings of the
page with pmap_page_protect(), but this has the unfortunate side-effect
of removing wired mappings as well. removing wired mappings with
pmap_page_protect() is a legitimate operation, it can happen when a file
with a wired mapping is truncated. so the pmap has no way of knowing
whether a request to remove a wired mapping is normal or when it's due to
this weird situation. so the pmap has to remove the wired mapping.
the process being ptrace()d goes away and life continues. then,
much later when we go to unwire or remove the wired vm_map mapping,
we discover that the pmap mapping has been removed when it should
still be there, and we panic.

so where did we go wrong? the problem is that we don't have any way
to update just the pmap mappings that need to be updated in this
scenario. we could invent a mechanism to do this, but that is much
more complicated than this change and it doesn't seem like the right
way to go in the long run either.

the real underlying problem here is that wired pmap mappings just
aren't a good concept. one of the original properties of the pmap
design was supposed to be that all the information in the pmap could
be thrown away at any time and the VM system could regenerate it all
through fault processing, but wired pmap mappings don't allow that.
a better design for UVM would not require wired pmap mappings,
and Chuck C. and I are talking about this, but it won't be done
anytime soon, so this change will do for now.

this change has the effect of causing MAP_PRIVATE mappings to be
copied to anonymous memory when they are mlock()d, so that uvm_fault()
doesn't need to copy these pages later when called from ptrace(), thus
avoiding the call to pmap_page_protect() and the panic that results
from this when the mlock()d region is unlocked or freed. note that
this change doesn't help the case where the wired mapping is MAP_SHARED.

discussed at great length with Chuck Cranor.
fixes PRs 10363, 12554, 12604, 13041, 13487, 14580 and 14853.
 1.56 10-Dec-2001  thorpej Move the code that walks the process's VM map during a coredump
into uvm_coredump_walkmap(), and use callbacks into the coredump
routine to do something with each section.
 1.55 10-Nov-2001  lukem add RCSIDs, and in some cases, slightly cleanup #include order
 1.54 06-Nov-2001  chs in uvm_exit(), don't bother to unwire the uarea before we free it,
the pages will be freed anyway.
 1.53 23-Sep-2001  chs branches: 1.53.2;
bump the rusage counter for "swaps" when we swap out a process.
addresses PR 6170.
 1.52 15-Sep-2001  chs a whole bunch of changes to improve performance and robustness under load:

- remove special treatment of pager_map mappings in pmaps. this is
required now, since I've removed the globals that expose the address range.
pager_map now uses pmap_kenter_pa() instead of pmap_enter(), so there's
no longer any need to special-case it.
- eliminate struct uvm_vnode by moving its fields into struct vnode.
- rewrite the pageout path. the pager is now responsible for handling the
high-level requests instead of only getting control after a bunch of work
has already been done on its behalf. this will allow us to UBCify LFS,
which needs tighter control over its pages than other filesystems do.
writing a page to disk no longer requires making it read-only, which
allows us to write wired pages without causing all kinds of havoc.
- use a new PG_PAGEOUT flag to indicate that a page should be freed
on behalf of the pagedaemon when it's unlocked. this flag is very similar
to PG_RELEASED, but unlike PG_RELEASED, PG_PAGEOUT can be cleared if the
pageout fails due to eg. an indirect-block buffer being locked.
this allows us to remove the "version" field from struct vm_page,
and together with shrinking "loan_count" from 32 bits to 16,
struct vm_page is now 4 bytes smaller.
- no longer use PG_RELEASED for swap-backed pages. if the page is busy
because it's being paged out, we can't release the swap slot to be
reallocated until that write is complete, but unlike with vnodes we
don't keep a count of in-progress writes so there's no good way to
know when the write is done. instead, when we need to free a busy
swap-backed page, just sleep until we can get it busy ourselves.
- implement a fast-path for extending writes which allows us to avoid
zeroing new pages. this substantially reduces cpu usage.
- encapsulate the data used by the genfs code in a struct genfs_node,
which must be the first element of the filesystem-specific vnode data
for filesystems which use genfs_{get,put}pages().
- eliminate many of the UVM pagerops, since they aren't needed anymore
now that the pager "put" operation is a higher-level operation.
- enhance the genfs code to allow NFS to use the genfs_{get,put}pages
instead of a modified copy.
- clean up struct vnode by removing all the fields that used to be used by
the vfs_cluster.c code (which we don't use anymore with UBC).
- remove kmem_object and mb_object since they were useless.
instead of allocating pages to these objects, we now just allocate
pages with no object. such pages are mapped in the kernel until they
are freed, so we can use the mapping to find the page to free it.
this allows us to remove splvm() protection in several places.

The sum of all these changes improves write throughput on my
decstation 5000/200 to within 1% of the rate of NetBSD 1.5
and reduces the elapsed time for "make release" of a NetBSD 1.5
source tree on my 128MB pc to 10% less than a 1.5 kernel took.
 1.51 10-Sep-2001  chris Update pmap_update to now take the updated pmap as an argument.
This will allow improvements to the pmaps so that they can more easily defer expensive operations, eg tlb/cache flush, til the last possible moment.

Currently this is a no-op on most platforms, so they should see no difference.

Reviewed by Jason.
 1.50 02-Jun-2001  chs branches: 1.50.2; 1.50.4;
replace vm_map{,_entry}_t with struct vm_map{,_entry} *.
 1.49 30-May-2001  lukem add missing #include "opt_kgdb.h"
 1.48 25-May-2001  chs remove trailing whitespace.
 1.47 24-Apr-2001  thorpej Sprinkle pmap_update() calls after calls to:
- pmap_enter()
- pmap_remove()
- pmap_protect()
- pmap_kenter_pa()
- pmap_kremove()
as described in pmap(9).

These calls are relatively conservative. It may be possible to
optimize these a little more.
 1.46 21-Apr-2001  thorpej The pmap_update() call at the end of uvm_swapout_threads() is
completely useless. Nuke it.
 1.45 15-Mar-2001  chs eliminate the KERN_* error codes in favor of the traditional E* codes.
the mapping is:

KERN_SUCCESS 0
KERN_INVALID_ADDRESS EFAULT
KERN_PROTECTION_FAILURE EACCES
KERN_NO_SPACE ENOMEM
KERN_INVALID_ARGUMENT EINVAL
KERN_FAILURE various, mostly turn into KASSERTs
KERN_RESOURCE_SHORTAGE ENOMEM
KERN_NOT_RECEIVER <unused>
KERN_NO_ACCESS <unused>
KERN_PAGES_LOCKED <unused>
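
The mapping listed above can be written as a translation function. This is a sketch only: the enum and its numeric values are hypothetical, and the commit removed the KERN_* codes outright instead of translating them at run time.

```c
#include <errno.h>

/* Hypothetical stand-ins for the removed KERN_* codes. */
enum kern_code_sketch {
	KS_SUCCESS,
	KS_INVALID_ADDRESS,
	KS_PROTECTION_FAILURE,
	KS_NO_SPACE,
	KS_INVALID_ARGUMENT,
	KS_FAILURE,
	KS_RESOURCE_SHORTAGE
};

/* Translate an old-style code to the errno value given in the log. */
static int
kern_to_errno_sketch(enum kern_code_sketch code)
{
	switch (code) {
	case KS_SUCCESS:		return 0;
	case KS_INVALID_ADDRESS:	return EFAULT;
	case KS_PROTECTION_FAILURE:	return EACCES;
	case KS_NO_SPACE:		return ENOMEM;
	case KS_INVALID_ARGUMENT:	return EINVAL;
	case KS_RESOURCE_SHORTAGE:	return ENOMEM;
	default:
		/* KS_FAILURE has no single errno; the log says most uses became KASSERTs. */
		return -1;
	}
}
```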
 1.44 06-Feb-2001  eeh branches: 1.44.2;
Move maxdmap and maxsmap where they belong and make them big enough.
 1.43 25-Nov-2000  chs lots of cleanup:
use queue.h macros and KASSERT().
address amap offsets in pages instead of bytes.
make amap_ref() and amap_unref() take an amap, offset and length
instead of a vm_map_entry_t.
improve whitespace and comments.
 1.42 11-Oct-2000  thorpej - uvmspace_share(): If p2 has a vmspace already, make sure to deactivate
it and free it as appropriate. Activate p2's new address space once
it references p1's.
- uvm_fork(): Make sure the child's vmspace is NULL before calling
uvmspace_share() (the child doesn't have one already in this case).

These changes do not change the behavior for the current use of
uvmspace_share() (vfork(2)), but make it possible for an already
running process (such as a kernel thread) to properly attach to
another process's address space.
 1.41 23-Sep-2000  enami splstatclock is insufficient to protect run queues. Acquire scheduler
lock instead.
 1.40 21-Aug-2000  thorpej Remove a totally unnecessary splhigh/spl0 pair.
 1.39 12-Aug-2000  sommerfeld add comment warning about possible unlock/sleep race
 1.38 27-Jun-2000  mrg remove include of <vm/vm.h>
 1.37 26-Jun-2000  mrg remove/move more mach vm header files:

<vm/pglist.h> -> <uvm/uvm_pglist.h>
<vm/vm_inherit.h> -> <uvm/uvm_inherit.h>
<vm/vm_kern.h> -> into <uvm/uvm_extern.h>
<vm/vm_object.h> -> nothing
<vm/vm_pager.h> -> into <uvm/uvm_pager.h>

also includes a bunch of <vm/vm_page.h> include removals (due to redundancy
with <vm/vm.h>), and a scattering of other similar headers.
 1.36 18-Jun-2000  simonb Set p->p_addr to NULL after it gets freed.
 1.35 08-Jun-2000  thorpej Change UVM_UNLOCK_AND_WAIT() to use ltsleep() (it is now atomic, as
advertised). Garbage-collect uvm_sleep().
 1.34 28-May-2000  thorpej Rather than starting init and creating kthreads by forking and then
doing a cpu_set_kpc(), just pass the entry point and argument all
the way down the fork path starting with fork1(). In order to
avoid special-casing the normal fork in every cpu_fork(), MI code
passes down child_return() and the child process pointer explicitly.

This fixes a race condition on multiprocessor systems; a CPU could
grab the newly created processes (which has been placed on a run queue)
before cpu_set_kpc() would be performed.
 1.33 26-May-2000  thorpej branches: 1.33.2;
Introduce a new process state distinct from SRUN called SONPROC
which indicates that the process is actually running on a
processor. Test against SONPROC as appropriate rather than
combinations of SRUN and curproc. Update all context switch code
to properly set SONPROC when the process becomes the current
process on the CPU.
 1.32 30-Mar-2000  augustss Remove more register declarations.
 1.31 26-Mar-2000  kleink Merge parts of chs-ubc2 into the trunk:
Add a new type voff_t (defined as a synonym for off_t) to describe offsets
into uvm objects, and update the appropriate interfaces to use it, the
most visible effect being the ability to mmap() file offsets beyond
the range of a vaddr_t.

Originally by Chuck Silvers; blame me for problems caused by merging this
into non-UBC.
 1.30 13-Nov-1999  thorpej Change the pmap_enter() API slightly; pmap_enter() now returns an error
value (KERN_SUCCESS or KERN_RESOURCE_SHORTAGE) indicating if it succeeded
or failed. Change the `wired' and `access_type' arguments to a single
`flags' argument, which includes the access type, and flags:

PMAP_WIRED the old `wired' boolean
PMAP_CANFAIL pmap_enter() is allowed to fail

If PMAP_CANFAIL is not specified, the pmap should behave as it always
has in the face of a drastic resource shortage: fall over dead.

Change the fault handler to deal with failure (which indicates resource
shortage) by unlocking everything, waiting for the pagedaemon to free
more memory, then retrying the fault.
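
The combined `flags' argument and the retry-on-failure behaviour described above can be modelled as below. Every name and flag value here is hypothetical, and a simple counter stands in for the pmap's resources and for waiting on the pagedaemon.

```c
#include <stdlib.h>

/* Hypothetical flag layout: access type in the low bits, options above. */
#define SK_PROT_READ	0x01u
#define SK_PROT_WRITE	0x02u
#define SK_WIRED	0x10u	/* the old `wired' boolean */
#define SK_CANFAIL	0x20u	/* caller can handle failure */

#define SK_SUCCESS		0
#define SK_RESOURCE_SHORTAGE	1

static int free_entries_sketch;	/* models the pmap's spare resources */

/* Enter a mapping: fail softly only if the caller passed SK_CANFAIL. */
static int
enter_sketch(unsigned flags)
{
	if (free_entries_sketch == 0) {
		if (flags & SK_CANFAIL)
			return SK_RESOURCE_SHORTAGE;
		abort();	/* without CANFAIL: "fall over dead" */
	}
	free_entries_sketch--;
	return SK_SUCCESS;
}

/* Fault-handler model: on shortage, wait for memory, then retry. */
static int
fault_sketch(unsigned access_type)
{
	int retries = 0;

	while (enter_sketch(access_type | SK_CANFAIL) != SK_SUCCESS) {
		free_entries_sketch++;	/* stand-in for the pagedaemon freeing memory */
		retries++;
	}
	return retries;
}
```

Passing the access type and the options in one word lets a caller opt in to soft failure without changing the function's arity.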
 1.29 25-Jul-1999  thorpej branches: 1.29.2; 1.29.4; 1.29.8;
Turn the proclist lock into a read/write spinlock. Update proclist locking
calls to reflect this. Also, block statclock rather than softclock
in the proclist locking functions, to address a problem reported on
current-users by Sean Doran.
 1.28 22-Jul-1999  thorpej Garbage collect thread_sleep()/thread_wakeup() left over from the old
Mach VM code. Also nuke iprintf(), which was no longer used anywhere.

Add proclist locking where appropriate.
 1.27 08-Jul-1999  thorpej Change the pmap_extract() interface to:
boolean_t pmap_extract(pmap_t, vaddr_t, paddr_t *);
This makes it possible for the pmap to map physical address 0.
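
The value of returning a boolean and passing the address back through a pointer shows up in a small lookup model. The table type and names below are hypothetical. The point is that an interface returning the address directly would presumably need to reserve one value (such as 0) to mean "no mapping", whereas here success is reported separately.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

typedef uintptr_t vaddr_sketch_t;
typedef uint64_t paddr_sketch_t;

struct mapping_sketch {
	vaddr_sketch_t va;
	paddr_sketch_t pa;
};

/*
 * Look up va in a table of mappings. Success is reported through the
 * return value, so a physical address of 0 is an ordinary result.
 */
static bool
extract_sketch(const struct mapping_sketch *tab, size_t n,
    vaddr_sketch_t va, paddr_sketch_t *pap)
{
	size_t i;

	for (i = 0; i < n; i++) {
		if (tab[i].va == va) {
			if (pap != NULL)
				*pap = tab[i].pa;
			return true;
		}
	}
	return false;	/* *pap is left untouched */
}
```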
 1.26 17-Jun-1999  thorpej Make uvm_vslock() return the error code from uvm_fault_wire(). All places
which use uvm_vslock() should now test the return value. If it's not
KERN_SUCCESS, wiring the pages failed, so the operation which is using
uvm_vslock() should error out.

XXX We currently just EFAULT a failed uvm_vslock(). We may want to do
more about translating error codes in the future.
 1.25 17-Jun-1999  thorpej In uvm_useracc(), make sure we have a read lock on the map before
calling uvm_map_checkprot().
 1.24 17-Jun-1999  thorpej The i386 and pc532 pmaps are officially fixed.
 1.23 28-May-1999  thorpej Make uvm_fault_unwire() take a vm_map_t, rather than a pmap_t, for
consistency. Use this opportunity for checking for intrsafe map use
in this routine (which is illegal).
 1.22 26-May-1999  thorpej Pass an access_type to uvm_vslock().
 1.21 26-May-1999  thorpej - uvm_fork()/uvm_swapin(): pass VM_PROT_READ|VM_PROT_WRITE access_type
to uvm_fault_wire(), to guarantee that the kernel stacks will not
cause even a mod/ref emulation fault.
- uvm_vslock(): pass VM_PROT_NONE until this function is updated.
 1.20 13-May-1999  thorpej Allow the caller to specify a stack for the child process. If NULL,
the child inherits the stack pointer from the parent (traditional
behavior). Like the signal stack, the stack area is specified as
a low address and a size; machine-dependent code accounts for stack
direction.

This is required for clone(2).
 1.19 30-Apr-1999  thorpej Pull signal actions out of struct user, make them a separate proc
substructure, and allow them to be shared.

Required for clone(2).
 1.18 26-Mar-1999  mycroft branches: 1.18.4;
Add a new `access type' argument to pmap_enter(). This indicates what type of
memory access a mapping was caused by. This is passed through from uvm_fault()
and udv_fault(), and in most other cases is 0.
The pmap module may use this to preset R/M information. On MMUs which require
R/M emulation, the implementation may preset the bits and avoid taking another
fault. On MMUs which keep R/M information in hardware, the implementation may
preset its cached bits to speed up the next call to pmap_is_modified() or
pmap_is_referenced().
 1.17 25-Mar-1999  mrg remove now >1 year old pre-release message.
 1.16 15-Mar-1999  chs remove a debugging printf.
 1.15 19-Oct-1998  tron Defopt SYSVMSG, SYSVSEM and SYSVSHM.
 1.14 08-Sep-1998  thorpej Implement uvm_exit(), which frees VM resources when a process finishes
exiting.
 1.13 13-Aug-1998  eeh Merge paddr_t changes into the main branch.
 1.12 09-Aug-1998  perry bzero->memset, bcopy->memcpy, bcmp->memcmp
 1.11 09-May-1998  kleink branches: 1.11.2;
Use size_t to pass the length of the memory region to operate on to chgkprot(),
kernacc(), useracc(), vslock() and vsunlock(); (unsigned) ints are not
adequate on all platforms.
 1.10 08-May-1998  kleink Make uvm_vsunlock() actually use the proc * passed to it; per discussion
with Jason Thorpe.
 1.9 30-Apr-1998  thorpej Pass vslock() and vsunlock() a proc *, rather than implicitly operating
on curproc.
 1.8 09-Apr-1998  thorpej Oops, fix a typo.
 1.7 09-Apr-1998  thorpej Allocate kernel virtual address space for the U-area before allocating
the new proc structure when performing a fork. This makes it much
easier to abort a fork operation and return an error if we run out
of KVA space.

The U-area pages are still wired down in {,u}vm_fork(), as before.
 1.6 09-Mar-1998  mrg KNF.
 1.5 10-Feb-1998  mrg - add defopt's for UVM, UVMHIST and PMAP_NEW.
- remove unnecessary UVMHIST_DECL's.
 1.4 07-Feb-1998  mrg restore rcsids
 1.3 07-Feb-1998  chs add locking of kernel_map in uvm_kernacc().
check return value of uvm_fault_wire() in uvm_fork().
enable swappings.
 1.2 06-Feb-1998  thorpej RCS ID police.
 1.1 05-Feb-1998  mrg branches: 1.1.1;
Initial revision
 1.1.1.1 05-Feb-1998  mrg initial import of the new virtual memory system, UVM, into -current.

UVM was written by chuck cranor <chuck@maria.wustl.edu>, with some
minor portions derived from the old Mach code. i provided some help
getting swap and paging working, and other bug fixes/ideas. chuck
silvers <chuq@chuq.com> also provided some other fixes.

this is the UVM kernel code portion.


this will be KNF'd shortly. :-)
 1.11.2.1 30-Jul-1998  eeh Split vm_offset_t and vm_size_t into paddr_t, psize_t, vaddr_t, and vsize_t.
 1.18.4.3 11-Aug-1999  chs add casts for trunc_page() and round_page() args.
 1.18.4.2 02-Aug-1999  thorpej Update from trunk.
 1.18.4.1 21-Jun-1999  thorpej Sync w/ -current.
 1.29.8.1 27-Dec-1999  wrstuden Pull up to last week's -current.
 1.29.4.1 15-Nov-1999  fvdl Sync with -current
 1.29.2.5 23-Apr-2001  bouyer Sync with HEAD.
 1.29.2.4 27-Mar-2001  bouyer Sync with HEAD.
 1.29.2.3 11-Feb-2001  bouyer Sync with HEAD.
 1.29.2.2 08-Dec-2000  bouyer Sync with HEAD.
 1.29.2.1 20-Nov-2000  bouyer Update thorpej_scsipi to -current as of a month ago
 1.33.2.1 22-Jun-2000  minoura Sync w/ netbsd-1-5-base.
 1.44.2.22 15-Dec-2002  thorpej Add a mutex to the uarea cache.
 1.44.2.21 15-Dec-2002  thorpej Fix a comment.
 1.44.2.20 11-Dec-2002  thorpej Sync with HEAD.
 1.44.2.19 18-Oct-2002  nathanw L_INMEM, not P_INMEM.
 1.44.2.18 18-Oct-2002  nathanw Catch up to -current.
 1.44.2.17 01-Aug-2002  nathanw Catch up to -current.
 1.44.2.16 16-Jul-2002  nathanw Revert to curproc (in a comment).
 1.44.2.15 12-Jul-2002  nathanw No longer need to pull in lwp.h; proc.h pulls it in for us.
 1.44.2.14 24-Jun-2002  nathanw Curproc->curlwp renaming.

Change uses of "curproc->l_proc" back to "curproc", which is more like the
original use. Bare uses of "curproc" are now "curlwp".

"curproc" is now #defined in proc.h as ((curlwp) ? (curlwp)->l_proc : NULL)
so that it is always safe to reference curproc (*de*referencing curproc
is another story, but that's always been true).
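The distinction between referencing and dereferencing curproc can be sketched in userland as follows. The two structures are toys with only the fields needed here, and toy_current_pid() is a made-up helper; the macro body follows the log message.

```c
#include <assert.h>
#include <stddef.h>

/* Toy structures; only the l_proc link matters for this sketch. */
struct proc { int p_pid; };
struct lwp  { struct proc *l_proc; };

struct lwp *curlwp;	/* NULL until a "current LWP" is set */

#define curproc ((curlwp) ? (curlwp)->l_proc : NULL)

/* Referencing curproc is safe; dereferencing still needs a NULL check. */
int
toy_current_pid(void)
{
	struct proc *p = curproc;

	return (p != NULL) ? p->p_pid : -1;
}
```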
 1.44.2.13 20-Jun-2002  nathanw Catch up to -current.
 1.44.2.12 28-Feb-2002  nathanw Add some LWP-specific swapout debugging.
 1.44.2.11 08-Jan-2002  nathanw Catch up to -current.
 1.44.2.10 16-Dec-2001  gmcgarry call cpu_proc_fork() from uvm_proc_fork()
 1.44.2.9 08-Dec-2001  thorpej cpu_fork() -> cpu_lwp_fork(). This logically forks an LWP, not a
complete process. As noted by Gregory McGarry on tech-kern.
 1.44.2.8 14-Nov-2001  nathanw Catch up to -current.
 1.44.2.7 26-Sep-2001  nathanw Catch up to -current.
Again.
 1.44.2.6 21-Sep-2001  nathanw Catch up to -current.
 1.44.2.5 03-Jul-2001  nathanw Correct merge lossage; lose the now-extraneous splstatclock().
 1.44.2.4 21-Jun-2001  nathanw Catch up to -current.
 1.44.2.3 09-Apr-2001  nathanw Catch up with -current.
 1.44.2.2 19-Mar-2001  nathanw Fix a very stupid and annoying bug: Don't try to uvm_fault_unwire() a
LWP's u-area twice.

Thirty lashes with a wet noodle for this one.
 1.44.2.1 05-Mar-2001  nathanw Initial commit of scheduler activations and lightweight process support.
 1.50.4.1 01-Oct-2001  fvdl Catch up with -current.
 1.50.2.5 10-Oct-2002  jdolecek sync kqueue with -current; this includes merge of gehenna-devsw branch,
merge of i386 MP branch, and part of autoconf rototil work
 1.50.2.4 06-Sep-2002  jdolecek sync kqueue branch with HEAD
 1.50.2.3 23-Jun-2002  jdolecek catch up with -current on kqueue branch
 1.50.2.2 10-Jan-2002  thorpej Sync kqueue branch with -current.
 1.50.2.1 13-Sep-2001  thorpej Update the kqueue branch to HEAD.
 1.53.2.1 12-Nov-2001  thorpej Sync the thorpej-mips-cache branch with -current.
 1.58.2.1 15-Jul-2002  gehenna catch up with -current.
 1.66.2.8 10-Nov-2005  skrll Sync with HEAD. Here we go again...
 1.66.2.7 01-Apr-2005  skrll Sync with HEAD.
 1.66.2.6 09-Feb-2005  skrll Sync with HEAD.
 1.66.2.5 24-Jan-2005  skrll Sync with HEAD.
 1.66.2.4 21-Sep-2004  skrll Fix the sync with head I botched.
 1.66.2.3 18-Sep-2004  skrll Sync with HEAD.
 1.66.2.2 03-Aug-2004  skrll Sync with HEAD
 1.66.2.1 02-Jul-2003  darrenr Apply the aborted ktrace-lwp changes to a specific branch. This is just for
others to review, I'm concerned that patch fuzziness may have resulted in some
errant code being generated but I'll look at that later by comparing the diff
from the base to the branch with the file I attempt to apply to it. This will,
at the very least, put the changes in a better context for others to review
them and attempt to tinker with removing passing of 'struct lwp' through
the kernel.
 1.78.6.1 06-Nov-2005  riz Pull up following revision(s) (requested by bouyer in ticket #5965):
sys/uvm/uvm_glue.c: revision 1.90
remove the assertion in uvm_swapout_threads() about LSONPROC lwps
not running on the same CPU as the swapper. l_stat is protected by
sched_lock, which isn't held here, so we can race with that lwp
starting to run and see its l_cpu not updated yet, as in PR 31870.
we check l_stat again in uvm_swapout() while holding sched_lock,
so the race itself is harmless.
 1.78.4.1 06-Nov-2005  riz Pull up following revision(s) (requested by bouyer in ticket #5965):
sys/uvm/uvm_glue.c: revision 1.90
remove the assertion in uvm_swapout_threads() about LSONPROC lwps
not running on the same CPU as the swapper. l_stat is protected by
sched_lock, which isn't held here, so we can race with that lwp
starting to run and see its l_cpu not updated yet, as in PR 31870.
we check l_stat again in uvm_swapout() while holding sched_lock,
so the race itself is harmless.
 1.78.2.1 06-Nov-2005  riz Pull up following revision(s) (requested by bouyer in ticket #5965):
sys/uvm/uvm_glue.c: revision 1.90
remove the assertion in uvm_swapout_threads() about LSONPROC lwps
not running on the same CPU as the swapper. l_stat is protected by
sched_lock, which isn't held here, so we can race with that lwp
starting to run and see its l_cpu not updated yet, as in PR 31870.
we check l_stat again in uvm_swapout() while holding sched_lock,
so the race itself is harmless.
 1.81.4.1 29-Apr-2005  kent sync with -current
 1.82.2.2 12-Feb-2005  yamt sync with head.
 1.82.2.1 25-Jan-2005  yamt - don't use uvm_object or managed mappings for wired allocations.
(eg. malloc(9))
- simplify uvm_km_* apis.
 1.83.4.3 06-Dec-2005  riz Apply patch (requested by yamt in ticket #1015):
sys/uvm/uvm_glue.c: patch
sys/uvm/uvm_km.c: patch
- correct a return value of uvm_km_valloc1 in the case of failure.
- do waitok allocation for uvm_uarea_alloc so that it won't fail on
temporary memory shortage.
 1.83.4.2 28-Oct-2005  jmc Pullup rev 1.90 (requested by chs in ticket #914)
Remove the assertion in uvm_swapout_threads() about LSONPROC lwps
not running on the same CPU as the swapper. l_stat is protected by
sched_lock, which isn't held here, so we can race with that lwp
starting to run and see its l_cpu not updated yet, as in PR 31870.
we check l_stat again in uvm_swapout() while holding sched_lock,
so the race itself is harmless.
 1.83.4.1 22-May-2005  snj Pull up revision 1.85 (requested by nathanw in ticket #322):
uvm_coredump_walkmap(): Set UVM_COREDUMP_NODUMP on regions whose
protection does not include VM_PROT_READ, so that the core dumping
doesn't error out with EFAULT when trying to write that region.
Addresses PR kern/30143; approach suggested by chs@.
 1.89.4.1 26-Oct-2005  yamt sync with head
 1.89.2.10 17-Mar-2008  yamt sync with head.
 1.89.2.9 11-Feb-2008  yamt sync with head.
 1.89.2.8 04-Feb-2008  yamt sync with head.
 1.89.2.7 21-Jan-2008  yamt sync with head
 1.89.2.6 15-Nov-2007  yamt sync with head.
 1.89.2.5 27-Oct-2007  yamt sync with head.
 1.89.2.4 03-Sep-2007  yamt sync with head.
 1.89.2.3 26-Feb-2007  yamt sync with head.
 1.89.2.2 30-Dec-2006  yamt sync with head.
 1.89.2.1 21-Jun-2006  yamt sync with head.
 1.92.10.1 19-Apr-2006  elad oops - *really* sync to head this time.
 1.92.8.4 03-Sep-2006  yamt sync with head.
 1.92.8.3 26-Jun-2006  yamt sync with head.
 1.92.8.2 24-May-2006  yamt sync with head.
 1.92.8.1 01-Apr-2006  yamt sync with head.
 1.92.6.2 01-Jun-2006  kardel Sync with head.
 1.92.6.1 22-Apr-2006  simonb Sync with head.
 1.92.4.1 09-Sep-2006  rpaulo sync with head
 1.93.4.1 19-Jun-2006  chap Sync with head.
 1.93.2.1 24-May-2006  tron Merge 2006-05-24 NetBSD-current into the "peter-altq" branch.
 1.96.4.1 22-Oct-2006  yamt sync with head
 1.96.2.6 29-Dec-2006  ad Checkpoint work in progress.
 1.96.2.5 18-Nov-2006  ad Sync with head.
 1.96.2.4 17-Nov-2006  ad Checkpoint work in progress.
 1.96.2.3 24-Oct-2006  ad - Redo LWP locking slightly and fix some races.
- Fix some locking botches.
- Make signal mask / stack per-proc for SA processes.
- Add _lwp_kill().
 1.96.2.2 21-Oct-2006  ad Checkpoint work in progress on locking and per-LWP signals. Very much a
work in progress and there is still a lot to do.
 1.96.2.1 11-Sep-2006  ad - Allocate and free turnstiles where needed.
- Split proclist_mutex and alllwp_mutex out of the proclist_lock,
and use in interrupt context.
- Fix an MP race in enterpgrp()/setsid().
- Acquire proclist_lock and p_crmutex in some obvious places.
 1.99.2.6 19-Apr-2007  ad Don't swap out threads blocked on a turnstile, to avoid deadlock.
 1.99.2.5 15-Apr-2007  yamt sync with head.
 1.99.2.4 17-Mar-2007  rmind Do not do an implicit enqueue in sched_switch(), move enqueueing back to
the dispatcher. Rename sched_switch() back to sched_nextlwp(). Add a new
argument to sched_enqueue(), which indicates a call from mi_switch().

Requested by yamt@
 1.99.2.3 12-Mar-2007  rmind Sync with HEAD.
 1.99.2.2 27-Feb-2007  yamt - sync with head.
- move sched_changepri back to kern_synch.c as it doesn't know PPQ anymore.
 1.99.2.1 20-Feb-2007  rmind General Common Scheduler Framework (CSF) patch import. Huge thanks to
Daniel Sieger <dsieger at TechFak.Uni-Bielefeld de> for this work.

Short abstract: Split the dispatcher from the scheduler in order to
make the scheduler more modular. Introduce initial API for other
schedulers' implementations.

Discussed in tech-kern@
OK: yamt@, ad@

Note: further work will go soon.
 1.104.6.1 29-Mar-2007  reinoud Pullup to -current
 1.104.4.1 11-Jul-2007  mjf Sync with head.
 1.104.2.13 05-Nov-2007  ad uvm_scheduler: set curlwp->l_class = SCHED_FIFO so that the swapper does
not get its priority adjusted by the scheduler. This is a special case
since init inherits via fork() and we can only adjust the swapper after.
 1.104.2.12 01-Nov-2007  ad - Fix interactivity problems under high load. Because soft interrupts
are being stacked on top of regular LWPs, more often than not aston()
was being called on a soft interrupt thread instead of a user thread,
meaning that preemption was not happening on EOI.

- Don't use bool in a couple of data structures. Sub-word writes are not
always atomic and may clobber other fields in the containing word.

- For SCHED_4BSD, make p_estcpu per thread (l_estcpu). Rework how the
dynamic priority level is calculated - it's much better behaved now.

- Kill the l_usrpri/l_priority split now that priorities are no longer
directly assigned by tsleep(). There are three fields describing LWP
priority:

l_priority: Dynamic priority calculated by the scheduler.
This does not change for kernel/realtime threads,
and always stays within the correct band. Eg for
timeshared LWPs it never moves out of the user
priority range. This is basically what l_usrpri
was before.

l_inheritedprio: Lent to the LWP due to priority inheritance
(turnstiles).

l_kpriority: A boolean value set true the first time an LWP
sleeps within the kernel. This indicates that the LWP
should get a priority boost as compensation for blocking.
lwp_eprio() now does the equivalent of sched_kpri() if
the flag is set. The flag is cleared in userret().

- Keep track of scheduling class (OTHER, FIFO, RR) in struct lwp, and use
this to make decisions in a few places where we previously tested for a
kernel thread.

- Partially fix itimers and usr/sys/intr time accounting in the presence
of software interrupts.

- Use kthread_create() to create idle LWPs. Move priority definitions
from the various modules into sys/param.h.

- newlwp -> lwp_create
 1.104.2.11 27-Oct-2007  yamt fix priorities for some kernel threads. advised and ok'ed by Andrew Doran.
 1.104.2.10 18-Oct-2007  ad Free uareas back to the uarea cache on the CPU where they were last used.
 1.104.2.9 09-Oct-2007  ad Sync with head.
 1.104.2.8 20-Aug-2007  ad Sync with HEAD.
 1.104.2.7 08-Jun-2007  ad Sync with head.
 1.104.2.6 10-Apr-2007  ad Don't swap out threads blocked on a turnstile, to avoid deadlock.
It doesn't make a lot of sense, anyhow.
 1.104.2.5 10-Apr-2007  ad Sync with head.
 1.104.2.4 09-Apr-2007  ad - Add two new arguments to kthread_create1: pri_t pri, bool mpsafe.
- Fork kthreads off proc0 as new LWPs, not new processes.
 1.104.2.3 05-Apr-2007  ad - Put a per-LWP lock around swapin / swapout.
- Replace use of lockmgr().
- Minor locking fixes and assertions.
- uvm_map.h no longer pulls in proc.h, etc.
- Use kpause where appropriate.
 1.104.2.2 21-Mar-2007  ad GC the simplelock/spinlock debugging stuff.
 1.104.2.1 13-Mar-2007  ad Pull in the initial set of changes for the vmlocking branch.
 1.108.6.3 06-Nov-2007  joerg Sync with HEAD.
 1.108.6.2 02-Oct-2007  joerg Sync with HEAD.
 1.108.6.1 03-Sep-2007  jmcneill Sync with HEAD.
 1.108.2.1 03-Sep-2007  skrll Sync with HEAD.
 1.111.2.3 23-Mar-2008  matt sync with HEAD
 1.111.2.2 09-Jan-2008  matt sync with HEAD
 1.111.2.1 06-Nov-2007  matt sync with HEAD
 1.112.6.2 18-Feb-2008  mjf Sync with HEAD.
 1.112.6.1 19-Nov-2007  mjf Sync with HEAD.
 1.112.4.1 13-Nov-2007  bouyer Sync with HEAD
 1.113.6.1 02-Jan-2008  bouyer Sync with HEAD
 1.113.2.2 15-Dec-2007  ad uvm_lwp_hold, uvm_lwp_rele: use atomic ops to avoid lock order problems.
 1.113.2.1 04-Dec-2007  ad Pull the vmlocking changes into a new branch.
 1.117.6.5 17-Jan-2009  mjf Sync with HEAD.
 1.117.6.4 29-Jun-2008  mjf Sync with HEAD.
 1.117.6.3 05-Jun-2008  mjf Sync with HEAD.

Also fix build.
 1.117.6.2 02-Jun-2008  mjf Sync with HEAD.
 1.117.6.1 03-Apr-2008  mjf Sync with HEAD.
 1.117.2.1 24-Mar-2008  keiichi sync with head.
 1.124.2.3 17-Jun-2008  yamt sync with head.
 1.124.2.2 04-Jun-2008  yamt sync with head
 1.124.2.1 18-May-2008  yamt sync with head.
 1.126.4.2 18-Sep-2008  wrstuden Sync with wrstuden-revivesa-base-2.
 1.126.4.1 23-Jun-2008  wrstuden Sync w/ -current. 34 merge conflicts to follow.
 1.126.2.5 11-Aug-2010  yamt sync with head.
 1.126.2.4 11-Mar-2010  yamt sync with head
 1.126.2.3 19-Aug-2009  yamt sync with head.
 1.126.2.2 18-Jul-2009  yamt sync with head.
 1.126.2.1 04-May-2009  yamt sync with head.
 1.131.2.2 27-Jun-2008  simonb Sync with head.
 1.131.2.1 18-Jun-2008  simonb Sync with head.
 1.133.6.1 01-Apr-2009  snj Pull up following revision(s) (requested by mrg in ticket #622):
bin/csh/csh.1: revision 1.46
bin/csh/func.c: revision 1.37
bin/ps/print.c: revision 1.111
bin/ps/ps.c: revision 1.74
bin/sh/miscbltin.c: revision 1.38
bin/sh/sh.1: revision 1.92 via patch
external/bsd/top/dist/machine/m_netbsd.c: revision 1.7
lib/libkvm/kvm_proc.c: revision 1.82
sys/arch/mips/mips/cpu_exec.c: revision 1.55
sys/compat/darwin/darwin_exec.c: revision 1.57
sys/compat/ibcs2/ibcs2_exec.c: revision 1.73
sys/compat/irix/irix_resource.c: revision 1.15
sys/compat/linux/arch/amd64/linux_exec_machdep.c: revision 1.16
sys/compat/linux/arch/i386/linux_exec_machdep.c: revision 1.12
sys/compat/linux/common/linux_limit.h: revision 1.5
sys/compat/osf1/osf1_resource.c: revision 1.14
sys/compat/svr4/svr4_resource.c: revision 1.18
sys/compat/svr4_32/svr4_32_resource.c: revision 1.17
sys/kern/exec_subr.c: revision 1.62
sys/kern/init_sysctl.c: revision 1.160
sys/kern/kern_exec.c: revision 1.288
sys/kern/kern_resource.c: revision 1.151
sys/sys/param.h: patch
sys/sys/resource.h: revision 1.31
sys/sys/sysctl.h: revision 1.184
sys/uvm/uvm_extern.h: revision 1.153
sys/uvm/uvm_glue.c: revision 1.136
sys/uvm/uvm_mmap.c: revision 1.128
usr.bin/systat/ps.c: revision 1.32
- add new RLIMIT_AS (aka RLIMIT_VMEM) resource that limits the total
address space available to processes. this limit exists in most other
modern unix variants, and like most of them, our defaults are unlimited.
remove the old mmap / rlimit.datasize hack.
- adds the VMCMD_STACK flag to all the stack-creation vmcmd callers.
it is currently unused, but was added a few years ago.
- add a pair of new process size values to kinfo_proc2{}. one is the
total size of the process memory map, and the other is the total size
adjusted for unused stack space (since most processes have a lot of
this...)
- patch sh and csh to notice RLIMIT_AS. (in some cases, the alias
RLIMIT_VMEM was already present and used if available.)
- patch ps, top and systat to notice the new k_vm_vsize member of
kinfo_proc2{}.
- update irix, svr4, svr4_32, linux and osf1 emulations to support
this information. (freebsd could be done, but it's best left
as part of the full-update of compat/freebsd.)
this addresses PR 7897. it also gives correct memory usage values,
which have never been entirely correct (since mmap), and have been
very incorrect since jemalloc() was enabled.
tested on i386 and sparc64, build tested on several other platforms.
thanks to many folks for feedback and testing but most especially
chuq and yamt for critical suggestions that led to this patch not
having a special ugliness i wasn't happy with anyway :-)
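From userland, the address-space limit described above is visible through the standard getrlimit(2) interface. The sketch below only reads and prints the limit; show_as_limit() is a name chosen for this example, and the output depends on the system's configuration.

```c
#include <assert.h>
#include <stdio.h>
#include <sys/resource.h>

/* Print the soft and hard RLIMIT_AS values; returns 0 on success. */
int
show_as_limit(void)
{
	struct rlimit rl;

	if (getrlimit(RLIMIT_AS, &rl) != 0)
		return -1;

	if (rl.rlim_cur == RLIM_INFINITY)
		printf("soft: unlimited\n");
	else
		printf("soft: %llu bytes\n", (unsigned long long)rl.rlim_cur);

	if (rl.rlim_max == RLIM_INFINITY)
		printf("hard: unlimited\n");
	else
		printf("hard: %llu bytes\n", (unsigned long long)rl.rlim_max);

	return 0;
}
```

Since the defaults are unlimited, a system that has not been tuned would be expected to report RLIM_INFINITY for both values.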
 1.133.4.3 28-Apr-2009  skrll Sync with HEAD.
 1.133.4.2 03-Mar-2009  skrll Sync with HEAD.
 1.133.4.1 19-Jan-2009  skrll Sync with HEAD.
 1.133.2.1 13-Dec-2008  haad Update haad-dm branch to haad-dm-base2.
 1.135.2.2 23-Jul-2009  jym Sync with HEAD.
 1.135.2.1 13-May-2009  jym Sync with HEAD.

Commit is split, to avoid a "too many arguments" protocol error.
 1.143.2.1 30-Apr-2010  uebayasi Sync with HEAD.
 1.144.2.4 05-Mar-2011  rmind sync with head
 1.144.2.3 30-May-2010  rmind sync with head
 1.144.2.2 25-Apr-2010  rmind - Invent mm_md_getva() and mm_md_relva() routines, provided by MD and
indicated with __HAVE_MM_MD_PREFER_VA. It will be used to deal with
cache aliasing issues and thus fix little MIPS, ARM and friends.

- Convert dev_mem_readwrite() to use unmanaged mappings. Fix a missed
offset addition in a case of direct map. Sprinkle various comments in
the memory device driver.

- Add missing direct map handling on hp700 and vax. Make checks across
m68k ports more consistent, reduce the diffs. Fix kernacc check miss
on news68k. Minor off-by-one fix for alpha. Add MEMC_PHYS_BASE for
mmap() case check on acorn26. Misc clean-up.
 1.144.2.1 18-Mar-2010  rmind Unify /dev/{mem,kmem,zero,null} implementations in MI code. Based on patch
from Joerg Sonnenberger, proposed on tech-kern@, in February 2008.

Work and depression still in progress.
 1.146.4.2 05-Mar-2011  bouyer Sync with HEAD
 1.146.4.1 08-Feb-2011  bouyer Sync with HEAD
 1.146.2.1 06-Jun-2011  jruoho Sync with HEAD.
 1.149.2.1 23-Jun-2011  cherry Catchup with rmind-uvmplock merge.
 1.151.2.2 30-Oct-2012  yamt sync with head
 1.151.2.1 17-Apr-2012  yamt sync with head
 1.152.2.3 29-Apr-2012  mrg sync to latest -current.
 1.152.2.2 24-Feb-2012  mrg sync to -current.
 1.152.2.1 18-Feb-2012  mrg merge to -current.
 1.156.2.3 12-Apr-2012  riz branches: 1.156.2.3.2;
Pull up following revision(s) (requested by martin in ticket #175):
sys/kern/kern_exit.c: revision 1.238
tests/lib/libc/gen/posix_spawn/t_fileactions.c: revision 1.4
tests/lib/libc/gen/posix_spawn/t_fileactions.c: revision 1.5
sys/uvm/uvm_extern.h: revision 1.183
lib/libc/gen/posix_spawn_fileactions.c: revision 1.2
sys/kern/kern_exec.c: revision 1.348
sys/kern/kern_exec.c: revision 1.349
sys/compat/netbsd32/syscalls.master: revision 1.95
sys/uvm/uvm_glue.c: revision 1.159
sys/uvm/uvm_map.c: revision 1.317
sys/compat/netbsd32/netbsd32.h: revision 1.95
sys/kern/exec_elf.c: revision 1.38
sys/sys/spawn.h: revision 1.2
sys/sys/exec.h: revision 1.135
sys/compat/netbsd32/netbsd32_execve.c: revision 1.34
Rework posix_spawn locking and memory management:
- always provide a vmspace for the new proc, initially borrowing from proc0
(this part fixes PR 46286)
- increase parallelism between parent and child if arguments allow this,
avoiding a potential deadlock on exec_lock
- add a new flag for userland to request old (lockstepped) behaviour for
better error reporting
- adapt test cases to the previous two and add a new variant to test the
diagnostics flag
- fix a few memory (and lock) leaks
- provide netbsd32 compat
Fix asynchronous posix_spawn child exit status (and test for it).
 1.156.2.2 09-Apr-2012  riz Pull up following revision(s) (requested by chs in ticket #167):
sys/uvm/uvm_glue.c: revision 1.158
fix uarea_system_poolpage_free() to handle freeing a uarea
that was not allocated by cpu_uarea_alloc() (ie. on platforms
where cpu_uarea_alloc() failing is not fatal).
fixes PR 46284.
 1.156.2.1 20-Feb-2012  sborrill Pull up the following revision(s) (requested by martin in ticket #14):
include/spawn.h: revision 1.2
sys/kern/kern_exec.c: revision 1.341
sys/uvm/uvm_glue.c: revision 1.157
tests/lib/libc/gen/posix_spawn/t_fileactions.c: revision 1.3

posix_spawn: fix kernel bug when passing empty fileactions (PR kern/46038)
and add a test case for this. Fix potential race condition, double freeing
of memory and memory leaks in error cases.
 1.156.2.3.2.1 28-Nov-2012  matt Pull from HEAD:
Add a __HAVE_CPU_UAREA_IDLELWP hook so that the MD code can allocate
special UAREAs for idle lwp's.
 1.160.2.1 03-Dec-2017  jdolecek update from HEAD
 1.161.2.1 29-May-2016  skrll Sync with HEAD
 1.163.18.3 13-Apr-2020  martin Mostly merge changes from HEAD up to 20200411
 1.163.18.2 08-Apr-2020  martin Merge changes from current as of 20200406
 1.163.18.1 10-Jun-2019  christos Sync with HEAD
 1.163.16.3 26-Dec-2018  pgoyette Sync with HEAD, resolve a few conflicts
 1.163.16.2 26-Nov-2018  pgoyette Sync with HEAD, resolve a couple of conflicts
 1.163.16.1 06-Sep-2018  pgoyette Sync with HEAD

Resolve a couple of conflicts (result of the uimin/uimax changes)
 1.175.2.1 17-Jan-2020  ad Sync with head.
 1.177.2.1 25-Apr-2020  bouyer Sync with bouyer-xenpvh-base2 (HEAD)
 1.10 02-Feb-2011  chuck update license clauses on my code to match the new-style BSD licenses.
based on diff that rmind@ sent me.

no functional change with this commit.
 1.9 28-Jan-2008  yamt branches: 1.9.32; 1.9.38; 1.9.40;
remove a special allocator for uareas, which is no longer necessary.
use pool_cache instead.
 1.8 11-Dec-2005  christos branches: 1.8.46; 1.8.52;
merge ktrace-lwp.
 1.7 24-Mar-2004  junyoung branches: 1.7.16;
Nuke __P().
 1.6 21-Jun-1999  thorpej branches: 1.6.36;
Protect prototypes, certain macros, and inlines from userland.
 1.5 25-Mar-1999  mrg branches: 1.5.4;
remove now >1 year old pre-release message.
 1.4 10-Feb-1998  perry add/cleanup multiple inclusion protection.
 1.3 07-Feb-1998  mrg restore rcsids
 1.2 06-Feb-1998  thorpej RCS ID police.
 1.1 05-Feb-1998  mrg branches: 1.1.1;
Initial revision
 1.1.1.1 05-Feb-1998  mrg initial import of the new virtual memory system, UVM, into -current.

UVM was written by chuck cranor <chuck@maria.wustl.edu>, with some
minor portions derived from the old Mach code. i provided some help
getting swap and paging working, and other bug fixes/ideas. chuck
silvers <chuq@chuq.com> also provided some other fixes.

this is the UVM kernel code portion.


this will be KNF'd shortly. :-)
 1.5.4.1 01-Jul-1999  thorpej Sync w/ -current.
 1.6.36.3 21-Sep-2004  skrll Fix the sync with head I botched.
 1.6.36.2 18-Sep-2004  skrll Sync with HEAD.
 1.6.36.1 03-Aug-2004  skrll Sync with HEAD
 1.7.16.1 04-Feb-2008  yamt sync with head.
 1.8.52.1 18-Feb-2008  mjf Sync with HEAD.
 1.8.46.1 23-Mar-2008  matt sync with HEAD
 1.9.40.1 08-Feb-2011  bouyer Sync with HEAD
 1.9.38.1 06-Jun-2011  jruoho Sync with HEAD.
 1.9.32.1 05-Mar-2011  rmind sync with head
 1.2 01-Aug-2000  wiz Rename VM_INHERIT_* to MAP_INHERIT_* and move them to sys/sys/mman.h as
discussed on tech-kern.
Retire sys/uvm/uvm_inherit.h, update man page for minherit(2).
 1.1 26-Jun-2000  mrg remove/move more mach vm header files:

<vm/pglist.h> -> <uvm/uvm_pglist.h>
<vm/vm_inherit.h> -> <uvm/uvm_inherit.h>
<vm/vm_kern.h> -> into <uvm/uvm_extern.h>
<vm/vm_object.h> -> nothing
<vm/vm_pager.h> -> into <uvm/uvm_pager.h>

also includes a bunch of <vm/vm_page.h> include removals (due to redundancy
with <vm/vm.h>), and a scattering of other similar headers.
 1.59 23-Sep-2023  ad Reapply this change with a couple of bugs fixed:

- Do away with separate pool_cache for some kernel objects that have no special
requirements and use the general purpose allocator instead. On one of my
test systems this makes for a small (~1%) but repeatable reduction in system
time during builds presumably because it decreases the kernel's cache /
memory bandwidth footprint a little.
- vfs_lockf: cache a pointer to the uidinfo and put mutex in the data segment.
 1.58 12-Sep-2023  ad Back out recent change to replace pool_cache with the general allocator.
Will return to this when I have time again.
 1.57 10-Sep-2023  ad - Do away with separate pool_cache for some kernel objects that have no special
requirements and use the general purpose allocator instead. On one of my
test systems this makes for a small (~1%) but repeatable reduction in system
time during builds presumably because it decreases the kernel's cache /
memory bandwidth footprint a little.
- vfs_lockf: cache a pointer to the uidinfo and put mutex in the data segment.
 1.56 17-Jul-2023  riastradh uvm(9): One rndsource for faults -- not one per CPU.

All relevant state is per-CPU anyway; the only substantive difference
this makes is how many entries appear in `rndctl -l' output and what
they are called -- formerly the somewhat confusing `cpuN', meaning
`page faults on cpuN', and now just `uvmfault'. I don't think
there's any real value in being able to enable or disable measurement
or counting of page faults on one CPU vs others, so although this
could be a minor compatibility change, it's hard to imagine it
matters much.

XXX kernel ABI change in struct cpu_info
 1.55 04-Nov-2020  chs In uvmpd_tryownerlock(), if the initial try-lock of the owner lock fails
then rather than do more try-locks and eventually sleep for a tick,
take a hold on the current owner's lock, drop the page interlock,
and acquire the lock that we took the hold on in a blocking fashion.
After we get the lock, check if the lock that we acquired is still
the lock for the owner of the page that we're interested in.
If the owner hasn't changed then we can proceed with this page,
otherwise we will skip this page and move on to a different page.
This dramatically reduces the amount of time that the pagedaemon
sleeps trying to get locks, since even 1 tick is an eternity to sleep
in this context and it was easy to trigger that case in practice,
and with this new method the pagedaemon only very rarely actually blocks
to acquire the lock that it wants since the object locks are adaptive,
and when the pagedaemon does block then the amount of time it spends
sleeping will generally be much less than 1 tick.
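The locking sequence described above can be approximated in userland with POSIX mutexes. This is a sketch of the pattern only: the structures, the reference count used as the "hold", and the function name are assumptions for illustration and do not reproduce the kernel's lock objects or its reader/writer locks.

```c
#include <assert.h>
#include <pthread.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stddef.h>

/* Toy lock object: a mutex plus a hold count that keeps it alive. */
struct toy_lockobj {
	pthread_mutex_t mtx;
	atomic_int      refs;
};

/* Toy page: an interlock protecting the pointer to its owner's lock. */
struct toy_page {
	pthread_mutex_t     interlock;
	struct toy_lockobj *owner_lock;
};

/*
 * Called with pg->interlock held; the interlock is released on return.
 * Returns the owner's lock, held, or NULL if the page changed owner
 * while we were blocked and should be skipped.
 */
struct toy_lockobj *
toy_tryownerlock(struct toy_page *pg)
{
	struct toy_lockobj *lk = pg->owner_lock;
	bool same;

	assert(lk != NULL);
	if (pthread_mutex_trylock(&lk->mtx) == 0) {
		/* Fast path: the try-lock succeeded. */
		pthread_mutex_unlock(&pg->interlock);
		return lk;
	}

	/* Slow path: hold the lock object, drop the interlock, block. */
	atomic_fetch_add(&lk->refs, 1);
	pthread_mutex_unlock(&pg->interlock);
	pthread_mutex_lock(&lk->mtx);

	/* Re-check that this is still the lock for the page's owner. */
	pthread_mutex_lock(&pg->interlock);
	same = (pg->owner_lock == lk);
	pthread_mutex_unlock(&pg->interlock);
	atomic_fetch_sub(&lk->refs, 1);

	if (!same) {
		pthread_mutex_unlock(&lk->mtx);
		return NULL;
	}
	return lk;
}
```

The hold taken before dropping the interlock is what makes it safe to block on a lock that might otherwise be freed once the page is no longer pinned to its owner.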
 1.54 07-Oct-2020  chs branches: 1.54.2;
Add a new, more aggressive allocator for uvm_pglistalloc() to allocate
contiguous physical pages, and try this new allocator if the existing
one fails. The existing contig allocator only tries to allocate pages
that are already free, which works fine shortly after boot but rarely
works after the system has been up for a while. The new allocator uses
the pagedaemon to evict pages from memory in the hope that this will
free up a range of pages that satisfies the constraints of the request.
This should help with things like plugging in a USB device, which often
fails for some USB controllers because they can't get contiguous memory.
 1.53 06-Mar-2020  ad Fix a comment.
 1.52 27-Dec-2019  ad Redo the page allocator to perform better, especially on multi-core and
multi-socket systems. Proposed on tech-kern. While here:

- add rudimentary NUMA support - needs more work.
- remove now unused "listq" from vm_page.
 1.51 13-Dec-2019  ad Break the global uvm_pageqlock into a per-page identity lock and a private
lock for use of the pagedaemon policy code. Discussed on tech-kern.

PR kern/54209: NetBSD 8 large memory performance extremely low
PR kern/54210: NetBSD-8 processes presumably not exiting
PR kern/54727: writing a large file causes unreasonable system behaviour
 1.50 01-Dec-2019  ad Give each of the page queue locks their own cache line.
 1.49 19-May-2018  jdolecek branches: 1.49.2;
Remove emap support. Unfortunately it never got to a state where it would be
used and usable, due to reliability and limited & complicated MD support.

Going forward, we need to concentrate on interfaces which do not map anything
into the kernel in the first place (such as direct map or KVA-less I/O), rather
than making those mappings cheaper to do.
 1.48 23-Dec-2016  cherry branches: 1.48.14;
"Make NetBSD great again!"

Introduce uvm_hotplug(9) to the kernel.

Many thanks, in no particular order to:

TNF, for funding the project.

Chuck Silvers - for multiple API reviews and feedback.
Nick Hudson - for testing on multiple architectures and bugfix patches.
Everyone who helped with boot testing.

KeK (http://www.kek.org.in) for hosting the primary developers.
 1.47 22-Dec-2016  cherry Add a new function called uvm_md_init() that can be called at the
appropriate time in the boot path by MD code.
 1.46 03-Apr-2015  riastradh branches: 1.46.2;
Initialize P->V tracking for unmanaged device pages in uvm_init.

Conditional on __HAVE_PMAP_PV_TRACK until we add it to all pmaps.

MI part of pmap_pv(9) change proposed on tech-kern:

https://mail-index.netbsd.org/tech-kern/2015/03/26/msg018561.html
 1.45 29-Jan-2013  para branches: 1.45.12; 1.45.14;
improve on comments
 1.44 17-Feb-2012  matt branches: 1.44.2;
Make sure to export uvmexp_* if MODULAR is defined.
Make the uvmexp_page* be a pointer to a const int as well as having the
pointer be const as well.
 1.43 28-Jan-2012  rmind pool_page_alloc, pool_page_alloc_meta: avoid extra compare, use const.
ffs_mountfs,sys_swapctl: replace memset with kmem_zalloc.
sys_swapctl: move kmem_free outside the lock path.
uvm_init: fix comment, remove pointless numeration of steps.
uvm_map_enter: remove meflagval variable.
Fix some indentation.
 1.42 27-Jan-2012  para extending vmem(9) to be able to allocate resources for its own needs.
simplifying uvm_map handling (no special kernel entries anymore no relocking)
make malloc(9) a thin wrapper around kmem(9)
(with private interface for interrupt safety reasons)

releng@ acknowledged
 1.41 24-Apr-2011  rmind branches: 1.41.4; 1.41.8;
Initialize UVM loaning subsystem a bit later, after kmem(9).
Makes UVMHIST work again.
 1.40 23-Apr-2011  rmind Replace "malloc" in comments, remove unnecessary header inclusions.
 1.39 02-Feb-2011  chuck update license clauses on my code to match the new-style BSD licenses.
based on diff that rmind@ sent me.

no functional change with this commit.
 1.38 14-Nov-2010  uebayasi branches: 1.38.2; 1.38.4;
Oops. Fix thinko.
 1.37 14-Nov-2010  uebayasi On platforms that dynamically set PAGE_{SIZE,MASK,SHIFT}, those values are
saved in struct uvmexp. Expose only the relevant part for symbol users,
so that they don't need to include the whole uvm(9) API.
 1.36 21-Oct-2009  rmind branches: 1.36.2; 1.36.4;
Remove uarea swap-out functionality:

- Addresses the issue described in PR/38828.
- Some simplification in threading and sleepq subsystems.
- Eliminates pmap_collect() and, as a side note, allows pmap optimisations.
- Eliminates XS_CTL_DATA_ONSTACK in scsipi code.
- Avoids a few scans on the LWP list and thus potentially long holds of proc_lock.
- Cuts ~1.5k lines of code. Reduces amd64 kernel size by ~4k.
- Removes __SWAP_BROKEN cases.

Tested on x86, mips, acorn32 (thanks <mpumford>) and partly tested on
acorn26 (thanks to <bjh21>).

Discussed on <tech-kern>, reviewed by <ad>.
 1.35 28-Jun-2009  rmind Ephemeral mapping (emap) implementation. Concept is based on the idea that
activity of other threads will perform the TLB flush for the processes using
emap as a side effect. To track that, global and per-CPU generation numbers
are used. This idea was suggested by Andrew Doran; various improvements to
it by me. Notes:

- For now, zero-copy on pipe is not yet enabled.
- TCP socket code would likely need more work.
- Additional UVM loaning improvements are needed.

Proposed on <tech-kern>, silence there.
Quickly reviewed by <ad>.
 1.34 18-Oct-2008  rmind branches: 1.34.8; 1.34.12;
- Initialize pool subsystem and kmem(9) earlier, when UVM is up enough.
- Remove uao_hashinit() workaround used for anon-objects.
- Replace malloc with kmem.

OK by <yamt>.
 1.33 04-Jun-2008  ad branches: 1.33.4;
Replace the global vm_page hash with a per vm_object rbtree.
Proposed on tech-kern@.
 1.32 28-Jan-2008  yamt branches: 1.32.6; 1.32.8; 1.32.10; 1.32.12;
remove a special allocator for uareas, which is no longer necessary.
use pool_cache instead.
 1.31 02-Jan-2008  ad Merge vmlocking2 to head.
 1.30 14-Nov-2007  yamt branches: 1.30.2; 1.30.6;
call debug_init earlier. ie. before malloc is used.
 1.29 18-Aug-2007  ad branches: 1.29.2; 1.29.6; 1.29.8;
Make the uarea cache per-CPU and drain in batches of 4.
 1.28 21-Jul-2007  ad branches: 1.28.4; 1.28.6;
Merge unobtrusive locking changes from the vmlocking branch.
 1.27 09-Jul-2007  ad branches: 1.27.2;
Merge some of the less invasive changes from the vmlocking branch:

- kthread, callout, devsw API changes
- select()/poll() improvements
- miscellaneous MT safety improvements
 1.26 15-Sep-2006  yamt branches: 1.26.10; 1.26.12;
merge yamt-pdpolicy branch.
- separate page replacement policy from the rest of kernel
- implement an alternative replacement policy
 1.25 25-May-2006  yamt branches: 1.25.6;
move wait points for kva from upper layers to vm_map. PR/33185 #1.

XXX there is a concern about interaction with kva fragmentation.
see: http://mail-index.NetBSD.org/tech-kern/2006/05/11/0000.html
 1.24 11-Dec-2005  christos branches: 1.24.4; 1.24.6; 1.24.8; 1.24.14;
merge ktrace-lwp.
 1.23 27-Jun-2005  thorpej branches: 1.23.2;
Use ANSI function decls.
 1.22 11-May-2005  yamt allocate anons on-demand, rather than reserving static amount of
them on boot/swapon.
 1.21 23-Jan-2005  chs move the call to link_pool_init() to the end of uvm_init(). needed for sun3.
 1.20 25-Apr-2004  simonb branches: 1.20.4;
Initialise (most) pools from a link set instead of explicit calls
to pool_init. Untouched pools are ones that are either in arch-specific
code, or aren't initialised during initial system startup.

Convert struct session, ucred and lockf to pools.
 1.19 26-Oct-2003  jdolecek update comment - kmem_map is created in kmeminit(), not uvm_km_init()
 1.18 10-May-2003  thorpej branches: 1.18.2;
Back out the following change:
http://mail-index.netbsd.org/source-changes/2003/05/08/0068.html

There were some side-effects that I didn't anticipate, and fixing them
is proving to be more difficult than I thought, so just eject for now.
Maybe one day we can look at this again.

Fixes PR kern/21517.
 1.17 08-May-2003  thorpej Simplify the way the bounds of the managed kernel virtual address
space is advertised to UVM by making virtual_avail and virtual_end
first-class exported variables by UVM. Machine-dependent code is
responsible for initializing them before main() is called. Anything
that steals KVA must adjust these variables accordingly.

This reduces the number of instances of this info from 3 to 1, and
simplifies the pmap(9) interface by removing the pmap_virtual_space()
function call, and removing two arguments from pmap_steal_memory().

This also eliminates some kludges such as having to burn kernel_map
entries on space used by the kernel and stolen KVA.

This also eliminates use of VM_{MIN,MAX}_KERNEL_ADDRESS from MI code,
this giving MD code greater flexibility over the bounds of the managed
kernel virtual address space if a given port's specific platforms can
vary in this regard (this is especially true of the evb* ports).
 1.16 04-Mar-2003  thorpej Fix the following pathological scenario:
* User allocates ZFOD region, but does not actually touch the buffer
to fault in the pages.
* In a loop, user writes this buffer to a network socket, triggering
sosend_loan().
* uvm_loan() calls uvm_loanzero() once for each page in the loaned
region (since the pages have not yet faulted in). This causes a
page to be allocated and zero'd. The result is the kernel spends
a lot of time allocating and zero'ing pages.

This fix creates a special object which owns a single zero'd page.
This single zero'd page is used to satisfy all loans of non-resident
ZFOD mappings.

Thanks to Allen Briggs for discovering the problem and for providing
an initial patch.
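
The idea behind the 1.16 fix can be sketched in user-space C. This is only an illustration under stated assumptions, not the kernel code: sketch_loan_zero(), sketch_zero_page, sketch_pages_allocated and SKETCH_PAGE_SIZE are hypothetical names, and calloc() stands in for the kernel's page allocation and zeroing. The point it shows is that repeated loans of untouched zero-fill memory all resolve to one shared zero'd page, so the allocation cost is paid once.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

#define SKETCH_PAGE_SIZE 4096	/* assumed page size, for illustration only */

static unsigned char *sketch_zero_page;		/* the single shared zero'd page */
static unsigned long sketch_pages_allocated;	/* pages allocated so far */

/*
 * Hypothetical stand-in for loaning a non-resident ZFOD page: every
 * caller gets the same zero'd page instead of a freshly allocated one.
 */
static const unsigned char *
sketch_loan_zero(void)
{
	if (sketch_zero_page == NULL) {
		/* First loan: allocate and zero the page exactly once. */
		sketch_zero_page = calloc(1, SKETCH_PAGE_SIZE);
		assert(sketch_zero_page != NULL);
		sketch_pages_allocated++;
	}
	return sketch_zero_page;
}
```

Calling sketch_loan_zero() in a loop returns the same pointer each time and leaves sketch_pages_allocated at 1, which is the behaviour the commit message describes for loans of non-resident ZFOD mappings.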
 1.15 10-Nov-2001  lukem add RCSIDs, and in some cases, slightly cleanup #include order
 1.14 27-Jun-2000  mrg branches: 1.14.2; 1.14.4; 1.14.8;
remove include of <vm/vm.h>
 1.13 26-Jun-2000  mrg remove/move more mach vm header files:

<vm/pglist.h> -> <uvm/uvm_pglist.h>
<vm/vm_inherit.h> -> <uvm/uvm_inherit.h>
<vm/vm_kern.h> -> into <uvm/uvm_extern.h>
<vm/vm_object.h> -> nothing
<vm/vm_pager.h> -> into <uvm/uvm_pager.h>

also includes a bunch of <vm/vm_page.h> include removals (due to redundancy
with <vm/vm.h>), and a scattering of other similar headers.
 1.12 29-Mar-2000  simonb Don't need to include <sys/conf.h> here.
 1.11 25-Mar-1999  mrg branches: 1.11.8;
remove now >1 year old pre-release message.
 1.10 24-Jan-1999  chuck cleanup/reorg:
- break anon related functions out of uvm_amap.c and put them in their own
file (uvm_anon.c). includes breaking up uvm_anon_init into an amap and an
anon init function
- ensure that only functions within the amap module access amap structure
fields (add macros to amap api as needed)
 1.9 13-Aug-1998  eeh Merge paddr_t changes into the main branch.
 1.8 09-Aug-1998  perry bzero->memset, bcopy->memcpy, bcmp->memcmp
 1.7 05-May-1998  kleink branches: 1.7.2;
Remove inclusions of syscall (and syscall argument) related header files;
we don't need them here.
 1.6 09-Mar-1998  mrg KNF.
 1.5 10-Feb-1998  mrg - add defopt's for UVM, UVMHIST and PMAP_NEW.
- remove unnecessary UVMHIST_DECL's.
 1.4 07-Feb-1998  mrg restore rcsids
 1.3 07-Feb-1998  chs enable paging of kernel_object.
 1.2 06-Feb-1998  thorpej RCS ID police.
 1.1 05-Feb-1998  mrg branches: 1.1.1;
Initial revision
 1.1.1.1 05-Feb-1998  mrg initial import of the new virtual memory system, UVM, into -current.

UVM was written by chuck cranor <chuck@maria.wustl.edu>, with some
minor portions derived from the old Mach code. i provided some help
getting swap and paging working, and other bug fixes/ideas. chuck
silvers <chuq@chuq.com> also provided some other fixes.

this is the UVM kernel code portion.


this will be KNF'd shortly. :-)
 1.7.2.1 30-Jul-1998  eeh Split vm_offset_t and vm_size_t into paddr_t, psize_t, vaddr_t, and vsize_t.
 1.11.8.1 20-Nov-2000  bouyer Update thorpej_scsipi to -current as of a month ago
 1.14.8.1 12-Nov-2001  thorpej Sync the thorpej-mips-cache branch with -current.
 1.14.4.1 10-Jan-2002  thorpej Sync kqueue branch with -current.
 1.14.2.3 17-Apr-2002  nathanw Catch up to -current.
 1.14.2.2 14-Nov-2001  nathanw Catch up to -current.
 1.14.2.1 05-Mar-2001  nathanw Initial commit of scheduler activations and lightweight process support.
 1.18.2.5 10-Nov-2005  skrll Sync with HEAD. Here we go again...
 1.18.2.4 24-Jan-2005  skrll Sync with HEAD.
 1.18.2.3 21-Sep-2004  skrll Fix the sync with head I botched.
 1.18.2.2 18-Sep-2004  skrll Sync with HEAD.
 1.18.2.1 03-Aug-2004  skrll Sync with HEAD
 1.20.4.1 29-Apr-2005  kent sync with -current
 1.23.2.6 04-Feb-2008  yamt sync with head.
 1.23.2.5 21-Jan-2008  yamt sync with head
 1.23.2.4 15-Nov-2007  yamt sync with head.
 1.23.2.3 03-Sep-2007  yamt sync with head.
 1.23.2.2 30-Dec-2006  yamt sync with head.
 1.23.2.1 21-Jun-2006  yamt sync with head.
 1.24.14.1 19-Jun-2006  chap Sync with head.
 1.24.8.3 15-Sep-2006  yamt make UVM_KICK_PDAEMON() a real function and stop including
uvm_pdpolicy.h from uvm.h. this also fixes build of pmap(1).
 1.24.8.2 26-Jun-2006  yamt sync with head.
 1.24.8.1 05-Mar-2006  yamt separate page replacement policy from the rest of kernel.
 1.24.6.1 01-Jun-2006  kardel Sync with head.
 1.24.4.1 09-Sep-2006  rpaulo sync with head
 1.25.6.1 18-Nov-2006  ad Sync with head.
 1.26.12.1 11-Jul-2007  mjf Sync with head.
 1.26.10.5 21-Aug-2007  yamt fix some races around pagedaemon and uvm_wait. ok'ed by Andrew Doran.
 1.26.10.4 20-Aug-2007  ad Sync with HEAD.
 1.26.10.3 28-Apr-2007  ad Split uvm_hashlock into an array of 32 locks.
 1.26.10.2 05-Apr-2007  ad - Put a per-LWP lock around swapin / swapout.
- Replace use of lockmgr().
- Minor locking fixes and assertions.
- uvm_map.h no longer pulls in proc.h, etc.
- Use kpause where appropriate.
 1.26.10.1 13-Mar-2007  ad Pull in the initial set of changes for the vmlocking branch.
 1.27.2.2 03-Sep-2007  skrll Sync with HEAD.
 1.27.2.1 15-Aug-2007  skrll Sync with HEAD.
 1.28.6.2 21-Jul-2007  ad Merge unobtrusive locking changes from the vmlocking branch.
 1.28.6.1 21-Jul-2007  ad file uvm_init.c was added on branch matt-mips64 on 2007-07-21 19:21:55 +0000
 1.28.4.2 14-Nov-2007  joerg Sync with HEAD.
 1.28.4.1 03-Sep-2007  jmcneill Sync with HEAD.
 1.29.8.2 18-Feb-2008  mjf Sync with HEAD.
 1.29.8.1 19-Nov-2007  mjf Sync with HEAD.
 1.29.6.1 18-Nov-2007  bouyer Sync with HEAD
 1.29.2.2 23-Mar-2008  matt sync with HEAD
 1.29.2.1 09-Jan-2008  matt sync with HEAD
 1.30.6.1 02-Jan-2008  bouyer Sync with HEAD
 1.30.2.2 18-Dec-2007  ad Lock readahead context using the associated object's lock.
 1.30.2.1 04-Dec-2007  ad Pull the vmlocking changes into a new branch.
 1.32.12.1 23-Jun-2008  wrstuden Sync w/ -current. 34 merge conflicts to follow.
 1.32.10.3 11-Mar-2010  yamt sync with head
 1.32.10.2 18-Jul-2009  yamt sync with head.
 1.32.10.1 04-May-2009  yamt sync with head.
 1.32.8.1 17-Jun-2008  yamt sync with head.
 1.32.6.2 17-Jan-2009  mjf Sync with HEAD.
 1.32.6.1 05-Jun-2008  mjf Sync with HEAD.

Also fix build.
 1.33.4.1 19-Oct-2008  haad Sync with HEAD.
 1.34.12.2 04-Apr-2012  matt Move the uvm_scheduler_mutex and cv init to uvm_init since they are
independent of VMSWAP.
 1.34.12.1 09-Feb-2012  matt Major changes to uvm.
Support multiple collections (groups) of free pages and run the page
reclamation algorithm on each group independently.
 1.34.8.1 23-Jul-2009  jym Sync with HEAD.
 1.36.4.4 12-Jun-2011  rmind Do not call uvm_loan_init() twice.
 1.36.4.3 31-May-2011  rmind sync with head
 1.36.4.2 05-Mar-2011  rmind sync with head
 1.36.4.1 16-Mar-2010  rmind Change struct uvm_object::vmobjlock to be dynamically allocated with
mutex_obj_alloc(). It allows us to share the locks among UVM objects.
 1.36.2.1 16-Nov-2010  uebayasi Sync with HEAD.
 1.38.4.1 08-Feb-2011  bouyer Sync with HEAD
 1.38.2.1 06-Jun-2011  jruoho Sync with HEAD.
 1.41.8.1 18-Feb-2012  mrg merge to -current.
 1.41.4.3 22-May-2014  yamt sync with head.

for a reference, the tree before this commit was tagged
as yamt-pagecache-tag8.

this commit was split into small chunks to avoid
a limitation of cvs. ("Protocol error: too many arguments")
 1.41.4.2 17-Apr-2012  yamt sync with head
 1.41.4.1 20-Nov-2011  yamt - fix page loaning XXX make O->A loaning further
- add some statistics
 1.44.2.2 03-Dec-2017  jdolecek update from HEAD
 1.44.2.1 25-Feb-2013  tls resync with head
 1.45.14.2 05-Feb-2017  skrll Sync with HEAD
 1.45.14.1 06-Apr-2015  skrll Sync with HEAD
 1.45.12.1 23-Apr-2015  snj Pull up following revision(s) (requested by mrg in ticket #718):
sys/arch/x86/include/pmap.h: revision 1.56
sys/arch/x86/x86/pmap.c: revision 1.188
sys/dev/pci/agp_amd64.c: revision 1.8
sys/dev/pci/agp_i810.c: revision 1.118
sys/external/bsd/drm2/dist/drm/i915/i915_dma.c: revision 1.16
sys/external/bsd/drm2/dist/drm/i915/i915_gem.c: revision 1.29
sys/external/bsd/drm2/dist/drm/nouveau/nouveau_agp.c: revision 1.3
sys/external/bsd/drm2/dist/drm/nouveau/nouveau_ttm.c: revision 1.4
sys/external/bsd/drm2/dist/drm/radeon/atombios_crtc.c: revision 1.3
sys/external/bsd/drm2/dist/drm/radeon/radeon_agp.c: revision 1.3
sys/external/bsd/drm2/dist/drm/radeon/radeon_display.c: revision 1.3
sys/external/bsd/drm2/dist/drm/radeon/radeon_legacy_crtc.c: revision 1.2
sys/external/bsd/drm2/dist/drm/radeon/radeon_object.c: revision 1.3
sys/external/bsd/drm2/dist/drm/radeon/radeon_ttm.c: revision 1.7
sys/external/bsd/drm2/dist/drm/ttm/ttm_bo.c: revisions 1.7-1.10
sys/external/bsd/drm2/dist/drm/ttm/ttm_bo_util.c: revision 1.5
sys/external/bsd/drm2/i915drm/intelfb.c: revision 1.13
sys/external/bsd/drm2/include/drm/drm_wait_netbsd.h: revisions 1.12, 1.13
sys/external/bsd/drm2/include/linux/mm.h: revision 1.5
sys/external/bsd/drm2/include/linux/pci.h: revisions 1.16, 1.17
sys/external/bsd/drm2/nouveau/nouveaufb.c: revision 1.2
sys/external/bsd/drm2/radeon/radeon_pci.c: revisions 1.8, 1.9
sys/uvm/uvm_init.c: revision 1.46
Hack against the blank console problem:
Leave the CLUT alone on ancient cards. At least this leaves us with a
semi working console (red and blue are flipped). Leave an example of what
seems to be happening but disable it because colors are better than 444 bit
greyscale.
--
Initialize P->V tracking for unmanaged device pages in uvm_init.

Conditional on __HAVE_PMAP_PV_TRACK until we add it to all pmaps.

MI part of pmap_pv(9) change proposed on tech-kern:

https://mail-index.netbsd.org/tech-kern/2015/03/26/msg018561.html
--
Implement pmap_pv(9) for x86 for P->V tracking of unmanaged pages.

Proposed on tech-kern with no objections:

https://mail-index.netbsd.org/tech-kern/2015/03/26/msg018561.html
--
Use pmap_pv(9) to remove mappings of Intel graphics aperture pages.

Proposed on tech-kern with no objections:

https://mail-index.netbsd.org/tech-kern/2015/03/26/msg018561.html

Further background at:

https://mail-index.netbsd.org/tech-kern/2014/07/23/msg017392.html
--
Use pmap_pv(9) to remove mappings of device pages in TTM.

Adapt nouveau and radeon to do pmap_pv_track for their device pages.

Proposed on tech-kern with no objections:

https://mail-index.netbsd.org/tech-kern/2015/03/26/msg018561.html

Further background at:

https://mail-index.netbsd.org/tech-kern/2014/07/23/msg017392.html
--
Fix error branches in agp_amd64.c.

- agp_generic_detach always.
- Free asc if it was allocated. (Found by Brainy, noted by maxv@.)
- Free the GATT if it was allocated.
--
pmf_device_register returns false on failure, not true
--
In DRM_SPIN_WAIT_ON, don't stop after waiting only one tick.

Continue the loop to recheck the condition and count the whole
duration.
--
Don't use the video BIOS memory as an i915 flush page!
--
Don't let anyone else allocate the video BIOS either.
--
Missed a zero: it's 0x100000, not 0x10000.
--
Don't reserve if atomic -- caller must have pre-pinned the buffer.
--
Don't reserve if atomic -- caller must have pre-pinned the buffer.
--
almost add radeondrmkms suspend/resume support. it unfortunately doesn't work.
--
Need the page's uvm object lock to do pmap_page_protect.
--
Use KASSERTMSG to show bad base/offset.
--
KASSERT about page-alignment on initialization too.
--
Don't break when hardclock_ticks wraps around.

Since we now only count time spent in wait, rather than determining
the end time and checking whether we've passed it, timeouts might be
marginally longer in effect. Unlikely to be an issue.
--
Remove broken drm2 vm_mmap stub. Can't possibly have ever worked.
--
apply some of the additional changes from Arto Huusko in PR#49645:
- call pmf_device_deregister on detach.

i've kept the "resume = true" for radeon_resume_kms() call as it
seems to work for me (indeed, code inspection shows it is unused
on netbsd :-)

my old nforce4 box that can resume old drm (or could, last i tried
several years ago) while X and GL apps were running, can at least
survive a resume if X hasn't started. my one attempt so far with
X exited, but having run, did not work.
--
First attempt to make ttm_buffer_object_transfer less bogus.
--
Make sure mem.bus.is_iomem is initialized. PR 49833
 1.46.2.1 07-Jan-2017  pgoyette Sync with HEAD. (Note that most of these changes are simply $NetBSD$
tag issues.)
 1.48.14.1 21-May-2018  pgoyette Sync with HEAD
 1.49.2.1 08-Apr-2020  martin Merge changes from current as of 20200406
 1.54.2.1 14-Dec-2020  thorpej Sync w/ HEAD.
 1.30 03-May-2024  skrll KNF
 1.29 21-Sep-2020  chs the previous fix for PR 55366 in uvm_amap.c 1.124 was incomplete:
- amap_adjref_anons() must also ignore AMAP_REFALL when updating
the ppref, not just when deciding whether or not to initialize ppref.
- UVM_EXTRACT_QREF relies on AMAP_REFALL to work properly,
and since we can't use AMAP_REFALL then we can't use QREF either.
 1.28 25-May-2016  christos branches: 1.28.22;
Introduce security.pax.mprotect.ptrace sysctl which can be used to bypass
mprotect settings so that debuggers can write to the text segment of traced
processes so that they can insert breakpoints. Turned off by default.
Ok: chuq (for now)
 1.27 27-Jan-2012  para branches: 1.27.6; 1.27.24;
extending vmem(9) to be able to allocate resources for its own needs.
simplifying uvm_map handling (no special kernel entries anymore no relocking)
make malloc(9) a thin wrapper around kmem(9)
(with private interface for interrupt safety reasons)

releng@ acknowledged
 1.26 23-Apr-2011  rmind branches: 1.26.4; 1.26.8;
Replace "malloc" in comments, remove unnecessary header inclusions.
 1.25 02-Feb-2011  chuck update license clauses on my code to match the new-style BSD licenses.
based on diff that rmind@ sent me.

no functional change with this commit.
 1.24 04-Mar-2007  christos branches: 1.24.64; 1.24.70; 1.24.72;
Kill caddr_t; there will be some MI fallout, but it will be fixed shortly.
 1.23 20-Dec-2005  skrll branches: 1.23.26;
Whitespace
 1.22 06-Dec-2005  chs Avoid leaking memory if uiomove fails. from openbsd via PR 32251.
 1.21 27-Jun-2005  thorpej branches: 1.21.2;
Use ANSI function decls.
 1.20 01-Apr-2005  yamt merge yamt-km branch.
- don't use managed mappings/backing objects for wired memory allocations.
save some resources like pv_entry. also fix (most of) PR/27030.
- simplify kernel memory management API.
- simplify pmap bootstrap of some ports.
- some related cleanups.
 1.19 01-Jan-2005  yamt branches: 1.19.2; 1.19.4;
for in-kernel maps,
- allocate kva for vm_map_entry from the map itself and
remove the static limit, MAX_KMAPENT.
- keep merged entries for later splitting to fix allocate-to-free problem.
PR/24039.
 1.18 23-Apr-2003  tls branches: 1.18.2;
Correct use of MAXBSIZE where MAXPHYS was intended. This is a necessary
first step towards per-device MAXPHYS, and has the beneficial side effect
of allowing clustering to MAXPHYS even on systems that need to run with
a reduced MAXBSIZE to get more metadata buffers.
 1.17 10-Nov-2001  lukem branches: 1.17.10;
add RCSIDs, and in some cases, slightly cleanup #include order
 1.16 15-Sep-2001  chs branches: 1.16.2;
a whole bunch of changes to improve performance and robustness under load:

- remove special treatment of pager_map mappings in pmaps. this is
required now, since I've removed the globals that expose the address range.
pager_map now uses pmap_kenter_pa() instead of pmap_enter(), so there's
no longer any need to special-case it.
- eliminate struct uvm_vnode by moving its fields into struct vnode.
- rewrite the pageout path. the pager is now responsible for handling the
high-level requests instead of only getting control after a bunch of work
has already been done on its behalf. this will allow us to UBCify LFS,
which needs tighter control over its pages than other filesystems do.
writing a page to disk no longer requires making it read-only, which
allows us to write wired pages without causing all kinds of havoc.
- use a new PG_PAGEOUT flag to indicate that a page should be freed
on behalf of the pagedaemon when it's unlocked. this flag is very similar
to PG_RELEASED, but unlike PG_RELEASED, PG_PAGEOUT can be cleared if the
pageout fails due to eg. an indirect-block buffer being locked.
this allows us to remove the "version" field from struct vm_page,
and together with shrinking "loan_count" from 32 bits to 16,
struct vm_page is now 4 bytes smaller.
- no longer use PG_RELEASED for swap-backed pages. if the page is busy
because it's being paged out, we can't release the swap slot to be
reallocated until that write is complete, but unlike with vnodes we
don't keep a count of in-progress writes so there's no good way to
know when the write is done. instead, when we need to free a busy
swap-backed page, just sleep until we can get it busy ourselves.
- implement a fast-path for extending writes which allows us to avoid
zeroing new pages. this substantially reduces cpu usage.
- encapsulate the data used by the genfs code in a struct genfs_node,
which must be the first element of the filesystem-specific vnode data
for filesystems which use genfs_{get,put}pages().
- eliminate many of the UVM pagerops, since they aren't needed anymore
now that the pager "put" operation is a higher-level operation.
- enhance the genfs code to allow NFS to use the genfs_{get,put}pages
instead of a modified copy.
- clean up struct vnode by removing all the fields that used to be used by
the vfs_cluster.c code (which we don't use anymore with UBC).
- remove kmem_object and mb_object since they were useless.
instead of allocating pages to these objects, we now just allocate
pages with no object. such pages are mapped in the kernel until they
are freed, so we can use the mapping to find the page to free it.
this allows us to remove splvm() protection in several places.

The sum of all these changes improves write throughput on my
decstation 5000/200 to within 1% of the rate of NetBSD 1.5
and reduces the elapsed time for "make release" of a NetBSD 1.5
source tree on my 128MB pc to 10% less than a 1.5 kernel took.
 1.15 02-Jun-2001  chs branches: 1.15.2; 1.15.4;
replace vm_map{,_entry}_t with struct vm_map{,_entry} *.
 1.14 25-May-2001  chs remove trailing whitespace.
 1.13 15-Mar-2001  chs eliminate the KERN_* error codes in favor of the traditional E* codes.
the mapping is:

KERN_SUCCESS             0
KERN_INVALID_ADDRESS     EFAULT
KERN_PROTECTION_FAILURE  EACCES
KERN_NO_SPACE            ENOMEM
KERN_INVALID_ARGUMENT    EINVAL
KERN_FAILURE             various, mostly turn into KASSERTs
KERN_RESOURCE_SHORTAGE   ENOMEM
KERN_NOT_RECEIVER        <unused>
KERN_NO_ACCESS           <unused>
KERN_PAGES_LOCKED        <unused>
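
The mapping listed above can be written out as a small C function. This is a sketch only: the sketch_kern_code enum and sketch_kern_to_errno() are hypothetical names invented for illustration, the enum does not reproduce the old numeric values of the KERN_* constants, and returning -1 for the codes without a single errno equivalent is an assumption of this example, not something the commit did.

```c
#include <assert.h>	/* for callers that want to check results */
#include <errno.h>

/* Hypothetical stand-ins for the retired KERN_* codes. */
enum sketch_kern_code {
	SK_KERN_SUCCESS,
	SK_KERN_INVALID_ADDRESS,
	SK_KERN_PROTECTION_FAILURE,
	SK_KERN_NO_SPACE,
	SK_KERN_INVALID_ARGUMENT,
	SK_KERN_FAILURE,
	SK_KERN_RESOURCE_SHORTAGE,
	SK_KERN_NOT_RECEIVER,
	SK_KERN_NO_ACCESS,
	SK_KERN_PAGES_LOCKED
};

/*
 * Translate an old-style code into the errno value given in the log.
 * Returns -1 where the log gives no single errno (KERN_FAILURE and
 * the three unused codes).
 */
static int
sketch_kern_to_errno(enum sketch_kern_code code)
{
	switch (code) {
	case SK_KERN_SUCCESS:
		return 0;
	case SK_KERN_INVALID_ADDRESS:
		return EFAULT;
	case SK_KERN_PROTECTION_FAILURE:
		return EACCES;
	case SK_KERN_NO_SPACE:
	case SK_KERN_RESOURCE_SHORTAGE:
		return ENOMEM;
	case SK_KERN_INVALID_ARGUMENT:
		return EINVAL;
	default:
		return -1;
	}
}
```

Note that two old codes collapse onto ENOMEM, so the translation is not reversible.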
 1.12 27-Jun-2000  mrg branches: 1.12.2;
remove include of <vm/vm.h>
 1.11 26-Jun-2000  mrg remove/move more mach vm header files:

<vm/pglist.h> -> <uvm/uvm_pglist.h>
<vm/vm_inherit.h> -> <uvm/uvm_inherit.h>
<vm/vm_kern.h> -> into <uvm/uvm_extern.h>
<vm/vm_object.h> -> nothing
<vm/vm_pager.h> -> into <uvm/uvm_pager.h>

also includes a bunch of <vm/vm_page.h> include removals (due to redundancy
with <vm/vm.h>), and a scattering of other similar headers.
 1.10 02-Jun-2000  pk Let uvm_map_extract() set the lower bound on the kernel address range
itself, instead of having its callers do that.
 1.9 02-Jun-2000  pk Shouldn't pass garbage to uvm_map_extract().
 1.8 25-Mar-1999  mrg branches: 1.8.8; 1.8.16;
remove now >1 year old pre-release message.
 1.7 11-Oct-1998  chuck remove unused share map code from UVM:
- update calls to uvm_unmap_remove/uvm_unmap (mainonly boolean arg
has been removed)
- replace UVM_ET_ISMAP checks with UVM_ET_ISSUBMAP checks
 1.6 13-Aug-1998  eeh Merge paddr_t changes into the main branch.
 1.5 05-May-1998  kleink branches: 1.5.2;
Remove inclusions of syscall (and syscall argument) related header files;
we don't need them here.
 1.4 09-Mar-1998  mrg KNF.
 1.3 07-Feb-1998  mrg restore rcsids
 1.2 06-Feb-1998  thorpej RCS ID police.
 1.1 05-Feb-1998  mrg branches: 1.1.1;
Initial revision
 1.1.1.1 05-Feb-1998  mrg initial import of the new virtual memory system, UVM, into -current.

UVM was written by chuck cranor <chuck@maria.wustl.edu>, with some
minor portions derived from the old Mach code. i provided some help
getting swap and paging working, and other bug fixes/ideas. chuck
silvers <chuq@chuq.com> also provided some other fixes.

this is the UVM kernel code portion.


this will be KNF'd shortly. :-)
 1.5.2.1 30-Jul-1998  eeh Split vm_offset_t and vm_size_t into paddr_t, psize_t, vaddr_t, and vsize_t.
 1.8.16.1 22-Jun-2000  minoura Sync w/ netbsd-1-5-base.
 1.8.8.2 27-Mar-2001  bouyer Sync with HEAD.
 1.8.8.1 20-Nov-2000  bouyer Update thorpej_scsipi to -current as of a month ago
 1.12.2.4 14-Nov-2001  nathanw Catch up to -current.
 1.12.2.3 21-Sep-2001  nathanw Catch up to -current.
 1.12.2.2 21-Jun-2001  nathanw Catch up to -current.
 1.12.2.1 09-Apr-2001  nathanw Catch up with -current.
 1.15.4.1 01-Oct-2001  fvdl Catch up with -current.
 1.15.2.1 10-Jan-2002  thorpej Sync kqueue branch with -current.
 1.16.2.1 12-Nov-2001  thorpej Sync the thorpej-mips-cache branch with -current.
 1.17.10.1 26-Aug-2003  tron Pull up revision 1.18 (requested by tls in ticket #1434):
Correct use of MAXBSIZE where MAXPHYS was intended. This is a necessary
first step towards per-device MAXPHYS, and has the beneficial side effect
of allowing clustering to MAXPHYS even on systems that need to run with
a reduced MAXBSIZE to get more metadata buffers.
 1.18.2.4 11-Dec-2005  christos Sync with head.
 1.18.2.3 10-Nov-2005  skrll Sync with HEAD. Here we go again...
 1.18.2.2 01-Apr-2005  skrll Sync with HEAD.
 1.18.2.1 17-Jan-2005  skrll Sync with HEAD.
 1.19.4.1 25-Jan-2005  yamt - don't use uvm_object or managed mappings for wired allocations.
(eg. malloc(9))
- simplify uvm_km_* apis.
 1.19.2.1 29-Apr-2005  kent sync with -current
 1.21.2.2 03-Sep-2007  yamt sync with head.
 1.21.2.1 21-Jun-2006  yamt sync with head.
 1.23.26.1 12-Mar-2007  rmind Sync with HEAD.
 1.24.72.1 08-Feb-2011  bouyer Sync with HEAD
 1.24.70.1 06-Jun-2011  jruoho Sync with HEAD.
 1.24.64.2 31-May-2011  rmind sync with head
 1.24.64.1 05-Mar-2011  rmind sync with head
 1.26.8.1 18-Feb-2012  mrg merge to -current.
 1.26.4.1 17-Apr-2012  yamt sync with head
 1.27.24.1 29-May-2016  skrll Sync with HEAD
 1.27.6.2 03-Dec-2017  jdolecek update from HEAD
 1.27.6.1 12-Sep-2012  tls Initial snapshot of work to eliminate 64K MAXPHYS. Basically works for
physio (I/O to raw devices); needs more doing to get it going with the
filesystems, but it shouldn't damage data.

All work's been done on amd64 so far. Not hard to add support to other
ports. If others want to pitch in, one very helpful thing would be to
sort out when and how IDE disks can do 128K or larger transfers, and
adjust the various PCI IDE (or at least ahcisata) drivers and wd.c
accordingly -- it would make testing much easier. Another very helpful
thing would be to implement a smart minphys() for RAIDframe along the
lines detailed in the MAXPHYS-NOTES file.
 1.28.22.1 04-Oct-2020  martin Pull up following revision(s) (requested by chs in ticket #1095):

sys/uvm/uvm_amap.c: revision 1.124 (via patch)
sys/uvm/uvm_amap.c: revision 1.125 (via patch)
sys/uvm/uvm_io.c: revision 1.29 (via patch)

Effectively disable the AMAP_REFALL flag because it is unsafe.

This flag tells the amap code that it does not need to allocate ppref
as part of adding or removing a reference, but that is only correct
if the range of the reference being added or removed is the same
as the range of all other references to the amap, and the point of
this flag is exactly to try to optimize the case where the range is
different and thus this flag would not be correct to use.
Fixes PR 55366.

The previous fix for PR 55366 in uvm_amap.c 1.124 was incomplete:
- amap_adjref_anons() must also ignore AMAP_REFALL when updating
the ppref, not just when deciding whether or not to initialize ppref.
- UVM_EXTRACT_QREF relies on AMAP_REFALL to work properly,
and since we can't use AMAP_REFALL then we can't use QREF either.
 1.166 07-Dec-2024  chs kmem: improve behavior when using all of physical memory as kmem

On systems where kmem does not need to be limited by kernel virtual
space (essentially 64-bit platforms), we currently try to size the
"kmem" space to be big enough for all of physical memory to be
allocated as kmem, which really means that we will always run short of
physical memory before we run out of kernel virtual space. However
this does not take into account that uvm_km_va_starved_p() starts
reporting that we are low on kmem virtual space when we have used 90%
of it, in an attempt to avoid kmem space becoming too fragmented,
which means on large memory systems we will still start reacting to
being short of virtual space when there is plenty of physical memory
still available. Fix this by overallocating the kmem space by a
factor of 10/9 so that we always run low on physical memory first,
as we want.
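
The arithmetic in the 1.166 message can be checked with a short C sketch. The names sketch_kmem_va_pages() and sketch_va_starved() are hypothetical, and the exact comparison used by uvm_km_va_starved_p() is assumed here to be "used space has reached 90% of the total"; only the 90% threshold and the 10/9 factor come from the log text.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/*
 * Size the kmem virtual space (in pages) as roughly 10/9 of physical
 * memory. Written as p + p/9 to avoid overflow; rounds down.
 */
static size_t
sketch_kmem_va_pages(size_t physpages)
{
	return physpages + physpages / 9;
}

/*
 * Assumed form of the starvation check: report "starved" once used
 * space reaches 90% of the total virtual space.
 */
static bool
sketch_va_starved(size_t used, size_t total)
{
	return used >= total - total / 10;
}
```

With 900 physical pages and a virtual space of the same size, the check already fires at 810 pages used, while 90 physical pages are still free. Sizing the space to sketch_kmem_va_pages(900), which is 1000 pages, moves the 90% mark to 900 pages, so physical memory runs out first.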
 1.165 09-Apr-2023  riastradh branches: 1.165.6;
uvm(9): KASSERT(A && B) -> KASSERT(A); KASSERT(B)
 1.164 26-Feb-2023  skrll nkmempages should be size_t
 1.163 12-Feb-2023  andvar s/strucure/structure/ and s/structues/structures/ in comments.
 1.162 06-Aug-2022  chs branches: 1.162.4;
allow KMSAN to work again by restoring the limiting of kva even with
NKMEMPAGES_MAX_UNLIMITED. we used to limit kva to 1/8 of physmem
but limiting to 1/4 should be enough, and 1/4 still gives the kernel
enough kva to map all of the RAM that KMSAN has not stolen.

Reported-by: syzbot+ca3710b4c40cdd61aa72@syzkaller.appspotmail.com
 1.161 03-Aug-2022  chs for platforms which define NKMEMPAGES_MAX_UNLIMITED, set nkmempages
high enough to allow the kernel to map all of RAM into kmem,
so that free physical pages rather than kernel virtual space is
the limiting factor in allocating kernel memory. this gives ZFS
more flexibility in tuning how much memory to use for its ARC cache.
 1.160 13-Mar-2021  skrll Consistently use %#jx instead of 0x%jx or just %jx in UVMHIST_LOG formats
 1.159 09-Jul-2020  skrll branches: 1.159.2;
Consistently use UVMHIST(__func__)

Convert UVMHIST_{CALLED,LOG} into UVMHIST_CALLARGS
 1.158 08-Jul-2020  skrll Trailing whitespace
 1.157 14-Mar-2020  ad Make page waits (WANTED vs BUSY) interlocked by pg->interlock. Gets RW
locks out of the equation for sleep/wakeup, and allows observing+waiting
for busy pages when holding only a read lock. Proposed on tech-kern.
 1.156 24-Feb-2020  rin 0x%#x --> %#x for non-external codes.
Also, stop mixing up 0x%x and %#x in single files as far as possible.
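
The format cleanups in revisions 1.156 and 1.160 rest on how the '#' flag behaves in standard printf-style formatting. A small userland sketch follows; format_hex() is a hypothetical helper and not part of UVM.

```c
#include <stdio.h>

/* Format v three ways into caller-supplied buffers of n bytes each. */
static void
format_hex(unsigned v, char *doubled, char *manual, char *alt, size_t n)
{
	snprintf(doubled, n, "0x%#x", v);	/* literal prefix plus '#': prefix appears twice */
	snprintf(manual, n, "0x%x", v);		/* literal prefix only */
	snprintf(alt, n, "%#x", v);		/* '#' flag only; no prefix when v is 0 */
}
```

For a nonzero value the first form doubles the prefix, which is the mistake revision 1.156 removes. The second and third forms differ only for zero, where '#' omits the prefix.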
 1.155 23-Feb-2020  ad UVM locking changes, proposed on tech-kern:

- Change the lock on uvm_object, vm_amap and vm_anon to be a RW lock.
- Break v_interlock and vmobjlock apart. v_interlock remains a mutex.
- Do partial PV list locking in the x86 pmap. Others to follow later.
 1.154 08-Feb-2020  maxv Retire KLEAK.

KLEAK was a nice feature and served its purpose; it allowed us to detect
dozens of info leaks on the kernel->userland boundary, and thanks to it we
tackled a good part of the infoleak problem 1.5 years ago.

Nowadays however, we have kMSan, which can detect uninitialized memory in
the kernel. kMSan supersedes KLEAK: it can detect what KLEAK was able to
detect, but in addition, (1) it operates in all of the kernel and not just
the kernel->userland boundary, (2) it requires no user interaction, and (3)
it is deterministic and not statistical.

That makes kMSan the feature of choice to detect info leaks nowadays;
people interested in detecting info leaks should boot a kMSan kernel and
just wait for the magic to happen.

KLEAK was a good ride, and a fun project, but now is time for it to go.

Discussed with several people, including Thomas Barabosch.
 1.153 20-Jan-2020  skrll Another #define protection.

PMAP_ALLOC_POOLPAGE expects PMAP_{,UN}MAP_POOLPAGE to be defined
 1.152 14-Dec-2019  ad branches: 1.152.2;
Merge from yamt-pagecache: use radixtree for page lookup.

rbtree page lookup was introduced during the NetBSD 5.0 development cycle to
bypass lock contention problems with the (then) global page hash, and was a
temporary solution to allow us to make progress. radixtree is the intended
replacement.

Ok yamt@.
 1.151 13-Dec-2019  ad Break the global uvm_pageqlock into a per-page identity lock and a private
lock for use of the pagedaemon policy code. Discussed on tech-kern.

PR kern/54209: NetBSD 8 large memory performance extremely low
PR kern/54210: NetBSD-8 processes presumably not exiting
PR kern/54727: writing a large file causes unreasonable system behaviour
 1.150 01-Dec-2019  uwe Add missing #include <sys/atomic.h>
 1.149 01-Dec-2019  ad Minor correction to previous.
 1.148 01-Dec-2019  ad - Adjust uvmexp.swpgonly with atomics, and make uvm_swap_data_lock static.
- A bit more __cacheline_aligned on mutexes.
 1.147 14-Nov-2019  maxv Add support for Kernel Memory Sanitizer (kMSan). It detects uninitialized
memory used by the kernel at run time, and just like kASan and kCSan, it
is an excellent feature. It has already detected 38 uninitialized variables
in the kernel during my testing, which I have since discreetly fixed.

We use two shadows:
- "shad", to track uninitialized memory with a bit granularity (1:1).
Each bit set to 1 in the shad corresponds to one uninitialized bit of
real kernel memory.
- "orig", to track the origin of the memory with a 4-byte granularity
(1:1). Each uint32_t cell in the orig indicates the origin of the
associated uint32_t of real kernel memory.

The memory consumption of these shadows is substantial, so at least 4GB of
RAM is recommended to run kMSan.

The compiler inserts calls to specific __msan_* functions on each memory
access, to manage both the shad and the orig and detect uninitialized
memory accesses that change the execution flow (like an "if" on an
uninitialized variable).

We mark as uninit several types of memory buffers (stack, pools, kmem,
malloc, uvm_km), and check each buffer passed to copyout, copyoutstr,
bwrite, if_transmit_lock and DMA operations, to detect uninitialized memory
that leaves the system. This allows us to detect kernel info leaks in a way
that is more efficient and also more user-friendly than KLEAK.

Contrary to kASan, kMSan requires comprehensive coverage, ie we cannot
tolerate having one non-instrumented function, because this could cause
false positives. kMSan cannot instrument ASM functions, so I converted
most of them to __asm__ inlines, which kMSan is able to instrument. Those
that remain receive special treatment.

Contrary to kASan again, kMSan uses a TLS, so we must context-switch this
TLS during interrupts. We use different contexts depending on the interrupt
level.

The orig tracks precisely the origin of a buffer. We use a special encoding
for the orig values, and pack together in each uint32_t cell of the orig:
- a code designating the type of memory (Stack, Pool, etc), and
- a compressed pointer, which points either (1) to a string containing
the name of the variable associated with the cell, or (2) to an area
in the kernel .text section which we resolve to a symbol name + offset.

This encoding allows us not to consume extra memory for associating
information with each cell, and produces a precise output, that can tell
for example the name of an uninitialized variable on the stack, the
function in which it was pushed on the stack, and the function where we
accessed this uninitialized variable.

kMSan is available with LLVM, but not with GCC.

The code is organized in a way that is similar to kASan and kCSan, so it
means that other architectures than amd64 can be supported.
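
The "orig" cell encoding described in revision 1.147 packs a memory-type code and a compressed pointer into one uint32_t. The sketch below shows one way such packing could look. The 3-bit/29-bit split, the base-relative compression, and every name are assumptions made for illustration; the commit message does not give the real layout.

```c
#include <stdint.h>

#define ORIG_TYPE_BITS	3u				/* assumed width of the type code */
#define ORIG_PTR_BITS	(32u - ORIG_TYPE_BITS)		/* remaining bits hold the pointer */
#define ORIG_PTR_MASK	((UINT32_C(1) << ORIG_PTR_BITS) - 1)

/* Memory types named in the commit message; the numbering is hypothetical. */
enum orig_kind { ORIG_STACK, ORIG_POOL, ORIG_KMEM, ORIG_MALLOC, ORIG_UVM_KM };

/* Pack a type and a pointer, compressed as an offset from base, into one cell. */
static uint32_t
orig_encode(enum orig_kind kind, uintptr_t ptr, uintptr_t base)
{
	uint32_t off = (uint32_t)(ptr - base) & ORIG_PTR_MASK;

	return ((uint32_t)kind << ORIG_PTR_BITS) | off;
}

/* Recover the type code from a cell. */
static enum orig_kind
orig_decode_kind(uint32_t cell)
{
	return (enum orig_kind)(cell >> ORIG_PTR_BITS);
}

/* Recover the pointer from a cell, given the same base used to encode it. */
static uintptr_t
orig_decode_ptr(uint32_t cell, uintptr_t base)
{
	return base + (cell & ORIG_PTR_MASK);
}
```

Because both fields share the cell, no side table is needed to associate origin information with memory, which is the property the commit message highlights.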
 1.146 02-Dec-2018  maxv Introduce KLEAK, a new feature that can detect kernel information leaks.

It works by tainting memory sources with marker values, letting the data
travel through the kernel, and scanning the kernel<->user frontier for
these marker values. Combined with compiler instrumentation and rotation
of the markers, it is able to yield relevant results with little effort.

We taint the pools and the stack, and scan copyout/copyoutstr. KLEAK is
supported on amd64 only for now, but it is not complicated to add more
architectures (just a matter of having the address of .text, and a stack
unwinder).

A userland tool is provided that allows executing a command in rounds
and monitor the leaks generated all the while.

KLEAK has already directly detected 12 kernel info leaks, and prompted changes
that in total fixed 25+ leaks.

Based on an idea developed jointly with Thomas Barabosch (of Fraunhofer
FKIE).
 1.145 04-Nov-2018  mlelstv PMAP_MAP_POOLPAGE must not fail. Trigger assertion here instead of
panic later from failing PR_WAITOK memory allocations.
 1.144 28-Oct-2017  pgoyette branches: 1.144.2; 1.144.4;
Update the kernhist(9) kernel history code to address issues identified
in PR kern/52639, as well as some general cleaning-up...

(As proposed on tech-kern@ with additional changes and enhancements.)

Details of changes:

* All history arguments are now stored as uintmax_t values[1], both in
the kernel and in the structures used for exporting the history data
to userland via sysctl(9). This avoids problems on some architectures
where passing a 64-bit (or larger) value to printf(3) can cause it to
process the value as multiple arguments. (This can be particularly
problematic when printf()'s format string is not a literal, since in
that case the compiler cannot know how large each argument should be.)

* Update the data structures used for exporting kernel history data to
include a version number as well as the length of history arguments.

* All [2] existing users of kernhist(9) have had their format strings
updated. Each format specifier now includes an explicit length
modifier 'j' to refer to numeric values of the size of uintmax_t.

* All [2] existing users of kernhist(9) have had their format strings
updated to replace uses of "%p" with "%#jx", and the pointer
arguments are now cast to (uintptr_t) before being subsequently cast
to (uintmax_t). This is needed to avoid compiler warnings about
casting "pointer to integer of a different size."

* All [2] existing users of kernhist(9) have had instances of "%s" or
"%c" format strings replaced with numeric formats; several instances
of mis-match between format string and argument list have been fixed.

* vmstat(1) has been modified to handle the new size of arguments in the
history data as exported by sysctl(9).

* vmstat(1) now provides a warning message if the history requested with
the -u option does not exist (previously, this condition was silently
ignored, with only a single blank line being printed).

* vmstat(1) now checks the version and argument length included in the
data exported via sysctl(9) and exits if they do not match the values
with which vmstat was built.

* The kernhist(9) man-page has been updated to note the additional
requirements imposed on the format strings, along with several other
minor changes and enhancements.

[1] It would have been possible to use an explicit length (for example,
uint64_t) for the history arguments. But that would require another
"rototill" of all the users in the future when we add support for an
architecture that supports a larger size. Also, the printf(3) format
specifiers for explicitly-sized values, such as "%"PRIu64, are much
more verbose (and less aesthetically appealing, IMHO) than simply
using "%ju".

[2] I've tried very hard to find "all [the] existing users of kernhist(9)"
but it is possible that I've missed some of them. I would be glad to
update any stragglers that anyone identifies.
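
The pointer handling described in revision 1.144 (cast to uintptr_t, then to uintmax_t, and print with "%#jx") can be shown with plain snprintf(). This is a userland sketch; format_ptr() is a hypothetical helper and does not use the kernhist(9) macros.

```c
#include <inttypes.h>
#include <stdio.h>

/* Format a pointer as an integer of uintmax_t width, as the commit describes. */
static int
format_ptr(char *buf, size_t n, const void *p)
{
	/* The uintptr_t step avoids a pointer-to-differently-sized-integer cast. */
	return snprintf(buf, n, "ptr=%#jx", (uintmax_t)(uintptr_t)p);
}
```

Every argument then has the same known size regardless of the platform's pointer width, which is what makes a non-literal format string safe to process.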
 1.143 01-Jun-2017  chs branches: 1.143.2;
remove checks for failure after memory allocation calls that cannot fail:

kmem_alloc() with KM_SLEEP
kmem_zalloc() with KM_SLEEP
percpu_alloc()
pserialize_create()
psref_class_create()

all of these paths include an assertion that the allocation has not failed,
so callers should not assert that again.
 1.142 19-Mar-2017  riastradh __diagused police
 1.141 27-Jul-2016  maxv branches: 1.141.2;
Use UVM_PROT_ALL only if UVM_KMF_EXEC is given as argument. Otherwise, if
UVM_KMF_PAGEABLE is also given as argument, only the VA is allocated and
UVM waits for the page to fault before kentering it. When kentering it, it
will use the UVM_PROT_ flag that was passed to uvm_map; which means that it
will kenter it as RWX.

With this change, the number of RWX pages in the amd64 kernel reaches
strictly zero.
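
The rule introduced in revision 1.141 amounts to choosing the mapping protection from the allocation flags. A minimal sketch follows; the SK_* constants are hypothetical stand-ins, and the real UVM_PROT_* and UVM_KMF_* values may differ.

```c
/* Hypothetical stand-ins for the UVM protection bits and the EXEC flag. */
#define SK_PROT_R	0x1u
#define SK_PROT_W	0x2u
#define SK_PROT_X	0x4u
#define SK_PROT_RW	(SK_PROT_R | SK_PROT_W)
#define SK_PROT_ALL	(SK_PROT_RW | SK_PROT_X)
#define SK_KMF_EXEC	0x10u

/* Request an executable mapping only when the caller explicitly asked for one. */
static unsigned
km_map_prot(unsigned kmf_flags)
{
	return (kmf_flags & SK_KMF_EXEC) ? SK_PROT_ALL : SK_PROT_RW;
}
```

Since the protection recorded at map time is what a later fault uses to enter the page, defaulting to read/write here is what keeps pageable kernel memory from becoming RWX.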
 1.140 20-Jul-2016  maxv Introduce uvm_km_protect.
 1.139 06-Feb-2015  maxv branches: 1.139.2;
Kill kmeminit().
 1.138 29-Jan-2013  para branches: 1.138.12; 1.138.14;
bring file up to date for previous vmem changes.
 1.137 26-Jan-2013  para revert previous commit not yet fully functional, sorry
 1.136 26-Jan-2013  para make vmem(9) ready to be used early during bootstrap to replace extent(9).
pass memory for vmem structs into the initialization functions and
do away with the static pools for this.
factor out the vmem internal structures into a private header.
remove special bootstrapping of the kmem_va_arena as all necessary memory
comes from pool_allocator_meta which is fully operational at this point.
 1.135 07-Sep-2012  para branches: 1.135.2;
call pmap_growkernel once after the kmem_arena is created
to make the pmap cover its address space
assert on the growth in uvm_km_kmem_alloc

for the 3rd uvm_map_entry uvm_map_prepare will grow the kernel,
but we might call into uvm_km_kmem_alloc through imports to
the kmem_meta_arena earlier

while here guard uvm_km_va_starved_p from kmem_arena not yet created

thanks for tracking this down to everyone involved
 1.134 04-Sep-2012  matt Remove locking since it isn't needed. As soon as the 2nd uvm_map_entry in kernel_map
is created, uvm_map_prepare will call pmap_growkernel and the pmap_growkernel call in
uvm_km_mem_alloc will never be called again.
 1.133 03-Sep-2012  matt Switch to a spin lock (uvm_kentry_lock) which, fortunately, was sitting there
unused.
 1.132 03-Sep-2012  matt Cleanup comment. Change panic to KASSERTMSG.
Use kernel_map->misc_lock to make sure we don't call pmap_growkernel
concurrently and possibly mess up uvm_maxkaddr.
 1.131 03-Sep-2012  matt Shut up gcc printf warning.
 1.130 03-Sep-2012  matt Don't try to grow the entire kmem space but just do as needed in uvm_km_kmem_alloc
 1.129 03-Sep-2012  matt Fix a bug where the kernel was never grown to accommodate the kmem VA space
since that happens before the kernel_map is set.
 1.128 09-Jul-2012  matt Convert a KASSERT to a KASSERTMSG. Expand one KASSERTMSG a little bit.
 1.127 03-Jun-2012  rmind Improve the wording slightly.
 1.126 02-Jun-2012  para add some description about the vmem arenas, how they stack up and their purpose
 1.125 13-Apr-2012  yamt uvm_km_kmem_alloc: don't hardcode kmem_va_arena
 1.124 12-Mar-2012  bouyer uvm_km_pgremove_intrsafe(): properly compute the size to pmap_kremove()
(do not truncate it to the first __PGRM_BATCH pages per batch): if we were
given a sparse mapping, we could leave mappings in place.
Note that this doesn't seem to be a problem right now: I added a KASSERT
in my private tree to see if uvm_km_pgremove_intrsafe() would use a
too short size, and it didn't fire.
 1.123 25-Feb-2012  rmind uvm_km_kmem_alloc: return ENOMEM on failure in PMAP_MAP_POOLPAGE case.
 1.122 20-Feb-2012  bouyer When using uvm_km_pgremove_intrsafe() make sure mappings are removed
before returning the pages to the free pool. Otherwise, under Xen,
a page which still has a writable mapping could be allocated for
a PDP by another CPU and the hypervisor would refuse it (this is
PR port-xen/45975).
For this, move the pmap_kremove() calls inside uvm_km_pgremove_intrsafe(),
and do pmap_kremove()/uvm_pagefree() in batch of (at most) 16 entries
(as suggested by Chuck Silvers on tech-kern@, see also
http://mail-index.netbsd.org/tech-kern/2012/02/17/msg012727.html and
followups).
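
The ordering that revision 1.122 establishes (remove the mappings for a batch, then free that batch's pages, at most 16 at a time) can be modelled in userland. This is a sketch only: struct page_state and remove_in_batches() are hypothetical, and the two inner loops stand in for pmap_kremove() and uvm_pagefree().

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define BATCH	16	/* "at most 16 entries", per the commit message */

struct page_state {
	bool mapped;	/* still has a kernel mapping */
	bool freed;	/* returned to the free pool */
};

/* Process pages in batches: unmap the whole batch, then free it. Returns the batch count. */
static size_t
remove_in_batches(struct page_state *pg, size_t npages)
{
	size_t nbatches = 0;

	for (size_t start = 0; start < npages; start += BATCH) {
		size_t n = npages - start < BATCH ? npages - start : BATCH;

		/* Stand-in for one unmap call covering the batch. */
		for (size_t i = 0; i < n; i++)
			pg[start + i].mapped = false;

		/* Only now may the pages go back to the free pool. */
		for (size_t i = 0; i < n; i++) {
			assert(!pg[start + i].mapped);
			pg[start + i].freed = true;
		}
		nbatches++;
	}
	return nbatches;
}
```

The inner assertion captures the invariant behind the Xen problem: no page reaches the free pool while a mapping to it still exists.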
 1.121 19-Feb-2012  rmind Remove VM_MAP_INTRSAFE and related code. Not used since the "kmem changes".
 1.120 10-Feb-2012  para branches: 1.120.2;
proper sizing of kmem_arena on different ports

PR port-i386/45946: Kernel locks up in VMEM system
 1.119 04-Feb-2012  para improve sizing of kmem_arena now that more allocations are made from it
don't enforce limits if not required

ok: riz@
 1.118 03-Feb-2012  matt Always allocate the kmem region. Add UVMHIST support. Approved by releng.
 1.117 02-Feb-2012  para - bringing kmeminit_nkmempages back and revert pmaps that called this early
- use nkmempages to scale the kmem_arena
- reducing diff to pre kmem/vmem change
(NKMEMPAGES_MAX_DEFAULT will need adjusting on some archs)
 1.116 01-Feb-2012  para allocate uareas and buffers from kernel_map again
add code to drain pools if kmem_arena runs out of space
 1.115 01-Feb-2012  matt Use right UVM_xxx_COLORMATCH flag (even though both use the same value).
 1.114 31-Jan-2012  matt Deal with case when kmembase == kmemstart.
Use KASSERTMSG for a few KASSERTs
Make sure to match the color of the VA when we are allocating a physical page.
 1.113 29-Jan-2012  para size kmem_arena more sanely for small memory machines
 1.112 27-Jan-2012  para extending vmem(9) to be able to allocate resources for its own needs.
simplifying uvm_map handling (no special kernel entries anymore no relocking)
make malloc(9) a thin wrapper around kmem(9)
(with private interface for interrupt safety reasons)

releng@ acknowledged
 1.111 01-Sep-2011  matt branches: 1.111.2; 1.111.6;
Forward some UVM from matt-nb5-mips64. Add UVM_KMF_COLORMATCH flag.
When uvm_map gets passed UVM_FLAG_COLORMATCH, the align argument contains
the color of the starting address to be allocated (0..colormask).
When uvm_km_alloc is passed UVM_KMF_COLORMATCH (which can only be used with
UVM_KMF_VAONLY), the align argument contains the color of the starting address
to be allocated.
Change uvm_pagermapin to use this. When mapping user pages in the kernel,
if colormatch is used with the color of the starting user page then the kernel
mapping will be congruent with the existing user mappings.
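
The notion of "color" used in revision 1.111 can be illustrated as follows. This is a sketch under stated assumptions: a 4 KiB page size, a color defined as the page number masked by colormask, and two hypothetical helpers (addr_color() and next_va_with_color()) that are not UVM functions.

```c
#include <stdint.h>

#define SK_PAGE_SHIFT	12u	/* assumed 4 KiB pages */

/* Color of an address: its page number masked by colormask (number of colors - 1). */
static unsigned
addr_color(uintptr_t addr, unsigned colormask)
{
	return (unsigned)(addr >> SK_PAGE_SHIFT) & colormask;
}

/*
 * Lowest page-aligned address at or above hint whose color matches.
 * The caller must pass color <= colormask, or the loop never terminates.
 */
static uintptr_t
next_va_with_color(uintptr_t hint, unsigned color, unsigned colormask)
{
	uintptr_t pagesz = (uintptr_t)1 << SK_PAGE_SHIFT;
	uintptr_t va = (hint + pagesz - 1) & ~(pagesz - 1);

	while (addr_color(va, colormask) != color)
		va += pagesz;
	return va;
}
```

Passing the color of the first user page as the target yields a kernel address congruent with the user mapping, which is the use the commit message describes for uvm_pagermapin.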
 1.110 05-Jul-2011  yamt - fix a use-after-free bug in uvm_km_free.
(after uvm_km_pgremove frees pages, the following pmap_remove touches them.)
- acquire the object lock for operations on pmap_kernel as it can actually be
raced with P->V operations. eg. pagedaemon.
 1.109 12-Jun-2011  rmind Welcome to 5.99.53! Merge rmind-uvmplock branch:

- Reorganize locking in UVM and provide extra serialisation for pmap(9).
New lock order: [vmpage-owner-lock] -> pmap-lock.

- Simplify locking in some pmap(9) modules by removing P->V locking.

- Use lock object on vmobjlock (and thus vnode_t::v_interlock) to share
the locks amongst UVM objects where necessary (tmpfs, layerfs, unionfs).

- Rewrite and optimise x86 TLB shootdown code, make it simpler and cleaner.
Add TLBSTATS option for x86 to collect statistics about TLB shootdowns.

- Unify /dev/mem et al in MI code and provide required locking (removes
kernel-lock on some ports). Also, avoid cache-aliasing issues.

Thanks to Andrew Doran and Joerg Sonnenberger, as their initial patches
formed the core changes of this branch.
 1.108 02-Feb-2011  chuck branches: 1.108.2;
update license clauses on my code to match the new-style BSD licenses.
based on second diff that rmind@ sent me.

no functional change with this commit.
 1.107 04-Jan-2011  matt branches: 1.107.2; 1.107.4;
Add better color matching when selecting free pages. KM pages will now be allocated
so that VA and PA have the same color. On a page fault, choose a physical
page that has the same color as the virtual address.

When allocating kernel memory pages, allow the MD to specify a preferred
VM_FREELIST from which to choose pages. For machines with large amounts
of memory (> 4GB), allow all kernel memory to come from <4GB to reduce the amount
of bounce buffering needed with 32bit DMA devices.
 1.106 14-May-2010  cegger Move PMAP_KMPAGE to be used in pmap_kenter_pa flags argument.
'Looks good to me' gimpy@
 1.105 08-Feb-2010  joerg branches: 1.105.2;
Remove separate mb_map. The nmbclusters is computed at boot time based
on the amount of physical memory and limited by NMBCLUSTERS if present.
Architectures without direct mapping also limit it based on the kmem_map
size, which is used as backing store. On i386 and ARM, the maximum KVA
used for mbuf clusters is limited to 64MB by default.

The old default limits and limits based on GATEWAY have been removed.
key_registered_sb_max is hard-wired to a value derived from 2048
clusters.
 1.104 07-Nov-2009  cegger branches: 1.104.2;
Add a flags argument to pmap_kenter_pa(9).
Patch shown on tech-kern@ http://mail-index.netbsd.org/tech-kern/2009/11/04/msg006434.html
No objections.
 1.103 13-Dec-2008  ad It's easier for kernel reserve pages to be consumed because the pagedaemon
serves as less of a barrier these days. Restrict provision of kernel reserve
pages to kmem and one of these cases:

- doing a NOWAIT allocation
- caller is a realtime thread
- caller is a kernel thread
- explicitly requested, for example by the pmap
 1.102 01-Dec-2008  ad PR port-amd64/32816 amd64 can not load lkms

Change some assertions to partially allow for VM_MAP_IS_KERNEL(map) where
map is outside the range of kernel_map.
 1.101 04-Aug-2008  pooka branches: 1.101.2; 1.101.4;
the most karmic commit of all: fix tyop in comment
 1.100 16-Jul-2008  matt Add PMAP_KMPAGE flag for pmap_kenter_pa. This allows pmaps to know that
the page being entered is for the kernel memory allocator. Such pages
should have no references and don't need bookkeeping.
 1.99 24-Mar-2008  yamt branches: 1.99.4; 1.99.6; 1.99.8; 1.99.10;
remove a redundant pmap_update and add a comment instead.
 1.98 23-Feb-2008  chris Add some more missing pmap_update()s following pmap_kremove()s.
 1.97 02-Jan-2008  ad branches: 1.97.2; 1.97.6;
Merge vmlocking2 to head.
 1.96 21-Jul-2007  ad branches: 1.96.6; 1.96.12; 1.96.14; 1.96.16; 1.96.18; 1.96.22;
Fix DEBUG build.
 1.95 21-Jul-2007  ad Merge unobtrusive locking changes from the vmlocking branch.
 1.94 12-Mar-2007  ad branches: 1.94.8;
Pass an ipl argument to pool_init/POOL_INIT to be used when initializing
the pool's lock.
 1.93 21-Feb-2007  thorpej branches: 1.93.4;
Replace the Mach-derived boolean_t type with the C99 bool type. A
future commit will replace use of TRUE and FALSE with true and false.
 1.92 01-Nov-2006  yamt branches: 1.92.4;
remove some __unused from function parameters.
 1.91 12-Oct-2006  uwe More __unused (in cpp conditionals not touched by i386).
 1.90 12-Oct-2006  christos - sprinkle __unused on function decls.
- fix a couple of unused bugs
- no more -Wno-unused for i386
 1.89 05-Jul-2006  drochner branches: 1.89.4; 1.89.6;
Introduce a UVM_KMF_EXEC flag for uvm_km_alloc() which enforces an
executable mapping. Up to now, only R+W was requested from pmap_kenter_pa.
On most CPUs, we get an executable mapping anyway, due to lack of
hardware support or due to laziness in the pmap implementation. Only
alpha does obey VM_PROT_EXECUTE, afaics.
 1.88 25-May-2006  yamt branches: 1.88.2;
move wait points for kva from upper layers to vm_map. PR/33185 #1.

XXX there is a concern about interaction with kva fragmentation.
see: http://mail-index.NetBSD.org/tech-kern/2006/05/11/0000.html
 1.87 03-May-2006  yamt branches: 1.87.2;
uvm_km_suballoc: consider kva overhead of "kmapent".
fixes PR/31275 (me) and PR/32287 (Christian Biere).
 1.86 05-Apr-2006  yamt uvm_km_pgremove/uvm_km_pgremove_intrsafe: fix assertions.
 1.85 17-Mar-2006  yamt uvm_km_check_empty: fix an assertion.
 1.84 11-Dec-2005  christos branches: 1.84.4; 1.84.6; 1.84.8; 1.84.10; 1.84.12;
merge ktrace-lwp.
 1.83 27-Jun-2005  thorpej branches: 1.83.2;
Use ANSI function decls.
 1.82 29-May-2005  christos avoid shadow variables.
remove unneeded casts.
 1.81 20-Apr-2005  simonb Use a cast to (long long) and 0x%llx to print out a paddr_t instead
of casting to (void *). Fixes compile problems with 64-bit paddr_t
on 32-bit platforms.
 1.80 12-Apr-2005  yamt fix unreasonably frequent "killed: out of swap" on systems which have
little or no swap.
- even on a severe swap shortage, if we have some amount of file-backed pages,
don't bother to kill processes.
- if all pages in queue will be likely reactivated, just give up
page type balancing rather than spinning unnecessarily.
 1.79 01-Apr-2005  yamt unwrap short lines.
 1.78 01-Apr-2005  yamt merge yamt-km branch.
- don't use managed mappings/backing objects for wired memory allocations.
save some resources like pv_entry. also fix (most of) PR/27030.
- simplify kernel memory management API.
- simplify pmap bootstrap of some ports.
- some related cleanups.
 1.77 26-Feb-2005  perry branches: 1.77.2;
nuke trailing whitespace
 1.76 13-Jan-2005  yamt branches: 1.76.2; 1.76.4;
in uvm_unmap_remove, always wakeup va waiters if any.
uvm_km_free_wakeup is now a synonym of uvm_km_free.
 1.75 12-Jan-2005  yamt don't reserve (uvm_mapent_reserve) entries for malloc/pool backends
because it isn't necessary or safe.
reported and tested by Denis Lagno. PR/28897.
 1.74 05-Jan-2005  yamt km_vacache_alloc: UVM_PROT_ALL rather than UVM_PROT_NONE
so that uvm_kernacc works. PR/28861. (FUKAUMI Naoki)
 1.73 03-Jan-2005  yamt km_vacache_alloc: specify va hint correctly rather than
using stack garbage. PR/28845.
 1.72 01-Jan-2005  yamt in the case of !PMAP_MAP_POOLPAGE, gather pool backend allocations to
large chunks for kernel_map and kmem_map to ease kva fragmentation.
 1.71 01-Jan-2005  yamt introduce vm_map_kernel, a subclass of vm_map, and
move some kernel-only members of vm_map to it.
 1.70 01-Jan-2005  yamt for in-kernel maps,
- allocate kva for vm_map_entry from the map itself and
remove the static limit, MAX_KMAPENT.
- keep merged entries for later splitting to fix allocate-to-free problem.
PR/24039.
 1.69 24-Mar-2004  junyoung Drop trailing spaces.
 1.68 10-Feb-2004  matt Back out the changes in
http://mail-index.netbsd.org/source-changes/2004/01/29/0027.html
since they don't really fix the problem.

Incorporate one fix: Mark uvm_map_entry's that were created with
UVM_FLAG_NOMERGE so that they will not be used as future merge
candidates.
 1.67 29-Jan-2004  yamt - split uvm_map() into two functions for the following.
- for in-kernel maps, disable map entry merging so that
unmap operations won't block. (workaround for PR/24039)
- for in-kernel maps, allocate kva for vm_map_entry from
the map itself and eliminate MAX_KMAPENT and
uvm_map_entry_kmem_pool.
 1.66 18-Dec-2003  pk * Introduce uvm_km_kmemalloc1() which allows alignment and preferred offset
to be passed to uvm_map().

* Turn all uvm_km_valloc*() macros back into (inlined) functions to retain
binary compatibility with any 3rd party modules.
 1.65 18-Dec-2003  pk Condense all existing variants of uvm_km_valloc into a single function:
uvm_km_valloc1(), and use it to express all of
uvm_km_valloc()
uvm_km_valloc_wait()
uvm_km_valloc_prefer()
uvm_km_valloc_prefer_wait()
uvm_km_valloc_align()
in terms of it by macro expansion.
 1.64 28-Aug-2003  pk When retiring a swap device with marked bad blocks on it we should update
the `# swap page in use' and `# swap page only' counters. However, at the
time of swap device removal we can no longer figure out how many of the
bad swap pages are actually also `swap only' pages.

So, on swap I/O errors arrange things to not include the bad swap pages in
the `swpgonly' counter as follows: uvm_swap_markbad() decrements `swpgonly'
by the number of bad pages, and the various VM object deallocation routines
do not decrement `swpgonly' for swap slots marked as SWSLOT_BAD.
 1.63 11-Aug-2003  pk Introduce uvm_swapisfull(), which computes the available swap space by
taking into account swap devices that are in the process of being removed.
 1.62 10-May-2003  thorpej branches: 1.62.2;
Back out the following change:
http://mail-index.netbsd.org/source-changes/2003/05/08/0068.html

There were some side-effects that I didn't anticipate, and fixing them
is proving to be more difficult than I thought, so just eject for now.
Maybe one day we can look at this again.

Fixes PR kern/21517.
 1.61 08-May-2003  thorpej Simplify the way the bounds of the managed kernel virtual address
space is advertised to UVM by making virtual_avail and virtual_end
first-class exported variables by UVM. Machine-dependent code is
responsible for initializing them before main() is called. Anything
that steals KVA must adjust these variables accordingly.

This reduces the number of instances of this info from 3 to 1, and
simplifies the pmap(9) interface by removing the pmap_virtual_space()
function call, and removing two arguments from pmap_steal_memory().

This also eliminates some kludges such as having to burn kernel_map
entries on space used by the kernel and stolen KVA.

This also eliminates use of VM_{MIN,MAX}_KERNEL_ADDRESS from MI code,
thus giving MD code greater flexibility over the bounds of the managed
kernel virtual address space if a given port's specific platforms can
vary in this regard (this is especially true of the evb* ports).
 1.60 30-Nov-2002  bouyer Change uvm_km_kmemalloc() to accept flag UVM_KMF_NOWAIT and pass it to
uvm_map(). Change uvm_map() to honor UVM_KMF_NOWAIT. For this, change
amap_extend() to take a flags parameter instead of just boolean for
direction, and introduce AMAP_EXTEND_FORWARDS and AMAP_EXTEND_NOWAIT flags
(AMAP_EXTEND_BACKWARDS is still defined as 0x0, to keep the code easier to
read).
Add a flag parameter to uvm_mapent_alloc().
This solves a problem where a pool_get(PR_NOWAIT) could trigger a pool_get(PR_WAITOK)
in uvm_mapent_alloc().
Thanks to Chuck Silvers, enami tsugutomo, Andrew Brown and Jason R Thorpe
for feedback.
 1.59 05-Oct-2002  oster Garbage collect some leftover (and unneeded) code. OK'ed by chs.
 1.58 15-Sep-2002  chs add a new km flag UVM_KMF_CANFAIL, which causes uvm_km_kmemalloc() to
return failure if swap is full and there are no free physical pages.
have malloc() use this flag if M_CANFAIL is passed to it.
use M_CANFAIL to allow amap_extend() to fail when memory is scarce.
this should prevent most of the remaining hangs in low-memory situations.
 1.57 14-Aug-2002  thorpej Don't pass VM_PROT_EXEC to pmap_kenter_pa().
 1.56 07-Mar-2002  thorpej branches: 1.56.2; 1.56.6; 1.56.8;
If the bootstrapping process didn't actually use any KVA space, don't
reserve size of 0 in kernel_map.

From OpenBSD.
 1.55 10-Nov-2001  lukem add RCSIDs, and in some cases, slightly cleanup #include order
 1.54 07-Nov-2001  chs only acquire the lock for swpgonly if we actually need to adjust it.
 1.53 06-Nov-2001  chs several changes prompted by loaning problems:
- fix the loaned case in uvm_pagefree().
- redo uvmexp.swpgonly accounting to work with page loaning.
add an assertion before each place we adjust uvmexp.swpgonly.
- fix uvm_km_pgremove() to always free any swap space associated with
the range being removed.
- get rid of UVM_LOAN_WIRED flag. instead, we just make sure that
pages loaned to the kernel are never on the page queues.
this allows us to assert that pages are not loaned and wired
at the same time.
- add yet more assertions.
 1.52 15-Sep-2001  chs branches: 1.52.2;
a whole bunch of changes to improve performance and robustness under load:

- remove special treatment of pager_map mappings in pmaps. this is
required now, since I've removed the globals that expose the address range.
pager_map now uses pmap_kenter_pa() instead of pmap_enter(), so there's
no longer any need to special-case it.
- eliminate struct uvm_vnode by moving its fields into struct vnode.
- rewrite the pageout path. the pager is now responsible for handling the
high-level requests instead of only getting control after a bunch of work
has already been done on its behalf. this will allow us to UBCify LFS,
which needs tighter control over its pages than other filesystems do.
writing a page to disk no longer requires making it read-only, which
allows us to write wired pages without causing all kinds of havoc.
- use a new PG_PAGEOUT flag to indicate that a page should be freed
on behalf of the pagedaemon when it's unlocked. this flag is very similar
to PG_RELEASED, but unlike PG_RELEASED, PG_PAGEOUT can be cleared if the
pageout fails due to eg. an indirect-block buffer being locked.
this allows us to remove the "version" field from struct vm_page,
and together with shrinking "loan_count" from 32 bits to 16,
struct vm_page is now 4 bytes smaller.
- no longer use PG_RELEASED for swap-backed pages. if the page is busy
because it's being paged out, we can't release the swap slot to be
reallocated until that write is complete, but unlike with vnodes we
don't keep a count of in-progress writes so there's no good way to
know when the write is done. instead, when we need to free a busy
swap-backed page, just sleep until we can get it busy ourselves.
- implement a fast-path for extending writes which allows us to avoid
zeroing new pages. this substantially reduces cpu usage.
- encapsulate the data used by the genfs code in a struct genfs_node,
which must be the first element of the filesystem-specific vnode data
for filesystems which use genfs_{get,put}pages().
- eliminate many of the UVM pagerops, since they aren't needed anymore
now that the pager "put" operation is a higher-level operation.
- enhance the genfs code to allow NFS to use the genfs_{get,put}pages
instead of a modified copy.
- clean up struct vnode by removing all the fields that used to be used by
the vfs_cluster.c code (which we don't use anymore with UBC).
- remove kmem_object and mb_object since they were useless.
instead of allocating pages to these objects, we now just allocate
pages with no object. such pages are mapped in the kernel until they
are freed, so we can use the mapping to find the page to free it.
this allows us to remove splvm() protection in several places.

The sum of all these changes improves write throughput on my
decstation 5000/200 to within 1% of the rate of NetBSD 1.5
and reduces the elapsed time for "make release" of a NetBSD 1.5
source tree on my 128MB pc to 10% less than a 1.5 kernel took.
 1.51 10-Sep-2001  chris Update pmap_update to now take the updated pmap as an argument.
This will allow improvements to the pmaps so that they can more easily defer expensive operations, e.g. TLB/cache flushes, until the last possible moment.

Currently this is a no-op on most platforms, so they should see no difference.

Reviewed by Jason.
 1.50 26-Jun-2001  thorpej branches: 1.50.2; 1.50.4;
Reduce some complexity in the fault path -- Rather than maintaining
an spl-protected "interrupt safe map" list, simply require that callers
of uvm_fault() never call us in interrupt context (MD code must make
the assertion), and check for interrupt-safe maps in uvmfault_lookup()
before we lock the map.
 1.49 02-Jun-2001  chs replace vm_map{,_entry}_t with struct vm_map{,_entry} *.
 1.48 26-May-2001  chs replace {simple_,}lock{_data,}_t with struct {simple,}lock {,*}.
 1.47 25-May-2001  chs remove trailing whitespace.
 1.46 24-Apr-2001  thorpej Sprinkle pmap_update() calls after calls to:
- pmap_enter()
- pmap_remove()
- pmap_protect()
- pmap_kenter_pa()
- pmap_kremove()
as described in pmap(9).

These calls are relatively conservative. It may be possible to
optimize these a little more.
 1.45 12-Apr-2001  thorpej Add a __predict_true() to an extremely common case.
 1.44 12-Apr-2001  thorpej In uvm_km_kmemalloc(), use the correct size for the uvm_unmap()
call if the allocation fails.

Problem pointed out by Alfred Perlstein <bright@wintelcom.net>,
who found a similar bug in FreeBSD.
 1.43 15-Mar-2001  chs eliminate the KERN_* error codes in favor of the traditional E* codes.
the mapping is:

KERN_SUCCESS 0
KERN_INVALID_ADDRESS EFAULT
KERN_PROTECTION_FAILURE EACCES
KERN_NO_SPACE ENOMEM
KERN_INVALID_ARGUMENT EINVAL
KERN_FAILURE various, mostly turn into KASSERTs
KERN_RESOURCE_SHORTAGE ENOMEM
KERN_NOT_RECEIVER <unused>
KERN_NO_ACCESS <unused>
KERN_PAGES_LOCKED <unused>
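
The mapping above can be sketched as a small translation function. This is only an illustration: the enum values below are made up for the sketch, and kern_to_errno() is a hypothetical helper, not something this commit adds (the commit replaces the old codes in the source rather than translating them at run time). KERN_FAILURE is left out because it has no single replacement.

```c
#include <errno.h>

/* Hypothetical stand-ins for the old codes; the numeric values are illustrative. */
enum kern_code {
	KERN_SUCCESS = 0,
	KERN_INVALID_ADDRESS,
	KERN_PROTECTION_FAILURE,
	KERN_NO_SPACE,
	KERN_INVALID_ARGUMENT,
	KERN_RESOURCE_SHORTAGE
};

/* Hypothetical helper: return the E* code listed for each old KERN_* code. */
int
kern_to_errno(enum kern_code code)
{
	switch (code) {
	case KERN_SUCCESS:
		return 0;
	case KERN_INVALID_ADDRESS:
		return EFAULT;
	case KERN_PROTECTION_FAILURE:
		return EACCES;
	case KERN_NO_SPACE:
	case KERN_RESOURCE_SHORTAGE:
		return ENOMEM;
	case KERN_INVALID_ARGUMENT:
		return EINVAL;
	}
	/* Not in the table; EINVAL is an arbitrary choice for this sketch. */
	return EINVAL;
}
```

Note that two of the old codes collapse into ENOMEM, so the translation is not reversible.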
 1.42 14-Jan-2001  thorpej branches: 1.42.2;
splimp() -> splvm()
 1.41 27-Nov-2000  nisimura Introduce uvm_km_valloc_align() and use it to grab a process's USPACE
aligned on a USPACE boundary in kernel virtual address space. This is beneficial
for the MIPS R4000's paired TLB entry design.
 1.40 24-Nov-2000  chs cleanup: use queue.h macros and KASSERT().
 1.39 13-Sep-2000  thorpej Add an align argument to uvm_map() and some callers of that
routine. Works similarly to pmap_prefer(), but allows callers
to specify a minimum power-of-two alignment of the region.
How we ever got along without this for so long is beyond me.
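
As a rough illustration of what a minimum power-of-two alignment means for an address, the sketch below rounds a candidate address up to the requested alignment. align_up() is a hypothetical helper written for this note, not part of the uvm_map() interface, and treating an alignment of 0 as "no constraint" is an assumption of the sketch.

```c
#include <assert.h>
#include <stdint.h>

/*
 * Hypothetical helper: round addr up to the next multiple of align.
 * align must be 0 (no constraint, by assumption) or a power of two.
 */
uintptr_t
align_up(uintptr_t addr, uintptr_t align)
{
	if (align == 0)
		return addr;
	/* A power of two has exactly one bit set. */
	assert((align & (align - 1)) == 0);
	return (addr + align - 1) & ~(align - 1);
}
```

The mask trick only works for powers of two, which is why the argument is restricted that way.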
 1.38 24-Jul-2000  jeffs Add uvm_km_valloc_prefer_wait(). Used to valloc with the passed in
voff_t being passed to PMAP_PREFER(), which results in the proper
virtual alignment of the allocated space.
 1.37 27-Jun-2000  mrg remove include of <vm/vm.h>
 1.36 26-Jun-2000  mrg remove/move more mach vm header files:

<vm/pglist.h> -> <uvm/uvm_pglist.h>
<vm/vm_inherit.h> -> <uvm/uvm_inherit.h>
<vm/vm_kern.h> -> into <uvm/uvm_extern.h>
<vm/vm_object.h> -> nothing
<vm/vm_pager.h> -> into <uvm/uvm_pager.h>

also includes a bunch of <vm/vm_page.h> include removals (due to redundancy
with <vm/vm.h>), and a scattering of other similar headers.
 1.35 08-May-2000  thorpej branches: 1.35.4;
__predict_false() out-of-resource conditions and DIAGNOSTIC error checks.
 1.34 11-Jan-2000  chs add support for ``swapctl -d'' (removing swap space).
improve handling of i/o errors in swap space.

reviewed by: Chuck Cranor
 1.33 13-Nov-1999  thorpej Change the pmap_enter() API slightly; pmap_enter() now returns an error
value (KERN_SUCCESS or KERN_RESOURCE_SHORTAGE) indicating if it succeeded
or failed. Change the `wired' and `access_type' arguments to a single
`flags' argument, which includes the access type, and flags:

PMAP_WIRED the old `wired' boolean
PMAP_CANFAIL pmap_enter() is allowed to fail

If PMAP_CANFAIL is not specified, the pmap should behave as it always
has in the face of a drastic resource shortage: fall over dead.

Change the fault handler to deal with failure (which indicates resource
shortage) by unlocking everything, waiting for the pagedaemon to free
more memory, then retrying the fault.
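
The retry behaviour described above can be sketched in user space roughly as follows. Everything here is a simulation under stated assumptions: the ex_* names, the flag and return values, and the single counter standing in for pmap resources are all hypothetical, and the real fault handler also has to drop and retake its locks around the wait.

```c
#include <stdlib.h>

/* Hypothetical stand-ins; the real constants live in the kernel headers. */
#define EX_SUCCESS		0
#define EX_RESOURCE_SHORTAGE	1
#define EX_PMAP_CANFAIL		0x1

/* Simulated pool of resources the pmap needs to enter a mapping. */
int ex_free_resources = 0;

/* Simulated pmap_enter(): consume one resource, or fail if allowed to. */
int
ex_pmap_enter(int flags)
{
	if (ex_free_resources > 0) {
		ex_free_resources--;
		return EX_SUCCESS;
	}
	if (flags & EX_PMAP_CANFAIL)
		return EX_RESOURCE_SHORTAGE;
	abort();	/* without CANFAIL: fall over dead */
}

/* Simulated wait for the pagedaemon: pretend it freed one resource. */
void
ex_wait_for_pagedaemon(void)
{
	ex_free_resources++;
}

/* Simulated fault path: retry until the mapping is entered. */
int
ex_fault(void)
{
	int retries = 0;

	while (ex_pmap_enter(EX_PMAP_CANFAIL) != EX_SUCCESS) {
		/* The real handler unlocks everything before sleeping here. */
		ex_wait_for_pagedaemon();
		retries++;
	}
	return retries;
}
```

ex_fault() returns the number of retries it needed, which is 0 when resources were available on the first attempt.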
 1.32 12-Sep-1999  chs branches: 1.32.2; 1.32.4; 1.32.8;
eliminate the PMAP_NEW option by making it required for all ports.
ports which previously had no support for PMAP_NEW now implement
the pmap_k* interfaces as wrappers around the non-k versions.
 1.31 22-Jul-1999  thorpej Garbage collect thread_sleep()/thread_wakeup() left over from the old
Mach VM code. Also nuke iprintf(), which was no longer used anywhere.

Add proclist locking where appropriate.
 1.30 22-Jul-1999  thorpej 0 -> FALSE in a few places.
 1.29 18-Jul-1999  chs allow uvm_km_alloc_poolpage1() to use kernel-reserve pages.
 1.28 17-Jul-1999  thorpej Garbage-collect uvm_km_get(); nothing actually uses it.
 1.27 04-Jun-1999  thorpej Keep interrupt-safe maps on an additional queue. In uvm_fault(), if we're
looking up a kernel address, check to see if the address is on this
"interrupt-safe" list. If so, return failure immediately. This prevents
a locking screw if a page fault is taken on an interrupt-safe map in or
out of interrupt context.
 1.26 26-May-1999  thorpej Wired kernel mappings are wired; pass VM_PROT_READ|VM_PROT_WRITE for
access_type to pmap_enter() to ensure that when these mappings are accessed,
possibly in interrupt context, that they won't cause mod/ref emulation
page faults.
 1.25 26-May-1999  thorpej Change the vm_map's "entries_pageable" member to a r/o flags member, which
has PAGEABLE and INTRSAFE flags. PAGEABLE now really means "pageable",
not "allocate vm_map_entry's from non-static pool", so update all map
creations to reflect that. INTRSAFE maps are maps that are used in
interrupt context (e.g. kmem_map, mb_map), and thus use the static
map entry pool (XXX as does kernel_map, for now). This will eventually
change how these maps are locked, as well.
 1.24 25-May-1999  thorpej Define a new kernel object type, "intrsafe", which are used for objects
which can be used in an interrupt context. Use pmap_kenter*() and
pmap_kremove() only for mappings owned by these objects.

Fixes some locking protocol issues related to MP support, and eliminates
all of the pmap_enter vs. pmap_kremove inconsistencies.
 1.23 11-Apr-1999  chs add a `flags' argument to uvm_pagealloc_strat().
define a flag UVM_PGA_USERESERVE to allow non-kernel object
allocations to use pages from the reserve.
use the new flag for allocations in pmap modules.
 1.22 26-Mar-1999  mycroft branches: 1.22.2;
Add a new `access type' argument to pmap_enter(). This indicates what type of
memory access a mapping was caused by. This is passed through from uvm_fault()
and udv_fault(), and in most other cases is 0.
The pmap module may use this to preset R/M information. On MMUs which require
R/M emulation, the implementation may preset the bits and avoid taking another
fault. On MMUs which keep R/M information in hardware, the implementation may
preset its cached bits to speed up the next call to pmap_is_modified() or
pmap_is_referenced().
 1.21 26-Mar-1999  chs add uvmexp.swpgonly and use it to detect out-of-swap conditions.
 1.20 25-Mar-1999  mrg remove now >1 year old pre-release message.
 1.19 24-Mar-1999  cgd after discussion with chuck, nuke pgo_attach from uvm_pagerops
 1.18 18-Oct-1998  chs branches: 1.18.2;
shift by PAGE_SHIFT instead of multiplying or dividing by PAGE_SIZE.
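
A minimal sketch of the equivalence this relies on, assuming a power-of-two page size. The EX_ names and the 4 KiB page size are assumptions of the sketch, chosen to avoid clashing with system headers; the kernel uses its own PAGE_SHIFT and PAGE_SIZE.

```c
#include <stddef.h>

#define EX_PAGE_SHIFT	12	/* assumption: 4 KiB pages */
#define EX_PAGE_SIZE	((size_t)1 << EX_PAGE_SHIFT)

/* Same result as npages * EX_PAGE_SIZE. */
size_t
ex_pages_to_bytes(size_t npages)
{
	return npages << EX_PAGE_SHIFT;
}

/* Same result as nbytes / EX_PAGE_SIZE (truncating). */
size_t
ex_bytes_to_pages(size_t nbytes)
{
	return nbytes >> EX_PAGE_SHIFT;
}
```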
 1.17 11-Oct-1998  chuck remove unused share map code from UVM:
- update calls to uvm_unmap_remove/uvm_unmap (mainonly boolean arg
has been removed)
- replace UVM_ET_ISMAP checks with UVM_ET_ISSUBMAP checks
 1.16 28-Aug-1998  thorpej Add a couple of comments about how the pool page allocator functions
can be called with a map that doesn't require spl protection.
 1.15 28-Aug-1998  thorpej Add a waitok boolean argument to the VM system's pool page allocator backend.
 1.14 13-Aug-1998  eeh Merge paddr_t changes into the main branch.
 1.13 09-Aug-1998  perry bzero->memset, bcopy->memcpy, bcmp->memcmp
 1.12 01-Aug-1998  thorpej We need to be able to specify a uvm_object to the pool page allocator, too.
 1.11 31-Jul-1998  thorpej Allow an alternate splimp-protected map to be specified in the pool page
allocator routines.
 1.10 24-Jul-1998  thorpej branches: 1.10.2;
Implement uvm_km_{alloc,free}_poolpage(). These functions use pmap hooks to
map/unmap pool pages if provided by the pmap layer.
 1.9 09-Jun-1998  chs correct counting for uvmexp.wired:
only pages explicitly wired by a user process should be counted.
 1.8 09-Mar-1998  mrg KNF.
 1.7 24-Feb-1998  chuck be consistent about offsets in kernel objects. vm_map_min(kernel_map)
should always be the base [fixes problem on m68k detected by jason thorpe]

add comments to uvm_km.c explaining kernel memory management in more detail
 1.6 10-Feb-1998  mrg - add defopt's for UVM, UVMHIST and PMAP_NEW.
- remove unnecessary UVMHIST_DECL's.
 1.5 08-Feb-1998  thorpej Allow callers of uvm_km_suballoc() to specify where the base of the
submap _must_ begin, by adding a "fixed" boolean argument.
 1.4 07-Feb-1998  mrg restore rcsids
 1.3 07-Feb-1998  chs convert kernel_object to an aobj.
in uvm_km_pgremove(), free swapslots if the object is an aobj.
in uvm_km_kmemalloc(), mark pages as wired and count them.
 1.2 06-Feb-1998  thorpej RCS ID police.
 1.1 05-Feb-1998  mrg branches: 1.1.1;
Initial revision
 1.1.1.1 05-Feb-1998  mrg initial import of the new virtual memory system, UVM, into -current.

UVM was written by chuck cranor <chuck@maria.wustl.edu>, with some
minor portions derived from the old Mach code. i provided some help
getting swap and paging working, and other bug fixes/ideas. chuck
silvers <chuq@chuq.com> also provided some other fixes.

this is the UVM kernel code portion.


this will be KNF'd shortly. :-)
 1.10.2.2 08-Aug-1998  eeh Revert cdevsw mmap routines to return int.
 1.10.2.1 30-Jul-1998  eeh Split vm_offset_t and vm_size_t into paddr_t, psize_t, vaddr_t, and vsize_t.
 1.18.2.2 25-Feb-1999  chs thread_wakeup() -> wakeup().
 1.18.2.1 09-Nov-1998  chs initial snapshot. lots left to do.
 1.22.2.1 16-Apr-1999  chs branches: 1.22.2.1.2;
pull up 1.22 -> 1.23:
add a `flags' argument to uvm_pagealloc_strat().
define a flag UVM_PGA_USERESERVE to allow non-kernel object
allocations to use pages from the reserve.
use the new flag for allocations in pmap modules.
 1.22.2.1.2.3 02-Aug-1999  thorpej Update from trunk.
 1.22.2.1.2.2 21-Jun-1999  thorpej Sync w/ -current.
 1.22.2.1.2.1 07-Jun-1999  chs merge everything from chs-ubc branch.
 1.32.8.1 27-Dec-1999  wrstuden Pull up to last week's -current.
 1.32.4.1 15-Nov-1999  fvdl Sync with -current
 1.32.2.5 21-Apr-2001  bouyer Sync with HEAD
 1.32.2.4 27-Mar-2001  bouyer Sync with HEAD.
 1.32.2.3 18-Jan-2001  bouyer Sync with head (for UBC+NFS fixes, mostly).
 1.32.2.2 08-Dec-2000  bouyer Sync with HEAD.
 1.32.2.1 20-Nov-2000  bouyer Update thorpej_scsipi to -current as of a month ago
 1.35.4.1 23-Apr-2001  he Pull up revision 1.44 (requested by thorpej):
Use correct size for uvm_unmap() in error case of uvm_km_kmemalloc().
 1.42.2.10 11-Dec-2002  thorpej Sync with HEAD.
 1.42.2.9 18-Oct-2002  nathanw Catch up to -current.
 1.42.2.8 17-Sep-2002  nathanw Catch up to -current.
 1.42.2.7 27-Aug-2002  nathanw Catch up to -current.
 1.42.2.6 01-Apr-2002  nathanw Catch up to -current.
(CVS: It's not just a program. It's an adventure!)
 1.42.2.5 14-Nov-2001  nathanw Catch up to -current.
 1.42.2.4 21-Sep-2001  nathanw Catch up to -current.
 1.42.2.3 24-Aug-2001  nathanw Catch up with -current.
 1.42.2.2 21-Jun-2001  nathanw Catch up to -current.
 1.42.2.1 09-Apr-2001  nathanw Catch up with -current.
 1.50.4.1 01-Oct-2001  fvdl Catch up with -current.
 1.50.2.5 10-Oct-2002  jdolecek sync kqueue with -current; this includes merge of gehenna-devsw branch,
merge of i386 MP branch, and part of autoconf rototil work
 1.50.2.4 06-Sep-2002  jdolecek sync kqueue branch with HEAD
 1.50.2.3 16-Mar-2002  jdolecek Catch up with -current.
 1.50.2.2 10-Jan-2002  thorpej Sync kqueue branch with -current.
 1.50.2.1 13-Sep-2001  thorpej Update the kqueue branch to HEAD.
 1.52.2.1 12-Nov-2001  thorpej Sync the thorpej-mips-cache branch with -current.
 1.56.8.2 02-Jun-2003  tron Pull up revision 1.58 (requested by skrll):
add a new km flag UVM_KMF_CANFAIL, which causes uvm_km_kmemalloc() to
return failure if swap is full and there are no free physical pages.
have malloc() use this flag if M_CANFAIL is passed to it.
use M_CANFAIL to allow amap_extend() to fail when memory is scarce.
this should prevent most of the remaining hangs in low-memory situations.
 1.56.8.1 18-Nov-2002  he Pull up revision 1.57 (requested by thorpej in ticket #675):
Don't pass VM_PROT_EXEC to pmap_kenter_pa().
 1.56.6.1 29-Aug-2002  gehenna catch up with -current.
 1.56.2.1 11-Mar-2002  thorpej Convert swap_syscall_lock and uvm.swap_data_lock to adaptive mutexes,
and rename them appropriately.
 1.62.2.7 10-Nov-2005  skrll Sync with HEAD. Here we go again...
 1.62.2.6 01-Apr-2005  skrll Sync with HEAD.
 1.62.2.5 04-Mar-2005  skrll Sync with HEAD.

Hi Perry!
 1.62.2.4 17-Jan-2005  skrll Sync with HEAD.
 1.62.2.3 21-Sep-2004  skrll Fix the sync with head I botched.
 1.62.2.2 18-Sep-2004  skrll Sync with HEAD.
 1.62.2.1 03-Aug-2004  skrll Sync with HEAD
 1.76.4.7 19-Mar-2005  yamt sync with head. xen and whitespace. xen part is not finished.
 1.76.4.6 18-Feb-2005  chs move a UVMHIST_LOG to avoid an uninitialized variable.
add more info to a debug message.
 1.76.4.5 18-Feb-2005  yamt whitespace, comments, panic messages. no functional changes.
 1.76.4.4 16-Feb-2005  yamt remove redundant trunc_page/round_page.
 1.76.4.3 31-Jan-2005  yamt uvm_km_free: uvm_km_pgremove_intrsafe and pmap_kremove only when UVM_KMF_WIRED.
 1.76.4.2 25-Jan-2005  yamt - don't use uvm_object or managed mappings for wired allocations.
(eg. malloc(9))
- simplify uvm_km_* apis.
 1.76.4.1 25-Jan-2005  yamt remove some compatibility functions.
 1.76.2.1 29-Apr-2005  kent sync with -current
 1.77.2.1 06-Dec-2005  riz Apply patch (requested by yamt in ticket #1015):
sys/uvm/uvm_glue.c: patch
sys/uvm/uvm_km.c: patch
- correct a return value of uvm_km_valloc1 in the case of failure.
- do waitok allocation for uvm_uarea_alloc so that it won't fail on
temporary memory shortage.
 1.83.2.7 24-Mar-2008  yamt sync with head.
 1.83.2.6 27-Feb-2008  yamt sync with head.
 1.83.2.5 21-Jan-2008  yamt sync with head
 1.83.2.4 03-Sep-2007  yamt sync with head.
 1.83.2.3 26-Feb-2007  yamt sync with head.
 1.83.2.2 30-Dec-2006  yamt sync with head.
 1.83.2.1 21-Jun-2006  yamt sync with head.
 1.84.12.2 24-May-2006  tron Merge 2006-05-24 NetBSD-current into the "peter-altq" branch.
 1.84.12.1 28-Mar-2006  tron Merge 2006-03-28 NetBSD-current into the "peter-altq" branch.
 1.84.10.2 11-May-2006  elad sync with head
 1.84.10.1 19-Apr-2006  elad oops - *really* sync to head this time.
 1.84.8.5 11-Aug-2006  yamt sync with head
 1.84.8.4 26-Jun-2006  yamt sync with head.
 1.84.8.3 24-May-2006  yamt sync with head.
 1.84.8.2 11-Apr-2006  yamt sync with head
 1.84.8.1 01-Apr-2006  yamt sync with head.
 1.84.6.2 01-Jun-2006  kardel Sync with head.
 1.84.6.1 22-Apr-2006  simonb Sync with head.
 1.84.4.1 09-Sep-2006  rpaulo sync with head
 1.87.2.1 19-Jun-2006  chap Sync with head.
 1.88.2.1 13-Jul-2006  gdamore Merge from HEAD.
 1.89.6.2 10-Dec-2006  yamt sync with head.
 1.89.6.1 22-Oct-2006  yamt sync with head
 1.89.4.1 18-Nov-2006  ad Sync with head.
 1.92.4.2 24-Mar-2007  yamt sync with head.
 1.92.4.1 27-Feb-2007  yamt - sync with head.
- move sched_changepri back to kern_synch.c as it doesn't know PPQ anymore.
 1.93.4.5 18-Sep-2007  ad Undo previous. Other threads can allocate with the map locked, which could
cause PR_NOWAIT allocations to wait long term.
 1.93.4.4 18-Sep-2007  ad Don't use UVM_KMF_TRYLOCK for pool allocations when PR_NOWAIT is specified.
 1.93.4.3 21-Mar-2007  ad - Replace more simple_locks, and fix up in a few places.
- Use condition variables.
- LOCK_ASSERT -> KASSERT.
 1.93.4.2 13-Mar-2007  ad Pull in the initial set of changes for the vmlocking branch.
 1.93.4.1 13-Mar-2007  ad Sync with head.
 1.94.8.1 15-Aug-2007  skrll Sync with HEAD.
 1.96.22.2 21-Jul-2007  ad Fix DEBUG build.
 1.96.22.1 21-Jul-2007  ad file uvm_km.c was added on branch matt-mips64 on 2007-07-21 20:53:00 +0000
 1.96.18.1 02-Jan-2008  bouyer Sync with HEAD
 1.96.16.1 10-Dec-2007  yamt - separate kernel va allocation (kernel_va_arena) from
in-kernel fault handling (kernel_map).
- add vmem bootstrap code. vmem doesn't rely on malloc anymore.
- make kmem_alloc interrupt-safe.
- kill kmem_map. make malloc a wrapper of kmem_alloc.
 1.96.14.1 04-Dec-2007  ad Pull the vmlocking changes into a new branch.
 1.96.12.1 18-Feb-2008  mjf Sync with HEAD.
 1.96.6.2 23-Mar-2008  matt sync with HEAD
 1.96.6.1 09-Jan-2008  matt sync with HEAD
 1.97.6.3 17-Jan-2009  mjf Sync with HEAD.
 1.97.6.2 28-Sep-2008  mjf Sync with HEAD.
 1.97.6.1 03-Apr-2008  mjf Sync with HEAD.
 1.97.2.1 24-Mar-2008  keiichi sync with head.
 1.99.10.2 13-Dec-2008  haad Update haad-dm branch to haad-dm-base2.
 1.99.10.1 19-Oct-2008  haad Sync with HEAD.
 1.99.8.1 18-Jul-2008  simonb Sync with head.
 1.99.6.1 18-Sep-2008  wrstuden Sync with wrstuden-revivesa-base-2.
 1.99.4.3 11-Aug-2010  yamt sync with head.
 1.99.4.2 11-Mar-2010  yamt sync with head
 1.99.4.1 04-May-2009  yamt sync with head.
 1.101.4.2 19-Apr-2009  snj branches: 1.101.4.2.4;
Pull up following revision(s) (requested by mrg in ticket #708):
sys/uvm/uvm_km.c: revision 1.102
sys/uvm/uvm_km.h: revision 1.18
sys/uvm/uvm_map.c: revision 1.264
PR port-amd64/32816 amd64 can not load lkms
Change some assertions to partially allow for VM_MAP_IS_KERNEL(map) where
map is outside the range of kernel_map.
 1.101.4.1 27-Dec-2008  snj Pull up following revision(s) (requested by bouyer in ticket #211):
sys/uvm/uvm_km.c: revision 1.103
sys/uvm/uvm_map.c: revision 1.265
sys/uvm/uvm_page.c: revision 1.141
It's easier for kernel reserve pages to be consumed because the pagedaemon
serves as less of a barrier these days. Restrict provision of kernel reserve
pages to kmem and one of these cases:
- doing a NOWAIT allocation
- caller is a realtime thread
- caller is a kernel thread
- explicitly requested, for example by the pmap
 1.101.4.2.4.11 12-Apr-2012  matt Apply colormask to get a valid color.
 1.101.4.2.4.10 12-Apr-2012  matt Separate object-less anon pages out of the active list if there is no swap
device. Make uvm_reclaimable and uvm.*estimatable understand colors and
kmem allocations.
 1.101.4.2.4.9 29-Feb-2012  matt Improve UVM_PAGE_TRKOWN.
Add more asserts to uvm_page.
 1.101.4.2.4.8 14-Feb-2012  matt Add more KASSERTs (more! more! more!).
When returning page to the free pool, make sure to dequeue the pages before
hand or free page queue corruption will happen.
 1.101.4.2.4.7 10-Feb-2012  matt Place allocated kmem pages on a kmem_pageq. This makes it easy for crash
dump code to find them.
 1.101.4.2.4.6 09-Feb-2012  matt Major changes to uvm.
Support multiple collections (groups) of free pages and run the page
reclamation algorithm on each group independently.
 1.101.4.2.4.5 03-Jun-2011  matt Restore $NetBSD$
 1.101.4.2.4.4 25-May-2011  matt Make uvm_map recognize UVM_FLAG_COLORMATCH which tells uvm_map that the
'align' argument specifies the starting color of the KVA range to be returned.

When calling uvm_km_alloc with UVM_KMF_VAONLY, also specify the starting
color of the kva range returned (UVM_KMF_COLORMATCH) and pass those to
uvm_map.

In uvm_pglistalloc, make sure the pages being returned have sequentially
advancing colors (so they can be mapped in a contiguous address range).
Add a few missing UVM_FLAG_COLORMATCH flags to uvm_pagealloc calls.

Make the socket and pipe loan color-safe.

Make the mips pmap enforce strict page color (color(VA) == color(PA)).
 1.101.4.2.4.3 06-Feb-2010  matt Allow uvm_km_alloc to allocate from a specific vm freelist if the port wants
it to.
 1.101.4.2.4.2 26-Jan-2010  matt Pass hints to uvm_pagealloc* to get it to use the right page color rather
than guess the right page color.
 1.101.4.2.4.1 09-Jan-2010  matt If PMAP_ALLOC_POOLPAGE is defined use it instead of uvm_pagealloc
 1.101.2.1 19-Jan-2009  skrll Sync with HEAD.
 1.104.2.8 17-Aug-2010  uebayasi Sync with HEAD.
 1.104.2.7 08-Jul-2010  uebayasi Clean up.
 1.104.2.6 07-Jul-2010  uebayasi Clean up; merge options DIRECT_PAGE into options XIP.
 1.104.2.5 06-Jul-2010  uebayasi Directly allocate zero'ed vm_page for XIP unallocated blocks, instead
of abusing pool page. Move the code to XIP vnode pager in genfs_io.c.
 1.104.2.4 31-May-2010  uebayasi Re-define the definition of "device page"; device pages are pages of
device memory. Pages which don't have vm_page (== can't be used for
generic use), but whose PV are tracked, are called "direct pages" from
now.
 1.104.2.3 30-Apr-2010  uebayasi Sync with HEAD.
 1.104.2.2 23-Feb-2010  uebayasi Don't forget opt_device_page.h.
 1.104.2.1 10-Feb-2010  uebayasi Initial attempt to implement uvm_pageofzero_xip(), which returns a pointer
to a single read-only zeroed page. This is meant to be used for XIP now.
Only compile tested.
 1.105.2.5 05-Mar-2011  rmind sync with head
 1.105.2.4 02-Jul-2010  rmind Undo 1.105.2.2 revision, note that uvm_km_pgremove_intrsafe() extracts the
mapping, improve comments.
 1.105.2.3 30-May-2010  rmind sync with head
 1.105.2.2 17-Mar-2010  rmind Reorganise UVM locking to protect P->V state and serialise pmap(9)
operations on the same page(s) by always locking their owner. Hence
lock order: "vmpage"-lock -> pmap-lock.

Patch, proposed on tech-kern@, from Andrew Doran.
 1.105.2.1 16-Mar-2010  rmind Change struct uvm_object::vmobjlock to be dynamically allocated with
mutex_obj_alloc(). It allows us to share the locks among UVM objects.
 1.107.4.1 08-Feb-2011  bouyer Sync with HEAD
 1.107.2.1 06-Jun-2011  jruoho Sync with HEAD.
 1.108.2.1 23-Jun-2011  cherry Catchup with rmind-uvmplock merge.
 1.111.6.6 02-Jun-2012  mrg sync to latest -current.
 1.111.6.5 29-Apr-2012  mrg sync to latest -current.
 1.111.6.4 05-Apr-2012  mrg sync to latest -current.
 1.111.6.3 04-Mar-2012  mrg sync to latest -current.
 1.111.6.2 24-Feb-2012  mrg sync to -current.
 1.111.6.1 18-Feb-2012  mrg merge to -current.
 1.111.2.5 22-May-2014  yamt sync with head.

for a reference, the tree before this commit was tagged
as yamt-pagecache-tag8.

this commit was split into small chunks to avoid
a limitation of cvs. ("Protocol error: too many arguments")
 1.111.2.4 30-Oct-2012  yamt sync with head
 1.111.2.3 18-Apr-2012  yamt byebye VM_MAP_INTRSAFE
 1.111.2.2 17-Apr-2012  yamt sync with head
 1.111.2.1 02-Nov-2011  yamt page cache related changes

- maintain object pages in radix tree rather than rb tree.
- reduce unnecessary page scan in putpages. esp. when an object has a ton of
pages cached but only a few of them are dirty.
- reduce the number of pmap operations by tracking page dirtiness more
precisely in uvm layer.
- fix nfs commit range tracking.
- fix nfs write clustering. XXX hack
 1.120.2.4 25-Nov-2013  bouyer Pull up following revision(s) (requested by para in ticket #989):
sys/uvm/uvm_km.c: revision 1.125
uvm_km_kmem_alloc: don't hardcode kmem_va_arena
 1.120.2.3 07-Sep-2012  riz branches: 1.120.2.3.2; 1.120.2.3.4;
Pull up following revision(s) (requested by para in ticket #547):
sys/uvm/uvm_map.c: revision 1.320
sys/uvm/uvm_map.c: revision 1.321
sys/uvm/uvm_map.c: revision 1.322
sys/uvm/uvm_km.c: revision 1.130
sys/uvm/uvm_km.c: revision 1.131
sys/uvm/uvm_km.c: revision 1.132
sys/uvm/uvm_km.c: revision 1.133
sys/uvm/uvm_km.c: revision 1.134
sys/uvm/uvm_km.c: revision 1.135
sys/uvm/uvm_km.c: revision 1.129
Fix a bug where the kernel was never grown to accommodate the kmem VA space
since that happens before the kernel_map is set.
Don't try to grow the entire kmem space but just do as needed in
uvm_km_kmem_alloc
Shut up gcc printf warning.
Cleanup comment. Change panic to KASSERTMSG.
Use kernel_map->misc_lock to make sure we don't call pmap_growkernel
concurrently and possibly mess up uvm_maxkaddr.
Switch to a spin lock (uvm_kentry_lock) which, fortunately, was sitting there
unused.
Remove locking since it isn't needed. As soon as the 2nd uvm_map_entry in
kernel_map is created, uvm_map_prepare will call pmap_growkernel and the
pmap_growkernel call in uvm_km_mem_alloc will never be called again.
call pmap_growkernel once after the kmem_arena is created
to make the pmap cover its address space
assert on the growth in uvm_km_kmem_alloc
for the 3rd uvm_map_entry uvm_map_prepare will grow the kernel,
but we might call into uvm_km_kmem_alloc through imports to
the kmem_meta_arena earlier
while here guard uvm_km_va_starved_p from kmem_arena not yet created
thanks for tracking this down to everyone involved
 1.120.2.2 17-Mar-2012  bouyer branches: 1.120.2.2.2;
Pull up following revision(s) (requested by rmind in ticket #113):
sys/uvm/uvm_km.c: revision 1.123
uvm_km_kmem_alloc: return ENOMEM on failure in PMAP_MAP_POOLPAGE case.
 1.120.2.1 22-Feb-2012  riz Pull up following revision(s) (requested by bouyer in ticket #29):
sys/arch/xen/x86/x86_xpmap.c: revision 1.39
sys/arch/xen/include/hypervisor.h: revision 1.37
sys/arch/xen/include/intr.h: revision 1.34
sys/arch/xen/x86/xen_ipi.c: revision 1.10
sys/arch/x86/x86/cpu.c: revision 1.97
sys/arch/x86/include/cpu.h: revision 1.48
sys/uvm/uvm_map.c: revision 1.315
sys/arch/x86/x86/pmap.c: revision 1.165
sys/arch/xen/x86/cpu.c: revision 1.81
sys/arch/x86/x86/pmap.c: revision 1.167
sys/arch/xen/x86/cpu.c: revision 1.82
sys/arch/x86/x86/pmap.c: revision 1.168
sys/arch/xen/x86/xen_pmap.c: revision 1.17
sys/uvm/uvm_km.c: revision 1.122
sys/uvm/uvm_kmguard.c: revision 1.10
sys/arch/x86/include/pmap.h: revision 1.50
Apply patch proposed in PR port-xen/45975 (this does not solve the exact
problem reported here but is part of the solution):
xen_kpm_sync() is not working as expected,
leading to races between CPUs.
1 the check (xpq_cpu != &x86_curcpu) is always false because we
have different x86_curcpu symbols with different addresses in the kernel.
Fortunately, all addresses disassemble to the same code.
Because of this we always use the code intended for bootstrap, which doesn't
use cross-calls or lock.
2 once 1 above is fixed, xen_kpm_sync() will use xcalls to sync other CPUs,
which cause it to sleep and pmap.c doesn't like that. It triggers this
KASSERT() in pmap_unmap_ptes():
KASSERT(pmap->pm_ncsw == curlwp->l_ncsw);
3 pmap->pm_cpus is not safe for the purpose of xen_kpm_sync(), which
needs to know on which CPU a pmap is loaded *now*:
pmap->pm_cpus is cleared before cpu_load_pmap() is called to switch
to a new pmap, leaving a window where a pmap is still in a CPU's
ci_kpm_pdir but not in pm_cpus. As a virtual CPU may be preempted
by the hypervisor at any time, it can be large enough to let another
CPU free the PTP and reuse it as a normal page.
To fix 2), avoid cross-calls and IPIs completely, and instead
use a mutex to update all CPU's ci_kpm_pdir from the local CPU.
It's safe because we just need to update the table page, a tlbflush IPI will
happen later. As a side effect, we don't need a different code for bootstrap,
fixing 1). The mutex added to struct cpu needs a small headers reorganisation.
to fix 3), introduce a pm_xen_ptp_cpus which is updated from
cpu_pmap_load(), with the ci_kpm_mtx mutex held. Checking it with
ci_kpm_mtx held will avoid overwriting the wrong pmap's ci_kpm_pdir.
While there I removed the unused pmap_is_active() function;
and added some more details to DIAGNOSTIC panics.
When using uvm_km_pgremove_intrsafe() make sure mappings are removed
before returning the pages to the free pool. Otherwise, under Xen,
a page which still has a writable mapping could be allocated for
a PDP by another CPU and the hypervisor would refuse it (this is
PR port-xen/45975).
For this, move the pmap_kremove() calls inside uvm_km_pgremove_intrsafe(),
and do pmap_kremove()/uvm_pagefree() in batch of (at most) 16 entries
(as suggested by Chuck Silvers on tech-kern@, see also
http://mail-index.netbsd.org/tech-kern/2012/02/17/msg012727.html and
followups).
Avoid early use of xen_kpm_sync(); locks are not available at this time.
Don't call cpu_init() twice.
Makes LOCKDEBUG kernels boot again
Revert pmap_pte_flush() -> xpq_flush_queue() in previous.
 1.120.2.3.4.1 25-Nov-2013  bouyer Pull up following revision(s) (requested by para in ticket #989):
sys/uvm/uvm_km.c: revision 1.125
uvm_km_kmem_alloc: don't hardcode kmem_va_arena
 1.120.2.3.2.1 25-Nov-2013  bouyer Pull up following revision(s) (requested by para in ticket #989):
sys/uvm/uvm_km.c: revision 1.125
uvm_km_kmem_alloc: don't hardcode kmem_va_arena
 1.120.2.2.2.1 01-Nov-2012  matt sync with netbsd-6-0-RELEASE.
 1.135.2.2 03-Dec-2017  jdolecek update from HEAD
 1.135.2.1 25-Feb-2013  tls resync with head
 1.138.14.3 28-Aug-2017  skrll Sync with HEAD
 1.138.14.2 05-Oct-2016  skrll Sync with HEAD
 1.138.14.1 06-Apr-2015  skrll Sync with HEAD
 1.138.12.1 25-Mar-2015  snj Pull up following revision(s) (requested by maxv in ticket #617):
sys/kern/kern_malloc.c: revision 1.144, 1.145
sys/kern/kern_pmf.c: revision 1.37
sys/rump/librump/rumpkern/rump.c: revision 1.316
sys/uvm/uvm_extern.h: revision 1.193
sys/uvm/uvm_km.c: revision 1.139
Don't include <uvm/uvm_extern.h>
--
Kill kmeminit().
--
Remove this MALLOC_DEFINE (M_PMF unused).
 1.139.2.3 20-Mar-2017  pgoyette Sync with HEAD
 1.139.2.2 06-Aug-2016  pgoyette Sync with HEAD
 1.139.2.1 26-Jul-2016  pgoyette Sync with HEAD
 1.141.2.1 21-Apr-2017  bouyer Sync with HEAD
 1.143.2.1 02-Nov-2017  snj Pull up following revision(s) (requested by pgoyette in ticket #335):
share/man/man9/kernhist.9: 1.5-1.8
sys/arch/acorn26/acorn26/pmap.c: 1.39
sys/arch/arm/arm32/fault.c: 1.105 via patch
sys/arch/arm/arm32/pmap.c: 1.350, 1.359
sys/arch/arm/broadcom/bcm2835_bsc.c: 1.7
sys/arch/arm/omap/if_cpsw.c: 1.20
sys/arch/arm/omap/tiotg.c: 1.7
sys/arch/evbarm/conf/RPI2_INSTALL: 1.3
sys/dev/ic/sl811hs.c: 1.98
sys/dev/usb/ehci.c: 1.256
sys/dev/usb/if_axe.c: 1.83
sys/dev/usb/motg.c: 1.18
sys/dev/usb/ohci.c: 1.274
sys/dev/usb/ucom.c: 1.119
sys/dev/usb/uhci.c: 1.277
sys/dev/usb/uhub.c: 1.137
sys/dev/usb/umass.c: 1.160-1.162
sys/dev/usb/umass_quirks.c: 1.100
sys/dev/usb/umass_scsipi.c: 1.55
sys/dev/usb/usb.c: 1.168
sys/dev/usb/usb_mem.c: 1.70
sys/dev/usb/usb_subr.c: 1.221
sys/dev/usb/usbdi.c: 1.175
sys/dev/usb/usbdi_util.c: 1.67-1.70
sys/dev/usb/usbroothub.c: 1.3
sys/dev/usb/xhci.c: 1.75
sys/external/bsd/drm2/dist/drm/i915/i915_gem.c: 1.34
sys/kern/kern_history.c: 1.15
sys/kern/kern_xxx.c: 1.74
sys/kern/vfs_bio.c: 1.275-1.276
sys/miscfs/genfs/genfs_io.c: 1.71
sys/sys/kernhist.h: 1.21
sys/ufs/ffs/ffs_balloc.c: 1.63
sys/ufs/lfs/lfs_vfsops.c: 1.361
sys/ufs/lfs/ulfs_inode.c: 1.21
sys/ufs/lfs/ulfs_vnops.c: 1.52
sys/ufs/ufs/ufs_inode.c: 1.102
sys/ufs/ufs/ufs_vnops.c: 1.239
sys/uvm/pmap/pmap.c: 1.37-1.39
sys/uvm/pmap/pmap_tlb.c: 1.22
sys/uvm/uvm_amap.c: 1.108
sys/uvm/uvm_anon.c: 1.64
sys/uvm/uvm_aobj.c: 1.126
sys/uvm/uvm_bio.c: 1.91
sys/uvm/uvm_device.c: 1.66
sys/uvm/uvm_fault.c: 1.201
sys/uvm/uvm_km.c: 1.144
sys/uvm/uvm_loan.c: 1.85
sys/uvm/uvm_map.c: 1.353
sys/uvm/uvm_page.c: 1.194
sys/uvm/uvm_pager.c: 1.111
sys/uvm/uvm_pdaemon.c: 1.109
sys/uvm/uvm_swap.c: 1.175
sys/uvm/uvm_vnode.c: 1.103
usr.bin/vmstat/vmstat.c: 1.219
Reorder to test for null before null deref in debug code
--
Reorder to test for null before null deref in debug code
--
KNF
--
No need for '\n' in UVMHIST_LOG
--
normalise a BIOHIST log message
--
Update the kernhist(9) kernel history code to address issues identified
in PR kern/52639, as well as some general cleaning-up...
(As proposed on tech-kern@ with additional changes and enhancements.)
Details of changes:
* All history arguments are now stored as uintmax_t values[1], both in
the kernel and in the structures used for exporting the history data
to userland via sysctl(9). This avoids problems on some architectures
where passing a 64-bit (or larger) value to printf(3) can cause it to
process the value as multiple arguments. (This can be particularly
problematic when printf()'s format string is not a literal, since in
that case the compiler cannot know how large each argument should be.)
* Update the data structures used for exporting kernel history data to
include a version number as well as the length of history arguments.
* All [2] existing users of kernhist(9) have had their format strings
updated. Each format specifier now includes an explicit length
modifier 'j' to refer to numeric values of the size of uintmax_t.
* All [2] existing users of kernhist(9) have had their format strings
updated to replace uses of "%p" with "%#jx", and the pointer
arguments are now cast to (uintptr_t) before being subsequently cast
to (uintmax_t). This is needed to avoid compiler warnings about
casting "pointer to integer of a different size."
* All [2] existing users of kernhist(9) have had instances of "%s" or
"%c" format strings replaced with numeric formats; several instances
of mis-match between format string and argument list have been fixed.
* vmstat(1) has been modified to handle the new size of arguments in the
history data as exported by sysctl(9).
* vmstat(1) now provides a warning message if the history requested with
the -u option does not exist (previously, this condition was silently
ignored, with only a single blank line being printed).
* vmstat(1) now checks the version and argument length included in the
data exported via sysctl(9) and exits if they do not match the values
with which vmstat was built.
* The kernhist(9) man-page has been updated to note the additional
requirements imposed on the format strings, along with several other
minor changes and enhancements.
[1] It would have been possible to use an explicit length (for example,
uint64_t) for the history arguments. But that would require another
"rototill" of all the users in the future when we add support for an
architecture that supports a larger size. Also, the printf(3) format
specifiers for explicitly-sized values, such as "%"PRIu64, are much
more verbose (and less aesthetically appealing, IMHO) than simply
using "%ju".
[2] I've tried very hard to find "all [the] existing users of kernhist(9)"
but it is possible that I've missed some of them. I would be glad to
update any stragglers that anyone identifies.
--
For some reason this single kernel seems to have outgrown its declared
size as a result of the kernhist(9) changes. Bump the size.
XXX The amount of increase may be excessive - anyone with more detailed
XXX knowledge please feel free to further adjust the value appropriately.
--
Missed one cast of pointer --> uintptr_t in previous kernhist(9) commit
--
And yet another one. :(
--
Use correct mark-up for NetBSD version.
--
More improvements in grammar and readability.
--
Remove a stray '"' (obvious typo) and add a couple of casts that are
probably needed.
--
And replace an instance of "%p" conversion with "%#jx"
--
Whitespace fix. Give Bl tag table a width. Fix Xr.
 1.144.4.2 13-Apr-2020  martin Mostly merge changes from HEAD up to 20200411
 1.144.4.1 10-Jun-2019  christos Sync with HEAD
 1.144.2.2 26-Dec-2018  pgoyette Sync with HEAD, resolve a few conflicts
 1.144.2.1 26-Nov-2018  pgoyette Sync with HEAD, resolve a couple of conflicts
 1.152.2.2 29-Feb-2020  ad Sync with head.
 1.152.2.1 25-Jan-2020  ad Sync with head.
 1.159.2.1 03-Apr-2021  thorpej Sync with HEAD.
 1.162.4.1 15-Dec-2024  martin Pull up following revision(s) (requested by chs in ticket #1027):

sys/uvm/uvm_km.c: revision 1.166

kmem: improve behavior when using all of physical memory as kmem

On systems where kmem does not need to be limited by kernel virtual
space (essentially 64-bit platforms), we currently try to size the
"kmem" space to be big enough for all of physical memory to be
allocated as kmem, which really means that we will always run short of
physical memory before we run out of kernel virtual space. However
this does not take into account that uvm_km_va_starved_p() starts
reporting that we are low on kmem virtual space when we have used 90%
of it, in an attempt to avoid kmem space becoming too fragmented,
which means on large memory systems we will still start reacting to
being short of virtual space when there is plenty of physical memory
still available. Fix this by overallocating the kmem space by a
factor of 10/9 so that we always run low on physical memory first,
as we want.
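
The arithmetic behind the 10/9 factor can be sketched in user-space C. This is a simplified model, not the kernel code: `kmem_va_size()` and `va_starved()` are hypothetical names, sizes are counted in pages, and the 90% threshold is modelled as "less than one tenth of the space is still free", which is an assumption based on the description above.

```c
#include <assert.h>
#include <stdint.h>

/*
 * Hypothetical model of the sizing rule: overallocate the virtual space
 * by a factor of 10/9 relative to physical memory (both in pages).
 * Written as phys + phys/9 so that phys * 10 cannot overflow.
 */
static uint64_t
kmem_va_size(uint64_t phys_pages)
{
	return phys_pages + phys_pages / 9;
}

/*
 * Hypothetical model of the starvation check: report "starved" once less
 * than 10% of the virtual space is still free (more than 90% is in use).
 */
static int
va_starved(uint64_t used_pages, uint64_t total_pages)
{
	assert(used_pages <= total_pages);
	return (total_pages - used_pages) < total_pages / 10;
}
```

Under this model, with 9000 pages of physical memory, a 9000-page space counts as starved once 8101 pages are in use, while a 10000-page space (9000 * 10 / 9) is not starved even when all 9000 physical pages are allocated.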
 1.165.6.1 02-Aug-2025  perseant Sync with HEAD
 1.20 27-Jan-2012  para extending vmem(9) to be able to allocate resources for its own needs.
simplifying uvm_map handling (no special kernel entries anymore, no relocking)
make malloc(9) a thin wrapper around kmem(9)
(with private interface for interrupt safety reasons)

releng@ acknowledged
 1.19 02-Feb-2011  chuck branches: 1.19.4; 1.19.8;
update license clauses on my code to match the new-style BSD licenses.
based on diff that rmind@ sent me.

no functional change with this commit.
 1.18 01-Dec-2008  ad branches: 1.18.6; 1.18.8; 1.18.10; 1.18.12;
PR port-amd64/32816 amd64 can not load lkms

Change some assertions to partially allow for VM_MAP_IS_KERNEL(map) where
map is outside the range of kernel_map.
 1.17 21-Feb-2007  thorpej branches: 1.17.38; 1.17.42; 1.17.48; 1.17.52; 1.17.54;
Replace the Mach-derived boolean_t type with the C99 bool type. A
future commit will replace use of TRUE and FALSE with true and false.
 1.16 25-May-2006  yamt branches: 1.16.12;
move wait points for kva from upper layers to vm_map. PR/33185 #1.

XXX there is a concern about interaction with kva fragmentation.
see: http://mail-index.NetBSD.org/tech-kern/2006/05/11/0000.html
 1.15 11-Dec-2005  christos branches: 1.15.4; 1.15.6; 1.15.8; 1.15.14;
merge ktrace-lwp.
 1.14 01-Apr-2005  yamt branches: 1.14.2;
merge yamt-km branch.
- don't use managed mappings/backing objects for wired memory allocations.
save some resources like pv_entry. also fix (most of) PR/27030.
- simplify kernel memory management API.
- simplify pmap bootstrap of some ports.
- some related cleanups.
 1.13 24-Mar-2004  junyoung branches: 1.13.8; 1.13.10;
Nuke __P().
 1.12 10-May-2003  thorpej branches: 1.12.2;
Back out the following change:
http://mail-index.netbsd.org/source-changes/2003/05/08/0068.html

There were some side-effects that I didn't anticipate, and fixing them
is proving to be more difficult than I thought, so just eject for now.
Maybe one day we can look at this again.

Fixes PR kern/21517.
 1.11 08-May-2003  thorpej Simplify the way the bounds of the managed kernel virtual address
space is advertised to UVM by making virtual_avail and virtual_end
first-class exported variables by UVM. Machine-dependent code is
responsible for initializing them before main() is called. Anything
that steals KVA must adjust these variables accordingly.

This reduces the number of instances of this info from 3 to 1, and
simplifies the pmap(9) interface by removing the pmap_virtual_space()
function call, and removing two arguments from pmap_steal_memory().

This also eliminates some kludges such as having to burn kernel_map
entries on space used by the kernel and stolen KVA.

This also eliminates use of VM_{MIN,MAX}_KERNEL_ADDRESS from MI code,
thus giving MD code greater flexibility over the bounds of the managed
kernel virtual address space if a given port's specific platforms can
vary in this regard (this is especially true of the evb* ports).
 1.10 15-Sep-2001  chs a whole bunch of changes to improve performance and robustness under load:

- remove special treatment of pager_map mappings in pmaps. this is
required now, since I've removed the globals that expose the address range.
pager_map now uses pmap_kenter_pa() instead of pmap_enter(), so there's
no longer any need to special-case it.
- eliminate struct uvm_vnode by moving its fields into struct vnode.
- rewrite the pageout path. the pager is now responsible for handling the
high-level requests instead of only getting control after a bunch of work
has already been done on its behalf. this will allow us to UBCify LFS,
which needs tighter control over its pages than other filesystems do.
writing a page to disk no longer requires making it read-only, which
allows us to write wired pages without causing all kinds of havoc.
- use a new PG_PAGEOUT flag to indicate that a page should be freed
on behalf of the pagedaemon when it's unlocked. this flag is very similar
to PG_RELEASED, but unlike PG_RELEASED, PG_PAGEOUT can be cleared if the
pageout fails due to eg. an indirect-block buffer being locked.
this allows us to remove the "version" field from struct vm_page,
and together with shrinking "loan_count" from 32 bits to 16,
struct vm_page is now 4 bytes smaller.
- no longer use PG_RELEASED for swap-backed pages. if the page is busy
because it's being paged out, we can't release the swap slot to be
reallocated until that write is complete, but unlike with vnodes we
don't keep a count of in-progress writes so there's no good way to
know when the write is done. instead, when we need to free a busy
swap-backed page, just sleep until we can get it busy ourselves.
- implement a fast-path for extending writes which allows us to avoid
zeroing new pages. this substantially reduces cpu usage.
- encapsulate the data used by the genfs code in a struct genfs_node,
which must be the first element of the filesystem-specific vnode data
for filesystems which use genfs_{get,put}pages().
- eliminate many of the UVM pagerops, since they aren't needed anymore
now that the pager "put" operation is a higher-level operation.
- enhance the genfs code to allow NFS to use the genfs_{get,put}pages
instead of a modified copy.
- clean up struct vnode by removing all the fields that used to be used by
the vfs_cluster.c code (which we don't use anymore with UBC).
- remove kmem_object and mb_object since they were useless.
instead of allocating pages to these objects, we now just allocate
pages with no object. such pages are mapped in the kernel until they
are freed, so we can use the mapping to find the page to free it.
this allows us to remove splvm() protection in several places.

The sum of all these changes improves write throughput on my
decstation 5000/200 to within 1% of the rate of NetBSD 1.5
and reduces the elapsed time for "make release" of a NetBSD 1.5
source tree on my 128MB pc to 10% less than a 1.5 kernel took.
 1.9 21-Jun-1999  thorpej branches: 1.9.14; 1.9.16; 1.9.18;
Protect prototypes, certain macros, and inlines from userland.
 1.8 25-May-1999  thorpej Define a new kernel object type, "intrsafe", which are used for objects
which can be used in an interrupt context. Use pmap_kenter*() and
pmap_kremove() only for mappings owned by these objects.

Fixes some locking protocol issues related to MP support, and eliminates
all of the pmap_enter vs. pmap_kremove inconsistencies.
 1.7 25-Mar-1999  mrg branches: 1.7.4;
remove now >1 year old pre-release message.
 1.6 13-Aug-1998  eeh Merge paddr_t changes into the main branch.
 1.5 10-Feb-1998  mrg branches: 1.5.2;
- add defopt's for UVM, UVMHIST and PMAP_NEW.
- remove unnecessary UVMHIST_DECL's.
 1.4 10-Feb-1998  perry add/cleanup multiple inclusion protection.
 1.3 07-Feb-1998  mrg restore rcsids
 1.2 06-Feb-1998  thorpej RCS ID police.
 1.1 05-Feb-1998  mrg branches: 1.1.1;
Initial revision
 1.1.1.1 05-Feb-1998  mrg initial import of the new virtual memory system, UVM, into -current.

UVM was written by chuck cranor <chuck@maria.wustl.edu>, with some
minor portions derived from the old Mach code. i provided some help
getting swap and paging working, and other bug fixes/ideas. chuck
silvers <chuq@chuq.com> also provided some other fixes.

this is the UVM kernel code portion.


this will be KNF'd shortly. :-)
 1.5.2.1 30-Jul-1998  eeh Split vm_offset_t and vm_size_t into paddr_t, psize_t, vaddr_t, and vsize_t.
 1.7.4.2 01-Jul-1999  thorpej Sync w/ -current.
 1.7.4.1 21-Jun-1999  thorpej Sync w/ -current.
 1.9.18.1 01-Oct-2001  fvdl Catch up with -current.
 1.9.16.1 10-Jan-2002  thorpej Sync kqueue branch with -current.
 1.9.14.1 21-Sep-2001  nathanw Catch up to -current.
 1.12.2.4 01-Apr-2005  skrll Sync with HEAD.
 1.12.2.3 21-Sep-2004  skrll Fix the sync with head I botched.
 1.12.2.2 18-Sep-2004  skrll Sync with HEAD.
 1.12.2.1 03-Aug-2004  skrll Sync with HEAD
 1.13.10.1 25-Jan-2005  yamt - don't use uvm_object or managed mappings for wired allocations.
(eg. malloc(9))
- simplify uvm_km_* apis.
 1.13.8.1 29-Apr-2005  kent sync with -current
 1.14.2.2 26-Feb-2007  yamt sync with head.
 1.14.2.1 21-Jun-2006  yamt sync with head.
 1.15.14.1 19-Jun-2006  chap Sync with head.
 1.15.8.1 26-Jun-2006  yamt sync with head.
 1.15.6.1 01-Jun-2006  kardel Sync with head.
 1.15.4.1 09-Sep-2006  rpaulo sync with head
 1.16.12.1 27-Feb-2007  yamt - sync with head.
- move sched_changepri back to kern_synch.c as it doesn't know PPQ anymore.
 1.17.54.1 19-Apr-2009  snj Pull up following revision(s) (requested by mrg in ticket #708):
sys/uvm/uvm_km.c: revision 1.102
sys/uvm/uvm_km.h: revision 1.18
sys/uvm/uvm_map.c: revision 1.264
PR port-amd64/32816 amd64 can not load lkms
Change some assertions to partially allow for VM_MAP_IS_KERNEL(map) where
map is outside the range of kernel_map.
 1.17.52.1 19-Jan-2009  skrll Sync with HEAD.
 1.17.48.1 13-Dec-2008  haad Update haad-dm branch to haad-dm-base2.
 1.17.42.1 04-May-2009  yamt sync with head.
 1.17.38.1 17-Jan-2009  mjf Sync with HEAD.
 1.18.12.1 08-Feb-2011  bouyer Sync with HEAD
 1.18.10.1 06-Jun-2011  jruoho Sync with HEAD.
 1.18.8.1 05-Mar-2011  rmind sync with head
 1.18.6.2 06-Jul-2010  uebayasi Directly allocate zero'ed vm_page for XIP unallocated blocks, instead
of abusing pool page. Move the code to XIP vnode pager in genfs_io.c.
 1.18.6.1 10-Feb-2010  uebayasi Initial attempt to implement uvm_pageofzero_xip(), which returns a pointer
to a single read-only zeroed page. This is meant to be used for XIP now.
Only compile tested.
 1.19.8.1 18-Feb-2012  mrg merge to -current.
 1.19.4.1 17-Apr-2012  yamt sync with head
 1.12 27-Jul-2015  maxv Several changes and improvements in KMEM_GUARD:
- merge uvm_kmguard.{c,h} into subr_kmem.c. It is only used there, and
makes it more consistent. Also, it allows us to enable KMEM_GUARD
without enabling DEBUG.
- rename uvm_kmguard_XXX to kmem_guard_XXX, for consistency
- improve kmem_guard_alloc() so that it supports allocations bigger than
PAGE_SIZE
- remove the canary value, and use directly the kmem header as underflow
pattern.
- fix some comments

(The UAF fifo is disabled for the moment; we actually need to register
the va and its size, and add a weight support not to consume too much
memory.)
 1.11 25-Feb-2014  martin branches: 1.11.6;
Mark a variable used only in diagnostics
 1.10 20-Feb-2012  bouyer branches: 1.10.2; 1.10.4;
When using uvm_km_pgremove_intrsafe() make sure mappings are removed
before returning the pages to the free pool. Otherwise, under Xen,
a page which still has a writable mapping could be allocated for
a PDP by another CPU and the hypervisor would refuse it (this is
PR port-xen/45975).
For this, move the pmap_kremove() calls inside uvm_km_pgremove_intrsafe(),
and do pmap_kremove()/uvm_pagefree() in batch of (at most) 16 entries
(as suggested by Chuck Silvers on tech-kern@, see also
http://mail-index.netbsd.org/tech-kern/2012/02/17/msg012727.html and
followups).
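
The ordering described above (unmap a batch first, then free its pages, at most 16 at a time) can be sketched as follows. This is a user-space illustration under stated assumptions, not the kernel routine: `stub_unmap()`, `stub_free()` and `remove_pages()` are hypothetical stand-ins, and pages are identified by plain indices.

```c
#include <assert.h>
#include <stddef.h>

#define BATCH_MAX 16	/* batch size named in the commit message */

/* Counters for the stand-in operations below (illustration only). */
static size_t unmap_calls, pages_freed, unmapped_end;
static int order_ok = 1;

/* Stand-in for removing the kernel mappings of pages [first, first + n). */
static void
stub_unmap(size_t first, size_t n)
{
	unmap_calls++;
	unmapped_end = first + n;
}

/* Stand-in for freeing one page; records a violation if it is still mapped. */
static void
stub_free(size_t idx)
{
	if (idx >= unmapped_end)
		order_ok = 0;
	pages_freed++;
}

/* Walk npages pages, unmapping each batch before freeing its pages. */
static void
remove_pages(size_t npages)
{
	size_t start = 0;

	while (start < npages) {
		size_t n = npages - start;

		if (n > BATCH_MAX)
			n = BATCH_MAX;
		stub_unmap(start, n);
		for (size_t i = 0; i < n; i++)
			stub_free(start + i);
		start += n;
	}
}
```

Calling `remove_pages(40)` in this sketch issues three unmap calls (16, 16 and 8 pages) and never frees a page whose mapping has not yet been removed.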
 1.9 05-Feb-2012  rmind branches: 1.9.2;
uvm_kmguard_alloc: use vmem_addr_t, instead of vaddr_t.
Fixes the build on ports where vaddr_t is of different size.
 1.8 05-Feb-2012  rmind - Make KMGUARD interrupt-safe.
- kmem_intr_{alloc,free}: remove workaround.

Changes affect KMGUARD-enabled debug kernels only.
 1.7 28-Jan-2012  rmind pool_page_alloc, pool_page_alloc_meta: avoid extra compare, use const.
ffs_mountfs,sys_swapctl: replace memset with kmem_zalloc.
sys_swapctl: move kmem_free outside the lock path.
uvm_init: fix comment, remove pointless numeration of steps.
uvm_map_enter: remove meflagval variable.
Fix some indentation.
 1.6 27-Jan-2012  para extending vmem(9) to be able to allocate resources for its own needs.
simplifying uvm_map handling (no special kernel entries anymore, no relocking)
make malloc(9) a thin wrapper around kmem(9)
(with private interface for interrupt safety reasons)

releng@ acknowledged
 1.5 23-Apr-2011  rmind branches: 1.5.4; 1.5.8;
Replace "malloc" in comments, remove unnecessary header inclusions.
 1.4 02-Nov-2010  skrll branches: 1.4.2;
Spell immediately correctly.
 1.3 14-May-2010  cegger Move PMAP_KMPAGE to be used in pmap_kenter_pa flags argument.
'Looks good to me' gimpy@
 1.2 07-Nov-2009  cegger branches: 1.2.2; 1.2.4;
Add a flags argument to pmap_kenter_pa(9).
Patch shown on tech-kern@ http://mail-index.netbsd.org/tech-kern/2009/11/04/msg006434.html
No objections.
 1.1 29-Mar-2009  ad branches: 1.1.2; 1.1.4; 1.1.6;
kernel memory guard for DEBUG kernels, proposed on tech-kern.
See kmem_alloc(9) for details.
 1.1.6.2 13-May-2009  jym Sync with HEAD.

Commit is split, to avoid a "too many arguments" protocol error.
 1.1.6.1 29-Mar-2009  jym file uvm_kmguard.c was added on branch jym-xensuspend on 2009-05-13 17:23:10 +0000
 1.1.4.4 11-Aug-2010  yamt sync with head.
 1.1.4.3 11-Mar-2010  yamt sync with head
 1.1.4.2 04-May-2009  yamt sync with head.
 1.1.4.1 29-Mar-2009  yamt file uvm_kmguard.c was added on branch yamt-nfs-mp on 2009-05-04 08:14:39 +0000
 1.1.2.2 28-Apr-2009  skrll Sync with HEAD.
 1.1.2.1 29-Mar-2009  skrll file uvm_kmguard.c was added on branch nick-hppapmap on 2009-04-28 07:37:58 +0000
 1.2.4.3 31-May-2011  rmind sync with head
 1.2.4.2 05-Mar-2011  rmind sync with head
 1.2.4.1 30-May-2010  rmind sync with head
 1.2.2.2 06-Nov-2010  uebayasi Sync with HEAD.
 1.2.2.1 17-Aug-2010  uebayasi Sync with HEAD.
 1.4.2.1 06-Jun-2011  jruoho Sync with HEAD.
 1.5.8.2 24-Feb-2012  mrg sync to -current.
 1.5.8.1 18-Feb-2012  mrg merge to -current.
 1.5.4.2 22-May-2014  yamt sync with head.

for a reference, the tree before this commit was tagged
as yamt-pagecache-tag8.

this commit was split into small chunks to avoid
a limitation of cvs. ("Protocol error: too many arguments")
 1.5.4.1 17-Apr-2012  yamt sync with head
 1.9.2.1 22-Feb-2012  riz Pull up following revision(s) (requested by bouyer in ticket #29):
sys/arch/xen/x86/x86_xpmap.c: revision 1.39
sys/arch/xen/include/hypervisor.h: revision 1.37
sys/arch/xen/include/intr.h: revision 1.34
sys/arch/xen/x86/xen_ipi.c: revision 1.10
sys/arch/x86/x86/cpu.c: revision 1.97
sys/arch/x86/include/cpu.h: revision 1.48
sys/uvm/uvm_map.c: revision 1.315
sys/arch/x86/x86/pmap.c: revision 1.165
sys/arch/xen/x86/cpu.c: revision 1.81
sys/arch/x86/x86/pmap.c: revision 1.167
sys/arch/xen/x86/cpu.c: revision 1.82
sys/arch/x86/x86/pmap.c: revision 1.168
sys/arch/xen/x86/xen_pmap.c: revision 1.17
sys/uvm/uvm_km.c: revision 1.122
sys/uvm/uvm_kmguard.c: revision 1.10
sys/arch/x86/include/pmap.h: revision 1.50
Apply patch proposed in PR port-xen/45975 (this does not solve the exact
problem reported here but is part of the solution):
xen_kpm_sync() is not working as expected,
leading to races between CPUs.
1) the check (xpq_cpu != &x86_curcpu) is always false because we
have different x86_curcpu symbols with different addresses in the kernel.
Fortunately, all addresses disassemble to the same code.
Because of this we always use the code intended for bootstrap, which doesn't
use cross-calls or lock.
2) once 1) above is fixed, xen_kpm_sync() will use xcalls to sync other CPUs,
which causes it to sleep and pmap.c doesn't like that. It triggers this
KASSERT() in pmap_unmap_ptes():
KASSERT(pmap->pm_ncsw == curlwp->l_ncsw);
3) pmap->pm_cpus is not safe for the purpose of xen_kpm_sync(), which
needs to know on which CPU a pmap is loaded *now*:
pmap->pm_cpus is cleared before cpu_load_pmap() is called to switch
to a new pmap, leaving a window where a pmap is still in a CPU's
ci_kpm_pdir but not in pm_cpus. As a virtual CPU may be preempted
by the hypervisor at any time, it can be large enough to let another
CPU free the PTP and reuse it as a normal page.
To fix 2), avoid cross-calls and IPIs completely, and instead
use a mutex to update all CPU's ci_kpm_pdir from the local CPU.
It's safe because we just need to update the table page, a tlbflush IPI will
happen later. As a side effect, we don't need a different code for bootstrap,
fixing 1). The mutex added to struct cpu needs a small headers reorganisation.
To fix 3), introduce a pm_xen_ptp_cpus which is updated from
cpu_pmap_load(), with the ci_kpm_mtx mutex held. Checking it with
ci_kpm_mtx held will avoid overwriting the wrong pmap's ci_kpm_pdir.
While there I removed the unused pmap_is_active() function;
and added some more details to DIAGNOSTIC panics.
When using uvm_km_pgremove_intrsafe() make sure mappings are removed
before returning the pages to the free pool. Otherwise, under Xen,
a page which still has a writable mapping could be allocated for
a PDP by another CPU and the hypervisor would refuse it (this is
PR port-xen/45975).
For this, move the pmap_kremove() calls inside uvm_km_pgremove_intrsafe(),
and do pmap_kremove()/uvm_pagefree() in batch of (at most) 16 entries
(as suggested by Chuck Silvers on tech-kern@, see also
http://mail-index.netbsd.org/tech-kern/2012/02/17/msg012727.html and
followups).
Avoid early use of xen_kpm_sync(); locks are not available at this time.
Don't call cpu_init() twice.
Makes LOCKDEBUG kernels boot again
Revert pmap_pte_flush() -> xpq_flush_queue() in previous.
 1.10.4.1 18-May-2014  rmind sync with head
 1.10.2.2 03-Dec-2017  jdolecek update from HEAD
 1.10.2.1 20-Aug-2014  tls Rebase to HEAD as of a few days ago.
 1.11.6.1 22-Sep-2015  skrll Sync with HEAD
 1.3 27-Jul-2015  maxv Several changes and improvements in KMEM_GUARD:
- merge uvm_kmguard.{c,h} into subr_kmem.c. It is only used there, and
makes it more consistent. Also, it allows us to enable KMEM_GUARD
without enabling DEBUG.
- rename uvm_kmguard_XXX to kmem_guard_XXX, for consistency
- improve kmem_guard_alloc() so that it supports allocations bigger than
PAGE_SIZE
- remove the canary value, and use directly the kmem header as underflow
pattern.
- fix some comments

(The UAF fifo is disabled for the moment; we actually need to register
the va and its size, and add a weight support not to consume too much
memory.)
 1.2 05-Feb-2012  rmind branches: 1.2.6; 1.2.24;
- Make KMGUARD interrupt-safe.
- kmem_intr_{alloc,free}: remove workaround.

Changes affect KMGUARD-enabled debug kernels only.
 1.1 29-Mar-2009  ad branches: 1.1.2; 1.1.4; 1.1.6; 1.1.18; 1.1.22;
kernel memory guard for DEBUG kernels, proposed on tech-kern.
See kmem_alloc(9) for details.
 1.1.22.1 18-Feb-2012  mrg merge to -current.
 1.1.18.1 17-Apr-2012  yamt sync with head
 1.1.6.2 13-May-2009  jym Sync with HEAD.

Commit is split, to avoid a "too many arguments" protocol error.
 1.1.6.1 29-Mar-2009  jym file uvm_kmguard.h was added on branch jym-xensuspend on 2009-05-13 17:23:10 +0000
 1.1.4.2 04-May-2009  yamt sync with head.
 1.1.4.1 29-Mar-2009  yamt file uvm_kmguard.h was added on branch yamt-nfs-mp on 2009-05-04 08:14:39 +0000
 1.1.2.2 28-Apr-2009  skrll Sync with HEAD.
 1.1.2.1 29-Mar-2009  skrll file uvm_kmguard.h was added on branch nick-hppapmap on 2009-04-28 07:37:58 +0000
 1.2.24.1 22-Sep-2015  skrll Sync with HEAD
 1.2.6.1 03-Dec-2017  jdolecek update from HEAD
 1.104 11-Jun-2020  ad Counter tweaks:

- Don't need to count anonpages+filepages any more; clean+unknown+dirty for
each kind of page can be summed to get the totals.

- Track the number of free pages with a counter so that it's one less thing
for the allocator to do, which opens up further options there.

- Remove cpu_count_sync_one(). It has no users and doesn't save a whole lot.
For the cheap option, give cpu_count_sync() a boolean parameter indicating
that a cached value is okay, and rate limit the updates for cached values
to hz.
 1.103 20-May-2020  ad uvm_loanuobjpages():

- there are no pages to unbusy in the error case
- always clear the caller's page array
 1.102 19-May-2020  ad uvm_loanuobjpages():

- vmobjlock is shared between tmpfs vnodes and UAOs now
- split into two routines, to simplify
- fix error recovery
 1.101 17-May-2020  ad Start trying to reduce cache misses on vm_page during fault processing.

- Make PGO_LOCKED getpages imply PGO_NOBUSY and remove the latter. Mark
pages busy only when there's actually I/O to do.

- When doing COW on a uvm_object, don't mess with neighbouring pages. In
all likelihood they're already entered.

- Don't mess with neighbouring VAs that have existing mappings as replacing
those mappings with same can be quite costly.

- Don't enqueue pages for neighbour faults unless not enqueued already, and
don't activate centre pages unless uvmpdpol says it's useful.

Also:

- Make PGO_LOCKED getpages on UAOs work more like vnodes: do gang lookup in
the radix tree, and don't allocate new pages.

- Fix many assertion failures around faults/loans with tmpfs.
 1.100 22-Mar-2020  ad Process concurrent page faults on individual uvm_objects / vm_amaps in
parallel, where the relevant pages are already in-core. Proposed on
tech-kern.

Temporarily disabled on MP architectures with __HAVE_UNLOCKED_PMAP until
adjustments are made to their pmaps.
 1.99 20-Mar-2020  ad Go back to freeing struct vm_anon one by one. There may have been an
advantage circa ~2008 but there isn't now.
 1.98 17-Mar-2020  ad Tweak the March 14th change to make page waits interlocked by pg->interlock.
Remove unneeded changes and only deal with the PQ_WANTED flag, to exclude
possible bugs.
 1.97 14-Mar-2020  ad Make page waits (WANTED vs BUSY) interlocked by pg->interlock. Gets RW
locks out of the equation for sleep/wakeup, and allows observing+waiting
for busy pages when holding only a read lock. Proposed on tech-kern.
 1.96 24-Feb-2020  ad uvm_unloanpage(): fix a screwup in previous. slock must be set NULL if
it can't be acquired.
 1.95 23-Feb-2020  ad UVM locking changes, proposed on tech-kern:

- Change the lock on uvm_object, vm_amap and vm_anon to be a RW lock.
- Break v_interlock and vmobjlock apart. v_interlock remains a mutex.
- Do partial PV list locking in the x86 pmap. Others to follow later.
 1.94 15-Jan-2020  ad Merge from yamt-pagecache (after much testing):

- Reduce unnecessary page scan in putpages esp. when an object has a ton of
pages cached but only a few of them are dirty.

- Reduce the number of pmap operations by tracking page dirtiness more
precisely in uvm layer.
 1.93 31-Dec-2019  ad branches: 1.93.2;
- Add and use wrapper functions that take and acquire page interlocks, and pairs
of page interlocks. Require that the page interlock be held over calls to
uvm_pageactivate(), uvm_pagewire() and similar.

- Solve the concurrency problem with page replacement state. Rather than
updating the global state synchronously, set an intended state on
individual pages (active, inactive, enqueued, dequeued) while holding the
page interlock. After the interlock is released put the pages on a 128
entry per-CPU queue for their state changes to be made real in batch.
This results in a ~400 fold decrease in contention on my test system.
Proposed on tech-kern but modified to use the page interlock rather than
atomics to synchronise as it's much easier to maintain that way, and
cheaper.
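
The deferred-update scheme described above can be sketched in user-space C. This is a minimal illustration under stated assumptions, not the UVM implementation: `struct page`, `struct cpu_batch`, `page_set_intent()` and `batch_flush()` are hypothetical names, there is only one queue, and the locking (the page interlock and the global policy lock) is reduced to comments.

```c
#include <assert.h>
#include <stddef.h>

#define BATCH_MAX 128	/* queue length named in the commit message */

enum pgstate { ST_NONE = 0, ST_ACTIVE, ST_INACTIVE };

struct page {
	enum pgstate intent;	/* what the caller wants the page to be */
	enum pgstate state;	/* what the global state currently reflects */
	int queued;		/* already sitting in a batch? */
};

struct cpu_batch {
	struct page *pg[BATCH_MAX];
	size_t n;
	unsigned flushes;
};

/* Apply all pending intents at once (would hold the global lock once). */
static void
batch_flush(struct cpu_batch *b)
{
	for (size_t i = 0; i < b->n; i++) {
		b->pg[i]->state = b->pg[i]->intent;
		b->pg[i]->queued = 0;
	}
	b->n = 0;
	b->flushes++;
}

/* Record an intended state (would hold the page interlock), then queue. */
static void
page_set_intent(struct cpu_batch *b, struct page *p, enum pgstate want)
{
	p->intent = want;
	if (p->queued)
		return;		/* a later flush picks up the new intent */
	p->queued = 1;
	b->pg[b->n++] = p;
	if (b->n == BATCH_MAX)
		batch_flush(b);
}
```

In this sketch, marking 130 pages active triggers a single flush after the 128th page; the last two pages keep their old state until the next flush, which is the trade-off that replaces one global update per page with one per batch.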
 1.92 18-Dec-2019  ad PR kern/54783: t_mmap crashes the kernel

- Fix various locking & sequencing errors with breaking loans.

- Don't call uvm_pageremove_tree() while holding pg->interlock as radixtree
can take further locks when freeing nodes.
 1.91 15-Dec-2019  ad Merge from yamt-pagecache:

- do gang lookup of pages using radixtree.
- remove now unused uvm_object::uo_memq and vm_page::listq.queue.
 1.90 14-Dec-2019  ad Don't call uvm_pagedequeue() while holding pg->interlock.
 1.89 13-Dec-2019  ad Break the global uvm_pageqlock into a per-page identity lock and a private
lock for use of the pagedaemon policy code. Discussed on tech-kern.

PR kern/54209: NetBSD 8 large memory performance extremely low
PR kern/54210: NetBSD-8 processes presumably not exiting
PR kern/54727: writing a large file causes unreasonable system behaviour
 1.88 01-Dec-2019  ad - Adjust uvmexp.swpgonly with atomics, and make uvm_swap_data_lock static.
- A bit more __cacheline_aligned on mutexes.
 1.87 25-May-2018  jdolecek branches: 1.87.2;
add the KASSERT() for loan_count wrap-around to all places which increase it
 1.86 19-May-2018  jdolecek detect wraparound when bumping page wire_count and loan_count
 1.85 28-Oct-2017  pgoyette branches: 1.85.2;
Update the kernhist(9) kernel history code to address issues identified
in PR kern/52639, as well as some general cleaning-up...

(As proposed on tech-kern@ with additional changes and enhancements.)

Details of changes:

* All history arguments are now stored as uintmax_t values[1], both in
the kernel and in the structures used for exporting the history data
to userland via sysctl(9). This avoids problems on some architectures
where passing a 64-bit (or larger) value to printf(3) can cause it to
process the value as multiple arguments. (This can be particularly
problematic when printf()'s format string is not a literal, since in
that case the compiler cannot know how large each argument should be.)

* Update the data structures used for exporting kernel history data to
include a version number as well as the length of history arguments.

* All [2] existing users of kernhist(9) have had their format strings
updated. Each format specifier now includes an explicit length
modifier 'j' to refer to numeric values of the size of uintmax_t.

* All [2] existing users of kernhist(9) have had their format strings
updated to replace uses of "%p" with "%#jx", and the pointer
arguments are now cast to (uintptr_t) before being subsequently cast
to (uintmax_t). This is needed to avoid compiler warnings about
casting "pointer to integer of a different size."

* All [2] existing users of kernhist(9) have had instances of "%s" or
"%c" format strings replaced with numeric formats; several instances
of mis-match between format string and argument list have been fixed.

* vmstat(1) has been modified to handle the new size of arguments in the
history data as exported by sysctl(9).

* vmstat(1) now provides a warning message if the history requested with
the -u option does not exist (previously, this condition was silently
ignored, with only a single blank line being printed).

* vmstat(1) now checks the version and argument length included in the
data exported via sysctl(9) and exits if they do not match the values
with which vmstat was built.

* The kernhist(9) man-page has been updated to note the additional
requirements imposed on the format strings, along with several other
minor changes and enhancements.

[1] It would have been possible to use an explicit length (for example,
uint64_t) for the history arguments. But that would require another
"rototill" of all the users in the future when we add support for an
architecture that supports a larger size. Also, the printf(3) format
specifiers for explicitly-sized values, such as "%"PRIu64, are much
more verbose (and less aesthetically appealing, IMHO) than simply
using "%ju".

[2] I've tried very hard to find "all [the] existing users of kernhist(9)"
but it is possible that I've missed some of them. I would be glad to
update any stragglers that anyone identifies.
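
The format-string conventions listed above can be illustrated with standard C. This is a hedged sketch, not kernhist(9) code: `hist_ptr_arg()` and `hist_format()` are hypothetical helpers, and snprintf(3) stands in for the kernel history logging macros.

```c
#include <assert.h>
#include <inttypes.h>
#include <stdint.h>
#include <stdio.h>
#include <string.h>

/*
 * Hypothetical helper: widen a pointer the way the commit describes,
 * first to uintptr_t and then to uintmax_t, which avoids the "pointer to
 * integer of a different size" warning.
 */
static uintmax_t
hist_ptr_arg(const void *p)
{
	return (uintmax_t)(uintptr_t)p;
}

/*
 * Hypothetical stand-in for a history log call. Every argument is a
 * uintmax_t, so every conversion carries the 'j' length modifier and
 * pointers are printed with "%#jx" instead of "%p".
 */
static int
hist_format(char *buf, size_t len, const void *obj, uintmax_t count)
{
	assert(buf != NULL && len > 0);
	return snprintf(buf, len, "obj=%#jx count=%ju",
	    hist_ptr_arg(obj), count);
}
```

Because each argument has the same width, a formatter that only sees the format string at run time can consume the arguments one by one without guessing their sizes.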
 1.84 19-Mar-2017  riastradh branches: 1.84.6;
__diagused police
 1.83 30-Jul-2012  matt branches: 1.83.2; 1.83.16; 1.83.20; 1.83.24;
-fno-common broke kernhist since it used commons.
Add a KERNHIST_DEFINE which is used to define the kernel history.
Change UVM to deal with the new usage.
 1.82 19-Feb-2012  rmind Remove VM_MAP_INTRSAFE and related code. Not used since the "kmem changes".
 1.81 06-Aug-2011  rmind branches: 1.81.2; 1.81.6;
- Rework uvm_anfree() into uvm_anon_freelst(), which always drops the lock.
- Free anons in uvm_anon_freelst() without lock held.
- Mechanic sync to unused loaning code.
 1.80 12-Jun-2011  rmind Welcome to 5.99.53! Merge rmind-uvmplock branch:

- Reorganize locking in UVM and provide extra serialisation for pmap(9).
New lock order: [vmpage-owner-lock] -> pmap-lock.

- Simplify locking in some pmap(9) modules by removing P->V locking.

- Use lock object on vmobjlock (and thus vnode_t::v_interlock) to share
the locks amongst UVM objects where necessary (tmpfs, layerfs, unionfs).

- Rewrite and optimise x86 TLB shootdown code, make it simpler and cleaner.
Add TLBSTATS option for x86 to collect statistics about TLB shootdowns.

- Unify /dev/mem et al in MI code and provide required locking (removes
kernel-lock on some ports). Also, avoid cache-aliasing issues.

Thanks to Andrew Doran and Joerg Sonnenberger, as their initial patches
formed the core changes of this branch.
 1.79 23-Apr-2011  rmind branches: 1.79.2;
Replace "malloc" in comments, remove unnecessary header inclusions.
 1.78 02-Feb-2011  chuck update license clauses on my code to match the new-style BSD licenses.
based on diff that rmind@ sent me.

no functional change with this commit.
 1.77 03-Feb-2010  uebayasi branches: 1.77.2; 1.77.4; 1.77.6; 1.77.8;
A few assertions & comments.
 1.76 02-Feb-2010  uebayasi Don't pass an unnecessary reference to uvm_loanbreak_anon().

Requested by rmind@.
 1.75 02-Feb-2010  uebayasi Move A->K loan break code to uvm_loan.c.
 1.74 05-Dec-2009  pooka Convert tsleep(&lbolt) to kpause(). Make ltsleep/mtsleep on lbolt
illegal. I examined all places where lbolt is referenced to make
sure there were no pointer aliases of it passed to tsleep, but put a
KASSERT in m/ltsleep() just to be sure.
 1.73 03-Dec-2008  pooka uvm_loanuobjpages(): "nfsread" -> "loanuopg" in tsleep wmesg
 1.72 17-Jun-2008  yamt branches: 1.72.2; 1.72.4; 1.72.10;
initialize uvm_loanzero_object correctly after page-cache rbtree changes.
 1.71 04-Jun-2008  ad branches: 1.71.2;
listq -> listq.queue
 1.70 02-Jan-2008  ad branches: 1.70.6; 1.70.8; 1.70.10; 1.70.12;
Merge vmlocking2 to head.
 1.69 01-Dec-2007  yamt branches: 1.69.2; 1.69.6;
constify pagerops.
 1.68 01-Dec-2007  yamt use designated initializers for uvm_pagerops.
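
For readers unfamiliar with the idiom, a minimal sketch of C99 designated
initializers on an operations table follows. The struct and function names
here are hypothetical stand-ins; they are not the real uvm_pagerops layout.

```c
#include <stddef.h>

/* Hypothetical operations table, loosely shaped like a pager ops struct. */
struct example_pagerops {
	void (*init)(void);
	int (*get)(int npages);
	int (*put)(int npages);
};

/* Hypothetical "put" method: pretend every requested page was written. */
static int
example_put(int npages)
{
	return npages;
}

/*
 * Only .put is named; members that are not listed are initialized to
 * NULL, and the initializer keeps working if members are reordered.
 */
static const struct example_pagerops example_ops = {
	.put = example_put,
};
```

Compared with a positional initializer, this form does not need NULL
placeholders for unused methods.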
 1.67 11-Oct-2007  ad branches: 1.67.4;
Remove LOCK_ASSERT(!simple_lock_held(&foo));
 1.66 21-Jul-2007  ad branches: 1.66.4; 1.66.6; 1.66.8; 1.66.10;
Merge unobtrusive locking changes from the vmlocking branch.
 1.65 22-Feb-2007  thorpej branches: 1.65.4; 1.65.12;
TRUE -> true, FALSE -> false
 1.64 21-Feb-2007  thorpej Replace the Mach-derived boolean_t type with the C99 bool type. A
future commit will replace use of TRUE and FALSE with true and false.
 1.63 15-Dec-2006  yamt branches: 1.63.2;
put ->K loaned pages on the page queue, so that page loaning doesn't
disturb pagedaemon/pdpolicy.
 1.62 01-Nov-2006  yamt remove some __unused from function parameters.
 1.61 14-Oct-2006  yamt uvm_loanbreak: transfer dirtiness of the old page to the new page,
rather than blindly mark it dirty. fix a part of PR/33513.
 1.60 12-Oct-2006  christos - sprinkle __unused on function decls.
- fix a couple of unused bugs
- no more -Wno-unused for i386
 1.59 18-Apr-2006  yamt branches: 1.59.8; 1.59.10;
from Christian Ehrhardt:
* uvm_loanzero may call uvm_analloc which will return with anon->an_lock
locked. This lock is never dropped by uvm_loanzero and AFAICS the caller
doesn't drop it either.
 1.58 31-Jan-2006  yamt branches: 1.58.2; 1.58.4; 1.58.6; 1.58.8; 1.58.10;
handle "strange" filesystems like layered filesystems and tmpfs,
where pgo_get returns pages which don't belong to the uobj.
also fix an XXX in uvm_loananon and lock-unlock mismatch in uvm_loanuobj.

PR/28372, PR/32665 (Alan Barrett).
 1.57 24-Dec-2005  perry branches: 1.57.2;
Remove leading __ from __(const|inline|signed|volatile) -- it is obsolete.
 1.56 11-Dec-2005  christos merge ktrace-lwp.
 1.55 28-Jun-2005  thorpej branches: 1.55.2;
Make a note about why a large function like uvm_loanentry() can be
an inline in this case.
 1.54 27-Jun-2005  thorpej Use ANSI function decls.
 1.53 11-May-2005  yamt allocate anons on-demand, rather than reserving static amount of
them on boot/swapon.
 1.52 23-Nov-2004  yamt introduce UVMHIST_LOANHIST and sprinkle UVMHIST_LOGs.
 1.51 21-Nov-2004  yamt - prevent wired pages from being loaned, rather than just panicking.
caller should take care of failure by eg. falling back to dumb copy.
PR/23285.
- add some related assertions.
 1.50 24-Mar-2004  junyoung - Nuke __P().
- Drop trailing spaces.
 1.49 13-Feb-2004  drochner make this compile whether DIAGNOSTIC is defined or not
 1.48 13-Feb-2004  yamt when breaking a loan from uobj,
insert the replacement page into the same position
as the original page on the object memq so that
genfs_putpages (and lfs) won't be confused.

noted by Stephan Uphoff (PR/24328)
 1.47 13-Feb-2004  yamt uvm_loanentry: add a missing uvmfault_unlockall.
 1.46 30-Jan-2004  yamt uvm_loanuobjpages: fix a comment.
 1.45 07-Jan-2004  yamt - get pages to loan out in uvm_loanuobjpages() rather than
having caller (nfsd, in this case) do so.
- tweak locking so that nfs loaned READ works on layered filesystems.
 1.44 27-Oct-2003  yamt uvm_loanzero:
- after sleeping for memory, re-check if we have a page.
- put the allocated page to pageq to appease UVM_PAGE_TRKOWN.
- dequeue the page when doing ->K loan.
 1.43 26-Oct-2003  yamt whitespace.
 1.42 03-May-2003  yamt branches: 1.42.2;
- export raw page loan out routine as uvm_loanuobjpages. (for nfsd)
- put code for loan-breaking into a function, uvm_loanbreak.
 1.41 05-Mar-2003  thorpej Implement a minimal pager for the uvm_loanzero_object, which simply has
a "put" method which reactivates or dequeues the page.

Need for pager pointed out by enami tsugutomo.
 1.40 04-Mar-2003  thorpej Fix the following pathological scenario:
* User allocates ZFOD region, but does not actually touch the buffer
to fault in the pages.
* In a loop, user writes this buffer to a network socket, triggering
sosend_loan().
* uvm_loan() calls uvm_loanzero() once for each page in the loaned
region (since the pages have not yet faulted in). This causes a
page to be allocated and zero'd. The result is the kernel spends
a lot of time allocating and zero'ing pages.

This fix creates a special object which owns a single zero'd page.
This single zero'd page is used to satisfy all loans of non-resident
ZFOD mappings.

Thanks to Allen Briggs for discovering the problem and for providing
an initial patch.
 1.39 14-Jul-2002  chs when dropping a kernel loan, if this was the last loan-to-kernel but
the page is still loaned to an anon, we should put the page back on a
paging queue. this is because while pages loaned to the kernel really
do need to stay resident (since the kernel is accessing the physical
memory directly), pages loaned to anons can be paged out just fine.
(the page will be paged out twice, first to the object and then again
to the anon, but after that the page can be reused.)
 1.38 29-May-2002  enami Add missing pageq lock while uvm_pagefree() is called (either directly
or indirectly). Reviewed by chuq.
 1.37 07-May-2002  enami branches: 1.37.2; 1.37.4;
Fetch the right page from a file even if it is mapped from middle of it.
This makes `tail -<N> <FILE> | cat > file' work correctly, where <FILE> is
a regular file larger than 10Mbytes (makes tail map part of the file)
and <N> is big enough to produce output larger than 8kbytes (makes pipe
use the page loan facility). Problem reported by FUKAUMI Naoki on a Japanese
local mailing list.
 1.36 31-Dec-2001  chs fix locking for loaning. in general we should be looking at the page's
uobject and uanon pointers rather than at the PQ_ANON flag to determine
which lock to hold, since PQ_ANON can be clear even when the anon's lock
is the one which we should hold (if the page was loaned from an object
and then freed by the object).
 1.35 10-Nov-2001  lukem add RCSIDs, and in some cases, slightly cleanup #include order
 1.34 06-Nov-2001  chs several changes prompted by loaning problems:
- fix the loaned case in uvm_pagefree().
- redo uvmexp.swpgonly accounting to work with page loaning.
add an assertion before each place we adjust uvmexp.swpgonly.
- fix uvm_km_pgremove() to always free any swap space associated with
the range being removed.
- get rid of UVM_LOAN_WIRED flag. instead, we just make sure that
pages loaned to the kernel are never on the page queues.
this allows us to assert that pages are not loaned and wired
at the same time.
- add yet more assertions.
 1.33 22-Sep-2001  jdolecek branches: 1.33.2;
add new UVM_LOAN_WIRED flag - the memory pages loaned in TOPAGE case
are only wired if this flag is present (i.e. they are not wired by default now)
loaned pages are unloaned via new uvm_unloan(), uvm_unloananon() and
uvm_unloanpage() are no longer exported
adjust uvm_unloanpage() to unwire the pages if UVM_LOAN_WIRED is specified
mark uvm_loanuobj() and uvm_loanzero() static also in function implementation

kern/sys_pipe.c: uvm_unloanpage() --> uvm_unloan()
 1.32 15-Sep-2001  chs a whole bunch of changes to improve performance and robustness under load:

- remove special treatment of pager_map mappings in pmaps. this is
required now, since I've removed the globals that expose the address range.
pager_map now uses pmap_kenter_pa() instead of pmap_enter(), so there's
no longer any need to special-case it.
- eliminate struct uvm_vnode by moving its fields into struct vnode.
- rewrite the pageout path. the pager is now responsible for handling the
high-level requests instead of only getting control after a bunch of work
has already been done on its behalf. this will allow us to UBCify LFS,
which needs tighter control over its pages than other filesystems do.
writing a page to disk no longer requires making it read-only, which
allows us to write wired pages without causing all kinds of havoc.
- use a new PG_PAGEOUT flag to indicate that a page should be freed
on behalf of the pagedaemon when it's unlocked. this flag is very similar
to PG_RELEASED, but unlike PG_RELEASED, PG_PAGEOUT can be cleared if the
pageout fails due to eg. an indirect-block buffer being locked.
this allows us to remove the "version" field from struct vm_page,
and together with shrinking "loan_count" from 32 bits to 16,
struct vm_page is now 4 bytes smaller.
- no longer use PG_RELEASED for swap-backed pages. if the page is busy
because it's being paged out, we can't release the swap slot to be
reallocated until that write is complete, but unlike with vnodes we
don't keep a count of in-progress writes so there's no good way to
know when the write is done. instead, when we need to free a busy
swap-backed page, just sleep until we can get it busy ourselves.
- implement a fast-path for extending writes which allows us to avoid
zeroing new pages. this substantially reduces cpu usage.
- encapsulate the data used by the genfs code in a struct genfs_node,
which must be the first element of the filesystem-specific vnode data
for filesystems which use genfs_{get,put}pages().
- eliminate many of the UVM pagerops, since they aren't needed anymore
now that the pager "put" operation is a higher-level operation.
- enhance the genfs code to allow NFS to use the genfs_{get,put}pages
instead of a modified copy.
- clean up struct vnode by removing all the fields that used to be used by
the vfs_cluster.c code (which we don't use anymore with UBC).
- remove kmem_object and mb_object since they were useless.
instead of allocating pages to these objects, we now just allocate
pages with no object. such pages are mapped in the kernel until they
are freed, so we can use the mapping to find the page to free it.
this allows us to remove splvm() protection in several places.

The sum of all these changes improves write throughput on my
decstation 5000/200 to within 1% of the rate of NetBSD 1.5
and reduces the elapsed time for "make release" of a NetBSD 1.5
source tree on my 128MB pc to 10% less than a 1.5 kernel took.
 1.31 27-Aug-2001  chuck branches: 1.31.2;
handle a locking problem where the second (or later) call in the loanentry
loop returns 0. loanentry was returning >0, but was unlocking the maps
(because of the zero). reworked to avoid this. problem reported by
chuck silvers. also clarify a comment that jdolecek asked about.
 1.30 18-Aug-2001  chs when fetching an object page to loan out, do so synchronously.
 1.29 25-May-2001  chs branches: 1.29.2;
remove trailing whitespace.
 1.28 10-Apr-2001  chuck fix locking problem noted by Jaromir Dolecek. also, add more comments
on locking rules to make code easier to understand. locking in
uvm_loananon still needs some work on fringe cases where anon's page
is actually on loan from a uobj.
 1.27 09-Apr-2001  jdolecek Upon Chuck Cranor request, revert rev. 1.26. There is indeed a bug in way
locking is done, but this fix is not the right way to fix it.
 1.26 08-Apr-2001  jdolecek Remove superfluous uvmfault_unlockmaps() in uvm_loan(), only call it
if uvm_loanentry() returned 0; otherwise, the unlocking would already
have been done by uvmfault_unlockall() call in uvm_loanentry().
Okay'ed by Chuck Silvers
 1.25 15-Mar-2001  chs eliminate the KERN_* error codes in favor of the traditional E* codes.
the mapping is:

KERN_SUCCESS 0
KERN_INVALID_ADDRESS EFAULT
KERN_PROTECTION_FAILURE EACCES
KERN_NO_SPACE ENOMEM
KERN_INVALID_ARGUMENT EINVAL
KERN_FAILURE various, mostly turn into KASSERTs
KERN_RESOURCE_SHORTAGE ENOMEM
KERN_NOT_RECEIVER <unused>
KERN_NO_ACCESS <unused>
KERN_PAGES_LOCKED <unused>
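
The mapping above can be written out as a small lookup. This is a sketch
only: the commit removed the KERN_* codes outright rather than translating
them at run time, and the OLD_KERN_* enum names and their numeric values
are assumptions made up for this example.

```c
#include <errno.h>

/* Hypothetical stand-ins for the removed KERN_* codes (values assumed). */
enum old_kern_code {
	OLD_KERN_SUCCESS,
	OLD_KERN_INVALID_ADDRESS,
	OLD_KERN_PROTECTION_FAILURE,
	OLD_KERN_NO_SPACE,
	OLD_KERN_INVALID_ARGUMENT,
	OLD_KERN_FAILURE,
	OLD_KERN_RESOURCE_SHORTAGE
};

/* Return the E* code listed in the table above, or -1 if there is none. */
static int
old_kern_to_errno(enum old_kern_code code)
{
	switch (code) {
	case OLD_KERN_SUCCESS:
		return 0;
	case OLD_KERN_INVALID_ADDRESS:
		return EFAULT;
	case OLD_KERN_PROTECTION_FAILURE:
		return EACCES;
	case OLD_KERN_NO_SPACE:
	case OLD_KERN_RESOURCE_SHORTAGE:
		return ENOMEM;
	case OLD_KERN_INVALID_ARGUMENT:
		return EINVAL;
	default:
		/* KERN_FAILURE had no single replacement ("various"). */
		return -1;
	}
}
```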
 1.24 10-Mar-2001  chs eliminate the VM_PAGER_* error codes in favor of the traditional E* codes.
the mapping is:

VM_PAGER_OK 0
VM_PAGER_BAD <unused>
VM_PAGER_FAIL <unused>
VM_PAGER_PEND 0 (see below)
VM_PAGER_ERROR EIO
VM_PAGER_AGAIN EAGAIN
VM_PAGER_UNLOCK EBUSY
VM_PAGER_REFAULT ERESTART

for async i/o requests, it used to be possible for the request to
be converted to sync, and the pager would return VM_PAGER_OK or VM_PAGER_PEND
to indicate whether the caller should perform post-i/o cleanup.
this is no longer allowed; pagers must now return 0 to indicate that
the async i/o was successfully started, and the caller never needs to
worry about doing the post-i/o cleanup.
 1.23 23-Jan-2001  thorpej branches: 1.23.2;
Change uvm_analloc() to return a locked anon, update all callers,
and fix an anon locking protocol error in uvm_loanzero().
 1.22 27-Jun-2000  mrg remove include of <vm/vm.h>
 1.21 26-Jun-2000  mrg remove/move more mach vm header files:

<vm/pglist.h> -> <uvm/uvm_pglist.h>
<vm/vm_inherit.h> -> <uvm/uvm_inherit.h>
<vm/vm_kern.h> -> into <uvm/uvm_extern.h>
<vm/vm_object.h> -> nothing
<vm/vm_pager.h> -> into <uvm/uvm_pager.h>

also includes a bunch of <vm/vm_page.h> include removals (due to redundancy
with <vm/vm.h>), and a scattering of other similar headers.
 1.20 10-Apr-2000  thorpej Use UVM_PGA_ZERO in a few (easy) places.
 1.19 12-Sep-1999  chs branches: 1.19.2;
eliminate the PMAP_NEW option by making it required for all ports.
ports which previously had no support for PMAP_NEW now implement
the pmap_k* interfaces as wrappers around the non-k versions.
 1.18 22-Jul-1999  thorpej Garbage collect thread_sleep()/thread_wakeup() left over from the old
Mach VM code. Also nuke iprintf(), which was no longer used anywhere.

Add proclist locking where appropriate.
 1.17 03-Jun-1999  thorpej Just say no to interrupt-safe maps.
 1.16 27-May-1999  thorpej Change the main comment block to indicate why PMAP_NEW (specifically,
pmap_kenter*()) is not required for {O,A}->K page loans.
 1.15 11-Apr-1999  chs add a `flags' argument to uvm_pagealloc_strat().
define a flag UVM_PGA_USERESERVE to allow non-kernel object
allocations to use pages from the reserve.
use the new flag for allocations in pmap modules.
 1.14 25-Mar-1999  mrg branches: 1.14.2;
remove now >1 year old pre-release message.
 1.13 24-Jan-1999  chuck cleanup/reorg:
- break anon related functions out of uvm_amap.c and put them in their own
file (uvm_anon.c). includes breaking up uvm_anon_init into an amap and an
anon init function
- ensure that only functions within the amap module access amap structure
fields (add macros to amap api as needed)
 1.12 04-Nov-1998  chs branches: 1.12.2;
be consistent with locking of amaps and anons when freeing them.
 1.11 18-Oct-1998  chs shift by PAGE_SHIFT instead of multiplying or dividing by PAGE_SIZE.
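
A minimal sketch of the shift-based conversion follows. The EX_PAGE_SHIFT
value of 12 (4096-byte pages) and the helper names are assumptions for
illustration; the kernel's PAGE_SHIFT is machine-dependent.

```c
#include <stddef.h>

/* Assumed example values: 4096-byte pages. */
#define EX_PAGE_SHIFT	12
#define EX_PAGE_SIZE	((size_t)1 << EX_PAGE_SHIFT)

/* Same result as npages * EX_PAGE_SIZE, written as a shift. */
static size_t
pages_to_bytes(size_t npages)
{
	return npages << EX_PAGE_SHIFT;
}

/* Same result as nbytes / EX_PAGE_SIZE (rounds down), written as a shift. */
static size_t
bytes_to_pages(size_t nbytes)
{
	return nbytes >> EX_PAGE_SHIFT;
}
```

The two forms are equivalent for unsigned values because the page size is
a power of two.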
 1.10 11-Oct-1998  chuck remove unused share map code from UVM:
- update uvm_faultinfo's rvaddr to orig_rvaddr to match changes from
uvm_fault.h
 1.9 13-Aug-1998  eeh Merge paddr_t changes into the main branch.
 1.8 05-May-1998  kleink branches: 1.8.2;
Remove inclusions of syscall (and syscall argument) related header files;
we don't need them here.
 1.7 22-Mar-1998  chuck remove tmpwire arg from uvm_pagewire() -- it isn't needed anymore.
noted by chuck s.
 1.6 09-Mar-1998  mrg KNF.
 1.5 10-Feb-1998  mrg - add defopt's for UVM, UVMHIST and PMAP_NEW.
- remove unnecessary UVMHIST_DECL's.
 1.4 07-Feb-1998  mrg restore rcsids
 1.3 07-Feb-1998  chs fix typos in locking.
 1.2 06-Feb-1998  thorpej RCS ID police.
 1.1 05-Feb-1998  mrg branches: 1.1.1;
Initial revision
 1.1.1.1 05-Feb-1998  mrg initial import of the new virtual memory system, UVM, into -current.

UVM was written by chuck cranor <chuck@maria.wustl.edu>, with some
minor portions derived from the old Mach code. i provided some help
getting swap and paging working, and other bug fixes/ideas. chuck
silvers <chuq@chuq.com> also provided some other fixes.

this is the UVM kernel code portion.


this will be KNF'd shortly. :-)
 1.8.2.1 30-Jul-1998  eeh Split vm_offset_t and vm_size_t into paddr_t, psize_t, vaddr_t, and vsize_t.
 1.12.2.2 25-Feb-1999  chs thread_wakeup() -> wakeup().
 1.12.2.1 09-Nov-1998  chs initial snapshot. lots left to do.
 1.14.2.1 16-Apr-1999  chs branches: 1.14.2.1.2;
pull up 1.14 -> 1.15:
add a `flags' argument to uvm_pagealloc_strat().
define a flag UVM_PGA_USERESERVE to allow non-kernel object
allocations to use pages from the reserve.
use the new flag for allocations in pmap modules.
 1.14.2.1.2.2 21-Jun-1999  thorpej Sync w/ -current.
 1.14.2.1.2.1 07-Jun-1999  chs merge everything from chs-ubc branch.
 1.19.2.5 21-Apr-2001  bouyer Sync with HEAD
 1.19.2.4 27-Mar-2001  bouyer Sync with HEAD.
 1.19.2.3 12-Mar-2001  bouyer Sync with HEAD.
 1.19.2.2 11-Feb-2001  bouyer Sync with HEAD.
 1.19.2.1 20-Nov-2000  bouyer Update thorpej_scsipi to -current as of a month ago
 1.23.2.9 01-Aug-2002  nathanw Catch up to -current.
 1.23.2.8 20-Jun-2002  nathanw Catch up to -current.
 1.23.2.7 08-Jan-2002  nathanw Catch up to -current.
 1.23.2.6 14-Nov-2001  nathanw Catch up to -current.
 1.23.2.5 26-Sep-2001  nathanw Catch up to -current.
Again.
 1.23.2.4 21-Sep-2001  nathanw Catch up to -current.
 1.23.2.3 24-Aug-2001  nathanw Catch up with -current.
 1.23.2.2 21-Jun-2001  nathanw Catch up to -current.
 1.23.2.1 09-Apr-2001  nathanw Catch up with -current.
 1.29.2.5 06-Sep-2002  jdolecek sync kqueue branch with HEAD
 1.29.2.4 23-Jun-2002  jdolecek catch up with -current on kqueue branch
 1.29.2.3 10-Jan-2002  thorpej Sync kqueue branch with -current.
 1.29.2.2 13-Sep-2001  thorpej Update the kqueue branch to HEAD.
 1.29.2.1 25-Aug-2001  thorpej Merge Aug 24 -current into the kqueue branch.
 1.31.2.1 01-Oct-2001  fvdl Catch up with -current.
 1.33.2.1 12-Nov-2001  thorpej Sync the thorpej-mips-cache branch with -current.
 1.37.4.1 01-Jun-2002  tv Pull up revision 1.38 (requested by enami in ticket #114):
Add missing pageq lock while uvm_pagefree() is called (either directly
or indirectly). Reviewed by chuq.
 1.37.2.2 15-Jul-2002  gehenna catch up with -current.
 1.37.2.1 30-May-2002  gehenna Catch up with -current.
 1.42.2.5 10-Nov-2005  skrll Sync with HEAD. Here we go again...
 1.42.2.4 29-Nov-2004  skrll Sync with HEAD.
 1.42.2.3 21-Sep-2004  skrll Fix the sync with head I botched.
 1.42.2.2 18-Sep-2004  skrll Sync with HEAD.
 1.42.2.1 03-Aug-2004  skrll Sync with HEAD
 1.55.2.7 21-Jan-2008  yamt sync with head
 1.55.2.6 07-Dec-2007  yamt sync with head
 1.55.2.5 27-Oct-2007  yamt sync with head.
 1.55.2.4 03-Sep-2007  yamt sync with head.
 1.55.2.3 26-Feb-2007  yamt sync with head.
 1.55.2.2 30-Dec-2006  yamt sync with head.
 1.55.2.1 21-Jun-2006  yamt sync with head.
 1.57.2.1 01-Feb-2006  yamt sync with head.
 1.58.10.1 24-May-2006  tron Merge 2006-05-24 NetBSD-current into the "peter-altq" branch.
 1.58.8.1 19-Apr-2006  elad oops - *really* sync to head this time.
 1.58.6.1 24-May-2006  yamt sync with head.
 1.58.4.1 22-Apr-2006  simonb Sync with head.
 1.58.2.1 09-Sep-2006  rpaulo sync with head
 1.59.10.3 18-Dec-2006  yamt sync with head.
 1.59.10.2 10-Dec-2006  yamt sync with head.
 1.59.10.1 22-Oct-2006  yamt sync with head
 1.59.8.2 12-Jan-2007  ad Sync with head.
 1.59.8.1 18-Nov-2006  ad Sync with head.
 1.63.2.1 27-Feb-2007  yamt - sync with head.
- move sched_changepri back to kern_synch.c as it doesn't know PPQ anymore.
 1.65.12.1 15-Aug-2007  skrll Sync with HEAD.
 1.65.4.3 01-Nov-2007  ad Yielding to avoid livelock doesn't work well, so just sleep for 1 tick.
This too is inadequate and a better solution must be found. Discussed
with yamt@.
 1.65.4.2 03-Jul-2007  yamt if wrong-order trylocking failed, avoid livelock by yielding cpu
before retrying. ok'ed by Andrew Doran.
 1.65.4.1 13-Mar-2007  ad Pull in the initial set of changes for the vmlocking branch.
 1.66.10.2 21-Jul-2007  ad Merge unobtrusive locking changes from the vmlocking branch.
 1.66.10.1 21-Jul-2007  ad file uvm_loan.c was added on branch matt-mips64 on 2007-07-21 19:21:55 +0000
 1.66.8.1 14-Oct-2007  yamt sync with head.
 1.66.6.2 09-Jan-2008  matt sync with HEAD
 1.66.6.1 06-Nov-2007  matt sync with HEAD
 1.66.4.2 03-Dec-2007  joerg Sync with HEAD.
 1.66.4.1 26-Oct-2007  joerg Sync with HEAD.

Follow the merge of pmap.c on i386 and amd64 and move
pmap_init_tmp_pgtbl into arch/x86/x86/pmap.c. Modify the ACPI wakeup
code to restore CR4 before jumping back into kernel space as the large
page option might cover that.
 1.67.4.2 18-Feb-2008  mjf Sync with HEAD.
 1.67.4.1 08-Dec-2007  mjf Sync with HEAD.
 1.69.6.1 02-Jan-2008  bouyer Sync with HEAD
 1.69.2.1 04-Dec-2007  ad Pull the vmlocking changes into a new branch.
 1.70.12.1 23-Jun-2008  wrstuden Sync w/ -current. 34 merge conflicts to follow.
 1.70.10.2 11-Mar-2010  yamt sync with head
 1.70.10.1 04-May-2009  yamt sync with head.
 1.70.8.1 17-Jun-2008  yamt sync with head.
 1.70.6.3 17-Jan-2009  mjf Sync with HEAD.
 1.70.6.2 29-Jun-2008  mjf Sync with HEAD.
 1.70.6.1 05-Jun-2008  mjf Sync with HEAD.

Also fix build.
 1.71.2.1 18-Jun-2008  simonb Sync with head.
 1.72.10.1 29-Feb-2012  matt Improve UVM_PAGE_TRKOWN.
Add more asserts to uvm_page.
 1.72.4.1 19-Jan-2009  skrll Sync with HEAD.
 1.72.2.1 13-Dec-2008  haad Update haad-dm branch to haad-dm-base2.
 1.77.8.1 08-Feb-2011  bouyer Sync with HEAD
 1.77.6.1 06-Jun-2011  jruoho Sync with HEAD.
 1.77.4.5 31-May-2011  rmind sync with head
 1.77.4.4 19-May-2011  rmind Implement sharing of vnode_t::v_interlock amongst vnodes:
- Lock is shared amongst UVM objects using uvm_obj_setlock() or getnewvnode().
- Adjust vnode cache to handle unsharing, add VI_LOCKSHARE flag for that.
- Use sharing in tmpfs and layerfs for underlying object.
- Simplify locking in ubc_fault().
- Sprinkle some asserts.

Discussed with ad@.
 1.77.4.3 05-Mar-2011  rmind sync with head
 1.77.4.2 17-Mar-2010  rmind Reorganise UVM locking to protect P->V state and serialise pmap(9)
operations on the same page(s) by always locking their owner. Hence
lock order: "vmpage"-lock -> pmap-lock.

Patch, proposed on tech-kern@, from Andrew Doran.
 1.77.4.1 16-Mar-2010  rmind Change struct uvm_object::vmobjlock to be dynamically allocated with
mutex_obj_alloc(). It allows us to share the locks among UVM objects.
 1.77.2.3 12-Jul-2010  uebayasi Reduce more diff by backing out XIP page specific code. Allow XIP pages
to be loaned.
 1.77.2.2 31-May-2010  uebayasi Re-define the definition of "device page"; device pages are pages of
device memory. Pages which don't have vm_page (== can't be used for
generic use), but whose PV are tracked, are called "direct pages" from
now.
 1.77.2.1 12-Feb-2010  uebayasi Teach device page handling.
 1.79.2.1 23-Jun-2011  cherry Catchup with rmind-uvmplock merge.
 1.81.6.1 24-Feb-2012  mrg sync to -current.
 1.81.2.18 30-Oct-2012  yamt sync with head
 1.81.2.17 12-Jun-2012  yamt as usefulness of O->A loaning is unclear, disable it by default for now.

requested by rmind.
http://mail-index.NetBSD.org/tech-kern/2012/05/08/msg013139.html
 1.81.2.16 17-Apr-2012  yamt sync with head
 1.81.2.15 05-Feb-2012  yamt FALSE -> false
 1.81.2.14 05-Feb-2012  yamt turn vm.loanread sysctl to a threshold.
 1.81.2.13 25-Jan-2012  yamt uvm_loanabj: take an access pattern hint.
 1.81.2.12 18-Jan-2012  yamt - bug fixes
- minor optimizations
- assertions
- comments
 1.81.2.11 11-Jan-2012  yamt turn an error return to an assertion
 1.81.2.10 11-Jan-2012  yamt create a sysctl knob to turn on/off loaned read.
 1.81.2.9 04-Jan-2012  yamt O->A loan related statistics fixes.
 1.81.2.8 28-Dec-2011  yamt uvm_loanobj_read: try to avoid creating VAC aliases if PMAP_PREFER is available
 1.81.2.7 28-Dec-2011  yamt O->A loan fix
 1.81.2.6 28-Dec-2011  yamt missing include sys/atomic.h
 1.81.2.5 26-Dec-2011  yamt - use O->A loan to serve read(2). based on a patch from Chuck Silvers
- associated O->A loan fixes.
 1.81.2.4 20-Nov-2011  yamt - fix page loaning XXX make O->A loaning further
- add some statistics
 1.81.2.3 18-Nov-2011  yamt - use mutex obj for pageable object
- add a function to wait for a mutex obj being available
- replace some "livelock" kpauses with it
 1.81.2.2 06-Nov-2011  yamt remove pg->listq and uobj->memq
 1.81.2.1 02-Nov-2011  yamt page cache related changes

- maintain object pages in radix tree rather than rb tree.
- reduce unnecessary page scan in putpages. esp. when an object has a ton of
pages cached but only a few of them are dirty.
- reduce the number of pmap operations by tracking page dirtiness more
precisely in uvm layer.
- fix nfs commit range tracking.
- fix nfs write clustering. XXX hack
 1.83.24.1 21-Apr-2017  bouyer Sync with HEAD
 1.83.20.1 20-Mar-2017  pgoyette Sync with HEAD
 1.83.16.1 28-Aug-2017  skrll Sync with HEAD
 1.83.2.1 03-Dec-2017  jdolecek update from HEAD
 1.84.6.1 02-Nov-2017  snj Pull up following revision(s) (requested by pgoyette in ticket #335):
share/man/man9/kernhist.9: 1.5-1.8
sys/arch/acorn26/acorn26/pmap.c: 1.39
sys/arch/arm/arm32/fault.c: 1.105 via patch
sys/arch/arm/arm32/pmap.c: 1.350, 1.359
sys/arch/arm/broadcom/bcm2835_bsc.c: 1.7
sys/arch/arm/omap/if_cpsw.c: 1.20
sys/arch/arm/omap/tiotg.c: 1.7
sys/arch/evbarm/conf/RPI2_INSTALL: 1.3
sys/dev/ic/sl811hs.c: 1.98
sys/dev/usb/ehci.c: 1.256
sys/dev/usb/if_axe.c: 1.83
sys/dev/usb/motg.c: 1.18
sys/dev/usb/ohci.c: 1.274
sys/dev/usb/ucom.c: 1.119
sys/dev/usb/uhci.c: 1.277
sys/dev/usb/uhub.c: 1.137
sys/dev/usb/umass.c: 1.160-1.162
sys/dev/usb/umass_quirks.c: 1.100
sys/dev/usb/umass_scsipi.c: 1.55
sys/dev/usb/usb.c: 1.168
sys/dev/usb/usb_mem.c: 1.70
sys/dev/usb/usb_subr.c: 1.221
sys/dev/usb/usbdi.c: 1.175
sys/dev/usb/usbdi_util.c: 1.67-1.70
sys/dev/usb/usbroothub.c: 1.3
sys/dev/usb/xhci.c: 1.75
sys/external/bsd/drm2/dist/drm/i915/i915_gem.c: 1.34
sys/kern/kern_history.c: 1.15
sys/kern/kern_xxx.c: 1.74
sys/kern/vfs_bio.c: 1.275-1.276
sys/miscfs/genfs/genfs_io.c: 1.71
sys/sys/kernhist.h: 1.21
sys/ufs/ffs/ffs_balloc.c: 1.63
sys/ufs/lfs/lfs_vfsops.c: 1.361
sys/ufs/lfs/ulfs_inode.c: 1.21
sys/ufs/lfs/ulfs_vnops.c: 1.52
sys/ufs/ufs/ufs_inode.c: 1.102
sys/ufs/ufs/ufs_vnops.c: 1.239
sys/uvm/pmap/pmap.c: 1.37-1.39
sys/uvm/pmap/pmap_tlb.c: 1.22
sys/uvm/uvm_amap.c: 1.108
sys/uvm/uvm_anon.c: 1.64
sys/uvm/uvm_aobj.c: 1.126
sys/uvm/uvm_bio.c: 1.91
sys/uvm/uvm_device.c: 1.66
sys/uvm/uvm_fault.c: 1.201
sys/uvm/uvm_km.c: 1.144
sys/uvm/uvm_loan.c: 1.85
sys/uvm/uvm_map.c: 1.353
sys/uvm/uvm_page.c: 1.194
sys/uvm/uvm_pager.c: 1.111
sys/uvm/uvm_pdaemon.c: 1.109
sys/uvm/uvm_swap.c: 1.175
sys/uvm/uvm_vnode.c: 1.103
usr.bin/vmstat/vmstat.c: 1.219
Reorder to test for null before null deref in debug code
--
Reorder to test for null before null deref in debug code
--
KNF
--
No need for '\n' in UVMHIST_LOG
--
normalise a BIOHIST log message
--
Update the kernhist(9) kernel history code to address issues identified
in PR kern/52639, as well as some general cleaning-up...
(As proposed on tech-kern@ with additional changes and enhancements.)
Details of changes:
* All history arguments are now stored as uintmax_t values[1], both in
the kernel and in the structures used for exporting the history data
to userland via sysctl(9). This avoids problems on some architectures
where passing a 64-bit (or larger) value to printf(3) can cause it to
process the value as multiple arguments. (This can be particularly
problematic when printf()'s format string is not a literal, since in
that case the compiler cannot know how large each argument should be.)
* Update the data structures used for exporting kernel history data to
include a version number as well as the length of history arguments.
* All [2] existing users of kernhist(9) have had their format strings
updated. Each format specifier now includes an explicit length
modifier 'j' to refer to numeric values of the size of uintmax_t.
* All [2] existing users of kernhist(9) have had their format strings
updated to replace uses of "%p" with "%#jx", and the pointer
arguments are now cast to (uintptr_t) before being subsequently cast
to (uintmax_t). This is needed to avoid compiler warnings about
casting "pointer to integer of a different size."
* All [2] existing users of kernhist(9) have had instances of "%s" or
"%c" format strings replaced with numeric formats; several instances
of mis-match between format string and argument list have been fixed.
* vmstat(1) has been modified to handle the new size of arguments in the
history data as exported by sysctl(9).
* vmstat(1) now provides a warning message if the history requested with
the -u option does not exist (previously, this condition was silently
ignored, with only a single blank line being printed).
* vmstat(1) now checks the version and argument length included in the
data exported via sysctl(9) and exits if they do not match the values
with which vmstat was built.
* The kernhist(9) man-page has been updated to note the additional
requirements imposed on the format strings, along with several other
minor changes and enhancements.
[1] It would have been possible to use an explicit length (for example,
uint64_t) for the history arguments. But that would require another
"rototill" of all the users in the future when we add support for an
architecture that supports a larger size. Also, the printf(3) format
specifiers for explicitly-sized values, such as "%"PRIu64, are much
more verbose (and less aesthetically appealing, IMHO) than simply
using "%ju".
[2] I've tried very hard to find "all [the] existing users of kernhist(9)"
but it is possible that I've missed some of them. I would be glad to
update any stragglers that anyone identifies.
--
For some reason this single kernel seems to have outgrown its declared
size as a result of the kernhist(9) changes. Bump the size.
XXX The amount of increase may be excessive - anyone with more detailed
XXX knowledge please feel free to further adjust the value appropriately.
--
Missed one cast of pointer --> uintptr_t in previous kernhist(9) commit
--
And yet another one. :(
--
Use correct mark-up for NetBSD version.
--
More improvements in grammar and readability.
--
Remove a stray '"' (obvious typo) and add a couple of casts that are
probably needed.
--
And replace an instance of "%p" conversion with "%#jx"
--
Whitespace fix. Give Bl tag table a width. Fix Xr.
 1.85.2.2 25-Jun-2018  pgoyette Sync with HEAD
 1.85.2.1 21-May-2018  pgoyette Sync with HEAD
 1.87.2.1 08-Apr-2020  martin Merge changes from current as of 20200406
 1.93.2.2 29-Feb-2020  ad Sync with head.
 1.93.2.1 17-Jan-2020  ad Sync with head.
 1.17 02-Feb-2011  chuck branches: 1.17.4;
update license clauses on my code to match the new-style BSD licenses.
based on diff that rmind@ sent me.

no functional change with this commit.
 1.16 02-Feb-2010  uebayasi branches: 1.16.4; 1.16.6; 1.16.8;
Don't pass an unnecessary reference to uvm_loanbreak_anon().

Requested by rmind@.
 1.15 02-Feb-2010  uebayasi Move A->K loan break code to uvm_loan.c.
 1.14 11-Dec-2005  christos branches: 1.14.74;
merge ktrace-lwp.
 1.13 24-Mar-2004  junyoung Nuke __P().
 1.12 07-Jan-2004  yamt - get pages to loan out in uvm_loanuobjpages() rather than
having caller (nfsd, in this case) do so.
- tweak locking so that nfs loaned READ works on layered filesystems.
 1.11 03-May-2003  yamt branches: 1.11.2;
- export raw page loan out routine as uvm_loanuobjpages. (for nfsd)
- put code for loan-breaking into a function, uvm_loanbreak.
 1.10 04-Mar-2003  thorpej Fix the following pathological scenario:
* User allocates ZFOD region, but does not actually touch the buffer
to fault in the pages.
* In a loop, user writes this buffer to a network socket, triggering
sosend_loan().
* uvm_loan() calls uvm_loanzero() once for each page in the loaned
region (since the pages have not yet faulted in). This causes a
page to be allocated and zero'd. The result is the kernel spends
a lot of time allocating and zero'ing pages.

This fix creates a special object which owns a single zero'd page.
This single zero'd page is used to satisfy all loans of non-resident
ZFOD mappings.

Thanks to Allen Briggs for discovering the problem and for providing
an initial patch.
 1.9 06-Nov-2001  chs several changes prompted by loaning problems:
- fix the loaned case in uvm_pagefree().
- redo uvmexp.swpgonly accounting to work with page loaning.
add an assertion before each place we adjust uvmexp.swpgonly.
- fix uvm_km_pgremove() to always free any swap space associated with
the range being removed.
- get rid of UVM_LOAN_WIRED flag. instead, we just make sure that
pages loaned to the kernel are never on the page queues.
this allows us to assert that pages are not loaned and wired
at the same time.
- add yet more assertions.
 1.8 22-Sep-2001  jdolecek branches: 1.8.2;
add new UVM_LOAN_WIRED flag - the memory pages loaned in TOPAGE case
are only wired if this flag is present (i.e. they are not wired by default now)
loaned pages are unloaned via new uvm_unloan(), uvm_unloananon() and
uvm_unloanpage() are no longer exported
adjust uvm_unloanpage() to unwire the pages if UVM_LOAN_WIRED is specified
mark uvm_loanuobj() and uvm_loanzero() static also in function implementation

kern/sys_pipe.c: uvm_unloanpage() --> uvm_unloan()
 1.7 21-Jun-1999  thorpej branches: 1.7.14; 1.7.16; 1.7.18;
Protect prototypes, certain macros, and inlines from userland.
 1.6 25-Mar-1999  mrg branches: 1.6.4;
remove now >1 year old pre-release message.
 1.5 13-Aug-1998  eeh Merge paddr_t changes into the main branch.
 1.4 10-Feb-1998  perry branches: 1.4.2;
add/cleanup multiple inclusion protection.
 1.3 07-Feb-1998  mrg restore rcsids
 1.2 06-Feb-1998  thorpej RCS ID police.
 1.1 05-Feb-1998  mrg branches: 1.1.1;
Initial revision
 1.1.1.1 05-Feb-1998  mrg initial import of the new virtual memory system, UVM, into -current.

UVM was written by chuck cranor <chuck@maria.wustl.edu>, with some
minor portions derived from the old Mach code. i provided some help
getting swap and paging working, and other bug fixes/ideas. chuck
silvers <chuq@chuq.com> also provided some other fixes.

this is the UVM kernel code portion.


this will be KNF'd shortly. :-)
 1.4.2.1 30-Jul-1998  eeh Split vm_offset_t and vm_size_t into paddr_t, psize_t, vaddr_t, and vsize_t.
 1.6.4.1 01-Jul-1999  thorpej Sync w/ -current.
 1.7.18.1 01-Oct-2001  fvdl Catch up with -current.
 1.7.16.1 10-Jan-2002  thorpej Sync kqueue branch with -current.
 1.7.14.2 14-Nov-2001  nathanw Catch up to -current.
 1.7.14.1 26-Sep-2001  nathanw Catch up to -current.
Again.
 1.8.2.1 12-Nov-2001  thorpej Sync the thorpej-mips-cache branch with -current.
 1.11.2.3 21-Sep-2004  skrll Fix the sync with head I botched.
 1.11.2.2 18-Sep-2004  skrll Sync with HEAD.
 1.11.2.1 03-Aug-2004  skrll Sync with HEAD
 1.14.74.1 11-Mar-2010  yamt sync with head
 1.16.8.1 08-Feb-2011  bouyer Sync with HEAD
 1.16.6.1 06-Jun-2011  jruoho Sync with HEAD.
 1.16.4.1 05-Mar-2011  rmind sync with head
 1.17.4.2 25-Jan-2012  yamt uvm_loanabj: take an access pattern hint.
 1.17.4.1 26-Dec-2011  yamt - use O->A loan to serve read(2). based on a patch from Chuck Silvers
- associated O->A loan fixes.
 1.427 27-Apr-2025  riastradh posix_spawn(2): Allocate a new vmspace at process creation time.

This allocates a new vmspace for the process at the time the new
process is created, rather than sharing some other vmspace temporarily.
This eliminates any risk of anything bad happening due to temporary
sharing, since there isn't any sharing.

Resolves a race where:

1. we set up the child to share proc0.p_vmspace at first,

2. another process tries to read the new child's psstrings via
kern.proc_args.<childpid>.argv or similar with the child's
p_reflock held and gets stuck in a uvm fault loop because
proc0.p_vmspace doesn't have the child's psstrings address
(inherited from the parent) mapped,

3. the child is waiting for p_reflock before it can replace its
p_vmspace or psstrings.

By allocating the vmspace up front, with no mappings in it, we avoid
exposing the child in this scenario. Minor possible downside is that
sysctl kern.proc_args.<childpid>.argv might spuriously fail with
EFAULT during this time (rather than fail with EBUSY as it does if
p_reflock is held concurrently) but that's not a particularly big
deal.

Patch and first paragraph of commit message written by chs@; minor
tweaks to comments -- and any mistakes in the analysis -- by me.

PR kern/59037: deadlock in posix_spawn
PR kern/59175: posix_spawn hang, hanging other process too
 1.426 16-Aug-2024  riastradh uvm_map(9): Make KASSERTMSG unconditional for findspace invariants.

PR kern/51254: uvm assertion "!topdown || hint <= orig_hint" failed
 1.425 15-Aug-2024  riastradh uvm_map(9): Apply the same orig_hint clamp again to the same entry.

Previous change dealt with case like length=0x1000 and:

[0x7defb000,0x7defc000) entry above orig_hint (entry->next)
0x77895000 orig_hint
[0x77894000,0x77895000) entry below orig_hint (entry)

by changing

entry->next->start - length

to

MIN(orig_hint, entry->next->start - length)

in order to enforce monotonicity of search.

In this case, if the tree search for a gap has failed, we retry with
a list search with exactly the same orig_hint and entry -- nothing
has changed them (only hint and tmp). So apply the same clamping.

PR kern/51254: uvm assertion "!topdown || hint <= orig_hint" failed
 1.424 14-Aug-2024  rin uvm_map: Fix build failure with DIAGNOSTIC for rev 1.422

PR kern/51254: uvm assertion "!topdown || hint <= orig_hint" failed
 1.423 14-Aug-2024  riastradh uvm_map(9): Make sure search in the nearest gap is monotonic.

The algorithm, on a hint clamped to the VM bounds, works as follows
(assuming topdown VM):

1. Make sure the hint is aligned, by subtracting the remainder in
uvm_map_align_va.

2. If the hint is equal to the VM max, try the first free gap.

3. If the hint is not equal to the VM max, but is already in use, try
the next gap _below_ the entry covering hint.

4. If the hint is not equal to the VM max and is not already in use,
try gap between the entry below hint and the next entry after it,
above hint.

In the last case, `entry' is the one below hint, and `entry->next' is
the one above it. We would take

entry->next->start - length

as the next candidate hint.

However, this algorithm is supposed to be a monotonic search through
the address space, and we might wind up with something like:

[0x7defb000,0x7defc000) entry above hint (entry->next)
0x77895000 hint
[0x77894000,0x77895000) entry below hint (entry)

In this case, if length=0x1000, we would take

0x7defb000 - 0x1000 = 0x7defa000

as the next candidate hint, but this violates monotonicity of the
search.

Instead, take the _smallest_ of orig_hint or entry->next->start -
length, to avoid violating monotonicity, so hint <= orig_hint.

I didn't commit this change before because it didn't seem to fix all
the manifestations of the problem, but we have more diagnostics now
so maybe we will find there is a _different_ violation of the same
invariants once this is committed -- and I'm pretty sure this change
is necessary to guarantee monotonicity in some cases (but I'm still
not sure why we're only hitting the problem on sh3).

PR kern/51254: uvm assertion "!topdown || hint <= orig_hint" failed
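
The clamp described above can be sketched in isolation. This is only an
illustration, not the uvm_map.c code: topdown_candidate() is a hypothetical
helper, vaddr_t is a userland stand-in for the kernel type, and assert()
stands in for KASSERTMSG.

```c
#include <assert.h>
#include <stdint.h>

typedef uintptr_t vaddr_t;	/* stand-in for the kernel type */

#define MIN(a, b)	((a) < (b) ? (a) : (b))

/*
 * Next candidate hint for a topdown search in the gap that ends at
 * next_start (entry->next->start in the commit message).
 * Hypothetical helper; the real logic lives inside the findspace code.
 */
vaddr_t
topdown_candidate(vaddr_t orig_hint, vaddr_t next_start, vaddr_t length)
{
	/* Never move above the original hint: the search is monotonic. */
	vaddr_t hint = MIN(orig_hint, next_start - length);

	assert(hint <= orig_hint);	/* the invariant from PR kern/51254 */
	return hint;
}
```

With the numbers from the message (orig_hint 0x77895000, next entry at
0x7defb000, length 0x1000) the unclamped candidate would be 0x7defa000,
while the clamped one stays at 0x77895000.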
 1.422 14-Aug-2024  riastradh uvm_map(9): Show the next entry when findspace invariants fail too.

PR kern/51254: uvm assertion "!topdown || hint <= orig_hint" failed
 1.421 14-Aug-2024  riastradh uvm_map(9): Take vm map lock around uvm_unmap_remove.

This was tripping one of the assertions I added. While it is safe
here not to hold the lock -- caller has exclusive access to the map
at this point -- it is better if we can annotate the functions in
question with executable notes about locking rules, and taking a
single uncontended lock in the vm map destruction path is probably a
tiny cost worth those executable notes.

PR kern/51254: uvm assertion "!topdown || hint <= orig_hint" failed
 1.420 14-Aug-2024  riastradh uvm_map(9): Show the entry in findspace invariants.

No functional change intended in the non-crash3 case.

PR kern/51254: uvm assertion "!topdown || hint <= orig_hint" failed
 1.419 14-Aug-2024  riastradh uvm_map(9): Avoid potential arithmetic overflow.

Should be harmless in this case because vaddr_t is unsigned, so
there's no undefined behaviour here, but let's make it unnecessary to
wonder whether overflow is a problem.

No functional change intended.

PR kern/51254: uvm assertion "!topdown || hint <= orig_hint" failed
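
As a generic illustration of the point about unsigned arithmetic (not the
expression changed by this commit), the sketch below contrasts a range check
that can wrap around with one that cannot. The function names are made up
for the example and vaddr_t is a userland stand-in.

```c
#include <stdbool.h>
#include <stdint.h>

typedef uintptr_t vaddr_t;	/* stand-in for the kernel type */

/* Wrapping form: start + length may wrap around to a small value. */
bool
fits_wrapping(vaddr_t start, vaddr_t length, vaddr_t end)
{
	return start + length <= end;
}

/* Overflow-free form: compare length against the room that is left. */
bool
fits_checked(vaddr_t start, vaddr_t length, vaddr_t end)
{
	return start <= end && length <= end - start;
}
```

Wrapping is well defined for unsigned types, so the first form is not
undefined behaviour, but the second form spares the reader from having to
reason about whether a wrapped sum can slip past the comparison.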
 1.418 14-Aug-2024  riastradh uvm_map(9): Assert another loop invariant in entry lookup.

No functional change intended.

PR kern/51254: uvm assertion "!topdown || hint <= orig_hint" failed
 1.417 13-Aug-2024  riastradh Redo uvm_map.c 1.414 without the null pointer dereference.

uvm_map(9): Sprinkle assertions and interface contract comments.

No functional change intended.

PR kern/51254: uvm assertion "!topdown || hint <= orig_hint" failed
 1.416 13-Aug-2024  riastradh Revert uvm_map.c 1.414.

This was:

uvm_map(9): Sprinkle assertions and interface contract comments.

Apparently, you have to actually test changes, not just prove they
are correct. Who knew??

(And the incantation `No functional change intended.' didn't work
either!)

PR kern/51254: uvm assertion "!topdown || hint <= orig_hint" failed
 1.415 13-Aug-2024  riastradh uvm_map(9): Sprinkle invariant assertions into uvm_map_space_avail.

No functional change intended.

PR kern/51254: uvm assertion "!topdown || hint <= orig_hint" failed
 1.414 13-Aug-2024  riastradh uvm_map(9): Sprinkle assertions and interface contract comments.

No functional change intended.

PR kern/51254: uvm assertion "!topdown || hint <= orig_hint" failed
 1.413 16-Jul-2024  uwe uvm_findspace_invariants: don't repeat the message three times

The topdown and bottomup messages were exactly the same and sh3 printf
hack added the third copy. Restructure the code so that there's only
one message and make the message more obvious - the topdown condition
in the assertions was confusing b/c it's inverted (!topdown || ...
means it's the topdown map).

PR 51254
 1.412 15-Jul-2024  riastradh uvm_map.c: Fix kassertmsg/printf newline mismatch in PR 51254 note.
 1.411 09-Feb-2024  andvar branches: 1.411.2;
fix spelling mistakes, mainly in comments and log messages.
 1.410 23-Sep-2023  ad Reapply this change with a couple of bugs fixed:

- Do away with separate pool_cache for some kernel objects that have no special
requirements and use the general purpose allocator instead. On one of my
test systems this makes for a small (~1%) but repeatable reduction in system
time during builds presumably because it decreases the kernel's cache /
memory bandwidth footprint a little.
- vfs_lockf: cache a pointer to the uidinfo and put mutex in the data segment.
 1.409 12-Sep-2023  ad Back out recent change to replace pool_cache with the general allocator.
Will return to this when I have time again.
 1.408 10-Sep-2023  ad - Do away with separate pool_cache for some kernel objects that have no special
requirements and use the general purpose allocator instead. On one of my
test systems this makes for a small (~1%) but repeatable reduction in system
time during builds presumably because it decreases the kernel's cache /
memory bandwidth footprint a little.
- vfs_lockf: cache a pointer to the uidinfo and put mutex in the data segment.
 1.407 03-Aug-2023  rin uvm_findspace(): For sh3, convert a KASSERTMSG(9) into printf(9)

XXX
Work around for PR kern/51254 until it gets fixed.

With this change, landisk survives full ATF with DIAGNOSTIC enabled.
 1.406 15-May-2023  chs uvm: avoid a deadlock in uvm_map_clean()

The locking order between map locks and page "busy" locks
is that the page "busy" lock comes first, but uvm_map_clean()
breaks this rule by holding a map locked (as reader) while
waiting for page "busy" locks.

If another thread is in the page-fault path holding a page
"busy" lock while waiting for the map lock (as a reader)
and at the same time a third thread is blocked waiting for
the map lock as a writer (which blocks the page-fault thread),
then these three threads will all deadlock with each other.

Fix this by marking the map "busy" (to block any modifications)
and unlocking the map lock before possibly waiting for any
page "busy" locks.

Martin Pieuchot reported that the same problem existed in OpenBSD;
he applied this fix there after several people tested it.

fixes PR 56952
 1.405 24-Mar-2023  skrll Unwrap. NFCI.
 1.404 27-Feb-2023  riastradh uvm(9): KASSERT(A && B) -> KASSERT(A); KASSERT(B)

While here, print some of the inputs with KASSERTMSG.
 1.403 23-Nov-2022  riastradh branches: 1.403.2;
mmap(2): Avoid arithmetic overflow in search for free space.

PR kern/56900

Reported-by: syzbot+3833ae1d38037a263d05@syzkaller.appspotmail.com
https://syzkaller.appspot.com/bug?id=e542bcf59b2564cca1cb38c12f076fb08dcac37e
 1.402 08-Jun-2022  macallan initialize a variable to appease clang
 1.401 06-Jun-2022  rin PR kern/51254
uvm_map_findspace(): Output current value of "entry" when KASSERT fires.
 1.400 05-Jun-2022  riastradh uvm(9): Sprinkle assertions into uvm_map_findspace.

May help to diagnose PR kern/51254.
 1.399 05-Jun-2022  riastradh uvm(9): Don't duplicate vm_map_min/max in `show map' output.

Didn't notice these were already there, oops!
 1.398 04-Jun-2022  riastradh uvm(9): Sprinkle more info into hint/orig_hint assertions.

May help to diagnose PR kern/51254.
 1.397 04-Jun-2022  riastradh uvm(9): Print min/max address and first_free entry in ddb `show map'.

May help to diagnose PR kern/51254.
 1.396 04-Jun-2022  riastradh uvm(9): Fix mmap optimization for topdown case.

PR kern/51393
 1.395 04-Jun-2022  riastradh uvm(9): Fix 19-year-old bug in assertion about mmap hint.

Previously this would _first_ remember the original hint, and _then_
clamp the hint to the VM map's range:

orig_hint = hint;
if (hint < vm_map_min(map)) { /* check ranges ... */
if (flags & UVM_FLAG_FIXED) {
UVMHIST_LOG(maphist,"<- VA below map range",0,0,0,0);
return (NULL);
}
hint = vm_map_min(map);
...
KASSERTMSG(!topdown || hint <= orig_hint, "hint: %#jx, orig_hint: %#jx",
(uintmax_t)hint, (uintmax_t)orig_hint);

Even if nothing else happens in the ellipsis, taking the branch
guarantees the assertion will fail in the topdown case.
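
The ordering problem can be reproduced in a few lines. The sketch below is a
simplified model, assuming the fix is to clamp first and remember the hint
afterwards; both function names are hypothetical and vaddr_t is a userland
stand-in.

```c
#include <stdint.h>

typedef uintptr_t vaddr_t;	/* stand-in for the kernel type */

/* Old order: remember the hint first, then clamp it up to the map minimum. */
int
invariant_holds_old(vaddr_t hint, vaddr_t map_min)
{
	vaddr_t orig_hint = hint;

	if (hint < map_min)
		hint = map_min;		/* hint is now above orig_hint */
	return hint <= orig_hint;	/* the topdown assertion */
}

/* Assumed new order: clamp first, then remember the clamped value. */
int
invariant_holds_new(vaddr_t hint, vaddr_t map_min)
{
	if (hint < map_min)
		hint = map_min;
	vaddr_t orig_hint = hint;
	return hint <= orig_hint;
}
```

A hint of 0 with a map minimum of 0x1000 fails the check in the old order
and passes it in the new one.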
 1.394 10-Apr-2022  andvar fix various typos in comments and output/log messages.
 1.393 09-Apr-2022  riastradh sys: Use membar_release/acquire around reference drop.

This just goes through my recent reference count membar audit and
changes membar_exit to membar_release and membar_enter to
membar_acquire -- this should make everything cheaper on most CPUs
without hurting correctness, because membar_acquire is generally
cheaper than membar_enter.
 1.392 12-Mar-2022  riastradh sys: Membar audit around reference count releases.

If two threads are using an object that is freed when the reference
count goes to zero, we need to ensure that all memory operations
related to the object happen before freeing the object.

Using an atomic_dec_uint_nv(&refcnt) == 0 ensures that only one
thread takes responsibility for freeing, but it's not enough to
ensure that the other thread's memory operations happen before the
freeing.

Consider:

Thread A Thread B
obj->foo = 42; obj->baz = 73;
mumble(&obj->bar); grumble(&obj->quux);
/* membar_exit(); */ /* membar_exit(); */
atomic_dec -- not last atomic_dec -- last
/* membar_enter(); */
KASSERT(invariant(obj->foo,
obj->bar));
free_stuff(obj);

The memory barriers ensure that

obj->foo = 42;
mumble(&obj->bar);

in thread A happens before

KASSERT(invariant(obj->foo, obj->bar));
free_stuff(obj);

in thread B. Without them, this ordering is not guaranteed.

So in general it is necessary to do

membar_exit();
if (atomic_dec_uint_nv(&obj->refcnt) != 0)
return;
membar_enter();

to release a reference, for the `last one out hit the lights' style
of reference counting. (This is in contrast to the style where one
thread blocks new references and then waits under a lock for existing
ones to drain with a condvar -- no membar needed thanks to mutex(9).)

I searched for atomic_dec to find all these. Obviously we ought to
have a better abstraction for this because there's so much copypasta.
This is a stop-gap measure to fix actual bugs until we have that. It
would be nice if an abstraction could gracefully handle the different
styles of reference counting in use -- some years ago I drafted an
API for this, but making it cover everything got a little out of hand
(particularly with struct vnode::v_usecount) and I ended up setting
it aside to work on psref/localcount instead for better scalability.

I got bored of adding #ifdef __HAVE_ATOMIC_AS_MEMBAR everywhere, so I
only put it on things that look performance-critical on 5sec review.
We should really adopt membar_enter_preatomic/membar_exit_postatomic
or something (except they are applicable only to atomic r/m/w, not to
atomic_load/store_*, making the naming annoying) and get rid of all
the ifdefs.
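
For readers outside the kernel, the same `last one out hit the lights'
pattern can be sketched with C11 atomics. This is an analogue, not the
NetBSD API: the release ordering on the decrement plays the role of
membar_exit/membar_release, and the acquire fence plays the role of
membar_enter/membar_acquire. struct obj and obj_release() are made up for
the example.

```c
#include <stdatomic.h>
#include <stdbool.h>

struct obj {
	atomic_uint refcnt;
	int foo;
};

/*
 * Drop one reference.  Returns true if the caller dropped the last
 * reference and is therefore responsible for freeing the object.
 */
bool
obj_release(struct obj *obj)
{
	/* Release: our earlier stores to *obj happen before the decrement. */
	if (atomic_fetch_sub_explicit(&obj->refcnt, 1,
	    memory_order_release) != 1)
		return false;
	/* Acquire: observe every other thread's stores before freeing. */
	atomic_thread_fence(memory_order_acquire);
	return true;
}
```

atomic_fetch_sub_explicit() returns the old value, so comparing against 1
corresponds to atomic_dec_uint_nv() returning 0.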
 1.391 25-Nov-2021  skrll More / improved debug
 1.390 01-Jul-2021  chs in uvm_mapent_forkzero(), if the old entry was an object mapping,
appease a debug check by setting the new entry offset to zero along with
setting the new entry object pointer to NULL.

Reported-by: syzbot+de8e4b223a3838c7307b@syzkaller.appspotmail.com
Reported-by: syzbot+efaea991addfdcc5abd4@syzkaller.appspotmail.com
Reported-by: syzbot+15d1e19dff9209c2e40b@syzkaller.appspotmail.com
 1.389 20-Jun-2021  mrg remove diag-only printf() that fires when an unlinked file is mmapped
and someone runs ps(1) or similar.
 1.388 17-Apr-2021  mrg branches: 1.388.2;
fix error in previous: UVMHIST_PDHIST_SIZE needs to stay next to pdhistbuf[].
 1.387 17-Apr-2021  mrg remove KERNHIST_INIT_STATIC(). it straddles the line between usable
early in boot and broken early in boot by requiring a partly static
structure with another structure that must be present by the time
any uses are performed. theoretically platform code could allocate
a chunk while setting up memory and assign it here, giving a dynamic
sizing for the entry list, but the reality is that all users have
a statically allocated entry list as well.

the existing KERNHIST_LINK_STATIC() is used in conjunction with
KERNHIST_INITIALIZER() instead.

this stops a NULL pointer deref when the _LOG() macro is called
before the storage is linked in, which happens with GCC 10 on OCTEON
with UVMHIST enabled, crashing in very early kernel init.
 1.386 13-Mar-2021  skrll branches: 1.386.2;
Consistently use %#jx instead of 0x%jx or just %jx in UVMHIST_LOG formats
 1.385 09-Jul-2020  skrll branches: 1.385.2;
Consistently use UVMHIST(__func__)

Convert UVMHIST_{CALLED,LOG} into UVMHIST_CALLARGS
 1.384 30-May-2020  maxv Avoid passing file paths in panic strings, this results in extra long
output that is annoying and that syzbot classifies as independent reports
due to the instances having different build paths.
 1.383 09-May-2020  thorpej Make the uvm_voaddr structure more compact, only occupying 2 pointers
worth of space, by encoding the type in the lower bits of the object
pointer.
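
The general technique of keeping a type tag in the low bits of an aligned
pointer can be sketched as follows. The names, tag values and two-bit mask
are assumptions for illustration; they are not taken from the uvm_voaddr
implementation.

```c
#include <assert.h>
#include <stdint.h>

/* Low two bits are free when the pointee is at least 4-byte aligned. */
#define TAG_MASK	((uintptr_t)0x3)

enum obj_type { TYPE_UOBJ = 1, TYPE_ANON = 2 };	/* illustrative values */

/* Pack an aligned pointer and a small type tag into one word. */
uintptr_t
tag_pointer(void *ptr, enum obj_type type)
{
	assert(((uintptr_t)ptr & TAG_MASK) == 0);	/* must be aligned */
	return (uintptr_t)ptr | (uintptr_t)type;
}

/* Recover the type tag. */
enum obj_type
tagged_type(uintptr_t tagged)
{
	return (enum obj_type)(tagged & TAG_MASK);
}

/* Recover the original pointer. */
void *
tagged_pointer(uintptr_t tagged)
{
	return (void *)(tagged & ~TAG_MASK);
}
```

The saving comes from dropping a separate type field: the tagged word plus
an offset fit in two pointer-sized slots.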
 1.382 30-Apr-2020  thorpej - In uvm_voaddr_acquire(), take an extra hold on the anon lock obj.
- In uvm_voaddr_release(), if the anon ref count drops to 0, call
uvm_anfree() rather than uvm_anon_release(). Unconditionally drop
the anon lock, and release the extra hold on the anon lock obj.

Fixes a panic that occurs if the backing store for a futex backed by
an anon memory location is unmapped while a thread is waiting in the
futex.

Add a test case that reproduced the panic to verify that it's fixed.
 1.381 19-Apr-2020  skrll Fix UVMHIST_LOG compile on 32bit platforms
 1.380 18-Apr-2020  riastradh Fix trailing whitespace.
 1.379 18-Apr-2020  thorpej Add an API to get a reference on the identity of an individual byte of
virtual memory, a "virtual object address". This is not a reference to
a physical byte of memory, per se, but a reference to a byte residing
in a page, owned by a unique UVM object (either a uobj or an anon). Two
separate address+address space tuples that reference the same byte in
an object (such as a location in a shared memory segment) will resolve
to equivalent virtual object addresses. Even if the residency status
of the page changes, the virtual object address remains unchanged.

struct uvm_voaddr -- a structure that encapsulates this address reference.

uvm_voaddr_acquire() -- a function to acquire this address reference,
given a vm_map and a vaddr_t.

uvm_voaddr_release() -- a function to release this address reference.

uvm_voaddr_compare() -- a function to compare two such address references.

uvm_voaddr_acquire() resolves the COW status of the object address before
acquiring.

In collaboration with riastradh@ and chs@.
 1.378 10-Apr-2020  ad uvmspace_exec(): set VM_MAP_DYING for the duration, so pmap_update() is not
called until the pmap has been totally cleared out after pmap_remove_all(),
or it can confuse some pmap implementations.
 1.377 04-Apr-2020  ad branches: 1.377.2;
Mark uvm_map_entry_cache with PR_LARGECACHE.
 1.376 22-Mar-2020  ad Process concurrent page faults on individual uvm_objects / vm_amaps in
parallel, where the relevant pages are already in-core. Proposed on
tech-kern.

Temporarily disabled on MP architectures with __HAVE_UNLOCKED_PMAP until
adjustments are made to their pmaps.
 1.375 20-Mar-2020  ad Go back to freeing struct vm_anon one by one. There may have been an
advantage circa ~2008 but there isn't now.
 1.374 14-Mar-2020  ad uvm_map_lookup_entry(): save the hint even on failure, since code elsewhere
relies on it pointing to the previous entry.
 1.373 14-Mar-2020  ad - uvmspace_exec(), uvmspace_free(): if pmap_remove_all() returns true the
pmap is emptied. Pass UVM_FLAG_VAONLY when clearing out the map and avoid
needless extra work to tear down each mapping individually.

- uvm_map_lookup_entry(): remove the code to do a linear scan of map entries
for small maps, in preference to using the RB tree. It's questionable,
and I think the code is almost never triggered because the average number
of map entries has probably exceeded the hard-coded threshold for quite
some time.

- vm_map_entry: get it aligned on a cacheline boundary, and cluster fields
used during rbtree lookup at the beginning.
 1.372 23-Feb-2020  ad UVM locking changes, proposed on tech-kern:

- Change the lock on uvm_object, vm_amap and vm_anon to be a RW lock.
- Break v_interlock and vmobjlock apart. v_interlock remains a mutex.
- Do partial PV list locking in the x86 pmap. Others to follow later.
 1.371 12-Jan-2020  ad - uvm_unmap_remove(): need to call pmap_update() with the object still
locked, otherwise the page could gain a new identity and still be visible
via a stale mapping.

- Adjust reference counts with atomics.
 1.370 05-Jan-2020  para branches: 1.370.2;
remove unused predicate function

likely unused since kmem changes
 1.369 31-Dec-2019  ad - Add and use wrapper functions that take and acquire page interlocks, and pairs
of page interlocks. Require that the page interlock be held over calls to
uvm_pageactivate(), uvm_pagewire() and similar.

- Solve the concurrency problem with page replacement state. Rather than
updating the global state synchronously, set an intended state on
individual pages (active, inactive, enqueued, dequeued) while holding the
page interlock. After the interlock is released put the pages on a 128
entry per-CPU queue for their state changes to be made real in batch.
This results in a ~400 fold decrease in contention on my test system.
Proposed on tech-kern but modified to use the page interlock rather than
atomics to synchronise as it's much easier to maintain that way, and
cheaper.
 1.368 27-Dec-2019  msaitoh s/referece/reference/ in comment.
 1.367 13-Dec-2019  ad Break the global uvm_pageqlock into a per-page identity lock and a private
lock for use of the pagedaemon policy code. Discussed on tech-kern.

PR kern/54209: NetBSD 8 large memory performance extremely low
PR kern/54210: NetBSD-8 processes presumably not exiting
PR kern/54727: writing a large file causes unreasonable system behaviour
 1.366 01-Nov-2019  rin Fix previous; semantics of align argument of uvm_map() is different
when UVM_FLAG_COLORMATCH is specified.

Should fix PR kern/54669.
 1.365 01-Nov-2019  rin PR kern/54395

- Align hint for virtual address at the beginning of uvm_map() if
required. Otherwise, it will be rounded up/down in an unexpected
way by uvm_map_space_avail(), which results in assertion failure.

Fix kernel panic when executing earm binary (8KB pages) on aarch64
(4KB pages), which relies on mmap(2) with MAP_ALIGNED flag.

- Use inline functions/macros consistently.

- Add some more KASSERT's.

For more details, see the PR as well as discussion on tech-kern:
http://mail-index.netbsd.org/tech-kern/2019/10/27/msg025629.html
 1.364 10-Aug-2019  mrg KASSERT -> KASSERTMSG so we actually display the overflowed values.
 1.363 01-Aug-2019  riastradh Remove last trace of never-used map_attrib.
 1.362 12-Jul-2019  mlelstv branches: 1.362.2;
Add missing lock around pmap_protect.
ok, chs@

Reported-by: syzbot+6bfd0be70896fc9e9a3d@syzkaller.appspotmail.com
 1.361 11-Jul-2019  maxv Fix info leak: 'map_attrib' is not used in UVM, and contains uninitialized
heap garbage. Return zero. Maybe we should remove the field completely.
 1.360 08-Jun-2019  chs in uvm_map_protect(), do a pmap_update() before possibly switching from
removing pmap entries to creating them. this fixes the problem reported in
https://syzkaller.appspot.com/bug?id=cc89e47f05e4eea2fd69bcccb5e837f8d1ab4d60
 1.359 14-Mar-2019  kre Avoid a panic from the sequence

mlock(buf, 0);
munlock(buf, 0);
mlock(buf, page);
munlock(buf, page);

where buf is page aligned, and page is actually anything > 0
(but not too big) which will get rounded up to the next multiple
of the page size.

In that sequence, it is possible that the 1st munlock() is optional.

Add a KASSERT() (or two) to detect the first effects of the problem
(without that, or in !DIAGNOSTIC kernels, the problem eventually
causes some kind of problem or other, most often still a panic).

After this, mlock(anything, 0) (or munlock) validates "anything"
but is otherwise a no-op (regardless of the alignment of anything).

Also, don't treat mlock(buf, verybig) as equivalent to mlock(buf, 0)
which is (more or less) what we had been doing.

XXX pullup -8 (maybe -7 as well, need to check).
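
One way a very large length can end up looking like a zero length is
wraparound while rounding up to a page boundary. The commit message does not
spell out the mechanism, so the sketch below is an assumption-laden
illustration: the helper name and the 4096-byte page size are made up.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define SKETCH_PAGE_SIZE	((size_t)4096)	/* assumed page size */

/*
 * Round len up to a whole number of pages.  Returns false if the
 * rounding would wrap around, so that a huge request is rejected
 * instead of being mistaken for a zero-length one.
 */
bool
round_len_to_pages(size_t len, size_t *rounded)
{
	if (len > SIZE_MAX - (SKETCH_PAGE_SIZE - 1))
		return false;
	*rounded = (len + SKETCH_PAGE_SIZE - 1) & ~(SKETCH_PAGE_SIZE - 1);
	return true;
}
```

A length of 0 rounds to 0 and can be treated as a validated no-op, while
SIZE_MAX is reported as an error rather than wrapping to 0.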
 1.358 03-Mar-2019  maxv Fix bug, the entry we're iterating on is 'current', not 'entry'. Here only
the first entry gets wired in.
 1.357 17-Dec-2018  kamil Raise the fill_vmentries() E2BIG limit from 1MB to 10MB

The previous limit was not enough for libFuzzer as it requires up to 2.5MB
in test-suite.

Alternative approaches to retrieving a larger address map turned out to be
worse during evaluation due to difficulties in locking and atomicity.

Discussed with <christos>
 1.356 12-Sep-2018  maxv Remove this check, it has never protected against mmap on page zero, and
has since been replaced by the code in exec_vm_minaddr.
 1.355 03-Sep-2018  riastradh Rename min/max -> uimin/uimax for better honesty.

These functions are defined on unsigned int. The generic name
min/max should not silently truncate to 32 bits on 64-bit systems.
This is purely a name change -- no functional change intended.

HOWEVER! Some subsystems have

#define min(a, b) ((a) < (b) ? (a) : (b))
#define max(a, b) ((a) > (b) ? (a) : (b))

even though our standard name for that is MIN/MAX. Although these
may invite multiple evaluation bugs, these do _not_ cause integer
truncation.

To avoid `fixing' these cases, I first changed the name in libkern,
and then compile-tested every file where min/max occurred in order to
confirm that it failed -- and thus confirm that nothing shadowed
min/max -- before changing it.

I have left a handful of bootloaders that are too annoying to
compile-test, and some dead code:

cobalt ews4800mips hp300 hppa ia64 luna68k vax
acorn32/if_ie.c (not included in any kernels)
macppc/if_gm.c (superseded by gem(4))

It should be easy to fix the fallout once identified -- this way of
doing things fails safe, and the goal here, after all, is to _avoid_
silent integer truncations, not introduce them.

Maybe one day we can reintroduce min/max as type-generic things that
never silently truncate. But we should avoid doing that for a while,
so that existing code has a chance to be detected by the compiler for
conversion to uimin/uimax without changing the semantics until we can
properly audit it all. (Who knows, maybe in some cases integer
truncation is actually intended!)
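
The truncation hazard is easy to demonstrate in userland. The sketch below
assumes a platform where unsigned int is 32 bits wide; uimin() here is a
local re-creation with the shape described above, and the two wrapper
functions are hypothetical.

```c
#include <stdint.h>

/* Same shape as the helper described above: unsigned int only. */
unsigned int
uimin(unsigned int a, unsigned int b)
{
	return a < b ? a : b;
}

/* Macro form: no truncation, but may evaluate an argument twice. */
#define MIN(a, b)	((a) < (b) ? (a) : (b))

uint64_t
smaller_truncating(uint64_t a, uint64_t b)
{
	return uimin(a, b);	/* arguments silently narrowed to unsigned int */
}

uint64_t
smaller_exact(uint64_t a, uint64_t b)
{
	return MIN(a, b);	/* comparison done at full width */
}
```

For a = 0x100000001 and b = 2 the truncating version compares 1 with 2 and
returns 1, although the true minimum is 2.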
 1.354 06-Feb-2018  mrg branches: 1.354.2; 1.354.4;
uvm_map_extract() has an indentation issue.
 1.353 28-Oct-2017  pgoyette Update the kernhist(9) kernel history code to address issues identified
in PR kern/52639, as well as some general cleaning-up...

(As proposed on tech-kern@ with additional changes and enhancements.)

Details of changes:

* All history arguments are now stored as uintmax_t values[1], both in
the kernel and in the structures used for exporting the history data
to userland via sysctl(9). This avoids problems on some architectures
where passing a 64-bit (or larger) value to printf(3) can cause it to
process the value as multiple arguments. (This can be particularly
problematic when printf()'s format string is not a literal, since in
that case the compiler cannot know how large each argument should be.)

* Update the data structures used for exporting kernel history data to
include a version number as well as the length of history arguments.

* All [2] existing users of kernhist(9) have had their format strings
updated. Each format specifier now includes an explicit length
modifier 'j' to refer to numeric values of the size of uintmax_t.

* All [2] existing users of kernhist(9) have had their format strings
updated to replace uses of "%p" with "%#jx", and the pointer
arguments are now cast to (uintptr_t) before being subsequently cast
to (uintmax_t). This is needed to avoid compiler warnings about
casting "pointer to integer of a different size."

* All [2] existing users of kernhist(9) have had instances of "%s" or
"%c" format strings replaced with numeric formats; several instances
of mis-match between format string and argument list have been fixed.

* vmstat(1) has been modified to handle the new size of arguments in the
history data as exported by sysctl(9).

* vmstat(1) now provides a warning message if the history requested with
the -u option does not exist (previously, this condition was silently
ignored, with only a single blank line being printed).

* vmstat(1) now checks the version and argument length included in the
data exported via sysctl(9) and exits if they do not match the values
with which vmstat was built.

* The kernhist(9) man-page has been updated to note the additional
requirements imposed on the format strings, along with several other
minor changes and enhancements.

[1] It would have been possible to use an explicit length (for example,
uint64_t) for the history arguments. But that would require another
"rototill" of all the users in the future when we add support for an
architecture that supports a larger size. Also, the printf(3) format
specifiers for explicitly-sized values, such as "%"PRIu64, are much
more verbose (and less aesthetically appealing, IMHO) than simply
using "%ju".

[2] I've tried very hard to find "all [the] existing users of kernhist(9)"
but it is possible that I've missed some of them. I would be glad to
update any stragglers that anyone identifies.
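
The pointer conversion described above ("%p" replaced by "%#jx", with a cast through uintptr_t to uintmax_t) can be sketched in user space as follows. ex_format_ptr is a hypothetical helper written for this illustration; it is not part of kernhist(9) and uses snprintf(3) rather than the kernel history macros.

```c
#include <assert.h>
#include <stdint.h>
#include <stdio.h>

/*
 * Hypothetical helper: format a pointer as a uintmax_t using the 'j'
 * length modifier, the same shape the converted format strings use.
 */
int
ex_format_ptr(char *buf, size_t len, const void *p)
{
	assert(buf != NULL && len > 0);
	/*
	 * Cast to uintptr_t first, then widen to uintmax_t; a direct cast
	 * can trigger "pointer to integer of a different size" warnings.
	 */
	return snprintf(buf, len, "%#jx", (uintmax_t)(uintptr_t)p);
}
```

Because every argument is widened to uintmax_t, the formatter always consumes one argument of a known size per specifier, whatever the pointer width of the platform.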
 1.352 01-Oct-2017  pgoyette Fix user-triggerable kernel crash as reported in PR kern/52573 (from
Bruno Haible).

XXX Pull-up to netbsd-8
 1.351 30-May-2017  chs branches: 1.351.2;
add assertions that would have caught the recent audio mmap bugs.
 1.350 23-May-2017  christos sprinkle __diagused to fix the powerpc build, which is not DIAGNOSTIC.
 1.349 20-May-2017  chs MAP_FIXED means something different for mremap() than it does for mmap(),
so we cannot use UVM_FLAG_FIXED to specify both behaviors.
keep UVM_FLAG_FIXED with its earlier meaning (prior to my previous change)
of whether to use uvm_map_findspace() to locate space for the new mapping or
to use the hint address that the caller passed in, and add a new flag
UVM_FLAG_UNMAP to indicate that any existing entries in the range should be
unmapped as part of creating the new mapping. the new UVM_FLAG_UNMAP flag
may only be used if UVM_FLAG_FIXED is also specified.
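
The constraint stated above (the unmap flag is only valid together with the fixed flag) can be sketched as a small check. EX_FLAG_FIXED and EX_FLAG_UNMAP are illustrative names with made-up bit values; the real UVM_FLAG_* definitions live in the UVM headers and are not reproduced here.

```c
#include <assert.h>
#include <stdbool.h>

/* Illustrative bit values only, assumed for this sketch. */
#define EX_FLAG_FIXED	0x1u	/* use the caller's hint address as-is */
#define EX_FLAG_UNMAP	0x2u	/* unmap existing entries in the range */

/* Return true if the flag combination is allowed. */
bool
ex_flags_valid(unsigned int flags)
{
	/* UNMAP without FIXED is the one combination that is rejected. */
	if ((flags & EX_FLAG_UNMAP) != 0 && (flags & EX_FLAG_FIXED) == 0)
		return false;
	return true;
}

/* Assertion-style wrapper, in the spirit of a kernel KASSERT. */
void
ex_check_flags(unsigned int flags)
{
	assert(ex_flags_valid(flags));
}
```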
 1.348 19-May-2017  kamil Add missing , to fix syntax

Unbreaks build after recent change adding a message for vm.user_va0_disable
 1.347 19-May-2017  chs make MAP_FIXED mapping operations atomic. fixes PR 52239.
previously, unmapping any entries being replaced was done separately
from entering the new mapping, which allowed another thread doing
a non-MAP_FIXED mapping to allocate the range out from under the
MAP_FIXED thread.
 1.346 19-May-2017  christos mention the man page instead of the command.
 1.345 19-May-2017  christos Provide a helpful message to the user trying to run a binary that needs page
0 access.
 1.344 06-May-2017  joerg Extend the mmap(2) interface to allow requesting protections for later
use with mprotect(2), but without enabling them immediately.

Extend the mremap(2) interface to allow duplicating mappings, i.e.
create a second range of virtual addresses referencing the same physical
pages. Duplicated mappings can have different effective protections.

Adjust PAX mprotect logic to disallow effective protections of W&X, but
allow one mapping W and another X protections. This obsoletes using
temporary files for purposes like JIT.

Adjust PAX logic for mmap(2) and mprotect(2) to fail if W&X is requested
and not silently drop the X protection.

Improve test cases to ensure correct operation of the changed
interfaces.
 1.343 15-Mar-2017  christos branches: 1.343.4;
PR/52078: Don't panic on 0 allocation, check more bounds.
 1.342 01-Dec-2016  mrg branches: 1.342.2;
allow the sizes of the maphist and pdhist to be set in the config
file via UVMHIST_MAPHIST_SIZE and UVMHIST_PDHIST_SIZE.
 1.341 06-Aug-2016  maxv The way the kernel tries to prevent a userland process from allocating page
zero is hugely flawed. It is easy to demonstrate that one can trick UVM
into choosing a NULL hint after the user_va0_disable check from uvm_map.
Such a bypass allows kernel NULL pointer dereferences to be exploitable on
architectures with a shared userland<->kernel VA, like amd64.

Fix this by increasing the limit of the vm space made available for
userland processes. This way, UVM will never choose a NULL hint, since it
would be outside of the vm space.

The user_va0_disable sysctl still controls this feature.
 1.340 07-Jul-2016  msaitoh branches: 1.340.2;
KNF. Remove extra spaces. No functional change.
 1.339 18-Jun-2016  martin Change two KASSERT to KASSERTMSG to provide better diagnostics.
 1.338 01-Jun-2016  christos Avoid locking issues when copying out requires taking a fault and we are
finding out our own maps, by allocating a buffer and copying out after
we collected the information.
 1.337 25-May-2016  christos Introduce security.pax.mprotect.ptrace sysctl which can be used to bypass
mprotect settings so that debuggers can write to the text segment of traced
processes so that they can insert breakpoints. Turned off by default.
Ok: chuq (for now)
 1.336 05-Nov-2015  pgoyette Now that SYSVSHM is modularized, reattach the linkages from uvm so that
we can correctly clean up on process exit or fork.

Without this, firefox attaches to a shared memory segment but doesn't
detach before exit. Thus once firefox causes an autoload for sysv_ipc
it cannot be unloaded since the segment still retains references.
 1.335 24-Sep-2015  christos implement VM_PROC_MAP
 1.334 22-Jun-2015  matt Use %p, %#lx etc. for pointers and addresses.
 1.333 01-Feb-2015  christos The diagnostic function uvm_km_check_empty() takes a mutex, so don't call it
if we are using UVM_FLAG_NOWAIT.
 1.332 23-Jan-2015  chs skip busy anon pages in uvm_map_clean().
we shouldn't be messing with pages that someone else has busy,
and uvm_map_clean() is just advisory for amap mappings.
 1.331 26-Oct-2014  christos branches: 1.331.2;
Define UVMDEBUG for expensive debugging operations. Idea from chuq.
 1.330 18-Jul-2014  christos branches: 1.330.2;
Add MAP_INHERIT_ZERO
 1.329 18-Jul-2014  christos Split out the minherit code into separate functions for readability (allows
us to indent them properly), and merge the new vm_map_entry creation into
a common function to avoid code duplication. No functional change.
 1.328 05-Mar-2014  matt branches: 1.328.2;
Use UVMHIST_INITIALIZER (KERNHIST_INITIALIZER) to statically initialize
maphist. This allows maphist to be used very early in boot, well before
uvm has been initialized.
 1.327 14-Nov-2013  martin As discussed on tech-kern: make TOPDOWN-VM runtime selectable per process
(offer MD code or emulations to override it).
 1.326 25-Oct-2013  martin Mark diagnostic-only variables
 1.325 25-Oct-2013  martin Some pmaps may not consume all arguments of pmap_copy()
 1.324 02-Nov-2012  matt branches: 1.324.2;
When uvm_io reserves kernel address space, make sure it starts with the
same color as the user address space being copied.
 1.323 29-Oct-2012  para get rid of not used uvm_map flag (UVM_MAP_KMAPENT)
 1.322 04-Sep-2012  matt branches: 1.322.2;
Remove locking since it isn't needed. As soon as the 2nd uvm_map_entry in kernel_map
is created, uvm_map_prepare will call pmap_growkernel and the pmap_growkernel call in
uvm_km_mem_alloc will never be called again.
 1.321 03-Sep-2012  matt Switch to a spin lock (uvm_kentry_lock) which, fortunately, was sitting there
unused.
 1.320 03-Sep-2012  matt Cleanup comment. Change panic to KASSERTMSG.
Use kernel_map->misc_lock to make sure we don't call pmap_growkernel
concurrently and possibly mess up uvm_maxkaddr.
 1.319 18-Aug-2012  chs avoid leaking a uvm_object reference when merging a new map entry
with the entries on both sides. fixes PR 46807.
 1.318 30-Jul-2012  matt -fno-common broke kernhist since it used commons.
Add KERNHIST_DEFINE, which is used to define a kernel history.
Change UVM to deal with the new usage.
 1.317 08-Apr-2012  martin Rework posix_spawn locking and memory management:
- always provide a vmspace for the new proc, initially borrowing from proc0
(this part fixes PR 46286)
- increase parallelism between parent and child if arguments allow this,
avoiding a potential deadlock on exec_lock
- add a new flag for userland to request old (lockstepped) behaviour for
better error reporting
- adapt test cases to the previous two and add a new variant to test the
diagnostics flag
- fix a few memory (and lock) leaks
- provide netbsd32 compat
 1.316 13-Mar-2012  elad Replace the remaining KAUTH_GENERIC_ISSUSER authorization calls with
something meaningful. All relevant documentation has been updated or
written.

Most of these changes were brought up in the following messages:

http://mail-index.netbsd.org/tech-kern/2012/01/18/msg012490.html
http://mail-index.netbsd.org/tech-kern/2012/01/19/msg012502.html
http://mail-index.netbsd.org/tech-kern/2012/02/17/msg012728.html

Thanks to christos, manu, njoly, and jmmv for input.

Huge thanks to pgoyette for spinning these changes through some build
cycles and ATF.
 1.315 20-Feb-2012  bouyer When using uvm_km_pgremove_intrsafe() make sure mappings are removed
before returning the pages to the free pool. Otherwise, under Xen,
a page which still has a writable mapping could be allocated for
a PDP by another CPU and the hypervisor would refuse it (this is
PR port-xen/45975).
For this, move the pmap_kremove() calls inside uvm_km_pgremove_intrsafe(),
and do pmap_kremove()/uvm_pagefree() in batch of (at most) 16 entries
(as suggested by Chuck Silvers on tech-kern@, see also
http://mail-index.netbsd.org/tech-kern/2012/02/17/msg012727.html and
followups).
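
The ordering described above (remove the mappings for a batch of at most 16 pages, and only then free those pages) can be sketched with a generic loop. Everything here is hypothetical: ex_remove_batched, the ex_range_fn callback type and ex_count are invented for illustration and do not reproduce uvm_km_pgremove_intrsafe().

```c
#include <assert.h>
#include <stddef.h>

#define EX_BATCH	16	/* at most 16 entries per batch */

/* Hypothetical callback operating on pages [first, first + count). */
typedef void (*ex_range_fn)(size_t first, size_t count, void *arg);

/* Sample callback: add the number of pages seen to a counter. */
void
ex_count(size_t first, size_t count, void *arg)
{
	(void)first;
	*(size_t *)arg += count;
}

/* Process npages in batches; returns the number of batches used. */
size_t
ex_remove_batched(size_t npages, ex_range_fn unmap, ex_range_fn release,
    void *arg)
{
	size_t done = 0, batches = 0;

	assert(unmap != NULL && release != NULL);
	while (done < npages) {
		size_t n = npages - done;

		if (n > EX_BATCH)
			n = EX_BATCH;
		unmap(done, n, arg);	/* mappings go away first */
		release(done, n, arg);	/* only then are pages freed */
		done += n;
		batches++;
	}
	return batches;
}
```

The point of the ordering is that no page reaches the free pool while a writable mapping to it still exists.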
 1.314 19-Feb-2012  rmind Remove VM_MAP_INTRSAFE and related code. Not used since the "kmem changes".
 1.313 12-Feb-2012  martin branches: 1.313.2;
Fix another merge botch - bracket vm space assignment with
kpreempt-disable/enable.
 1.312 28-Jan-2012  rmind pool_page_alloc, pool_page_alloc_meta: avoid extra compare, use const.
ffs_mountfs,sys_swapctl: replace memset with kmem_zalloc.
sys_swapctl: move kmem_free outside the lock path.
uvm_init: fix comment, remove pointless numeration of steps.
uvm_map_enter: remove meflagval variable.
Fix some indentation.
 1.311 27-Jan-2012  para extending vmem(9) to be able to allocate resources for its own needs.
simplifying uvm_map handling (no special kernel entries anymore no relocking)
make malloc(9) a thin wrapper around kmem(9)
(with private interface for interrupt safety reasons)

releng@ acknowledged
 1.310 05-Jan-2012  reinoud Revert MAP_NOSYSCALLS patch.
 1.309 22-Dec-2011  reinoud Redo uvm_map_setattr() to never fail and remove the possible panic. The
possibility of failure was a C&P error.
 1.308 20-Dec-2011  reinoud Add a MAP_NOSYSCALLS flag to mmap. This flag prohibits executing of system
calls from the mapped region. This can be used for emulation purposes or for
extra security in the case of generated code.

It's implemented by adding mapping-attributes to each uvm_map_entry. These can
then be queried when needed.

Currently the MAP_NOSYSCALLS is only implemented for x86 but other
architectures are easy to adapt; see the sys/arch/x86/x86/syscall.c patch.
Port maintainers are encouraged to add them for their processor ports too.
When this feature is not yet implemented for an architecture the
MAP_NOSYSCALLS is simply ignored with virtually no cpu cost.
 1.307 20-Dec-2011  yamt comment and assertion
 1.306 23-Nov-2011  matt branches: 1.306.2;
When allocating pages for kernel map entries and PMAP_ALLOC_POOLPAGE is
defined, use it. (allows a MIPS N32 kernel to boot when there is memory
outside of KSEG0).
 1.305 27-Sep-2011  jym branches: 1.305.2;
Modify *ASSERTMSG() so they are now used as variadic macros. The main goal
is to provide routines that do as KASSERT(9) says: append a message
to the panic format string when the assertion triggers, with optional
arguments.

Fix call sites to reflect the new definition.

Discussed on tech-kern@. See
http://mail-index.netbsd.org/tech-kern/2011/09/07/msg011427.html
 1.304 01-Sep-2011  matt Forward some UVM from matt-nb5-mips64. Add UVM_KMF_COLORMATCH flag.
When uvm_map gets passed UVM_FLAG_COLORMATCH, the align argument contains
the color of the starting address to be allocated (0..colormask).
When uvm_km_alloc is passed UVM_KMF_COLORMATCH (which can only be used with
UVM_KMF_VAONLY), the align argument contains the color of the starting address
to be allocated.
Change uvm_pagermapin to use this. When mapping user pages in the kernel,
if colormatch is used with the color of the starting user page then the kernel
mapping will be congruent with the existing user mappings.
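
As a rough illustration of what "color of the starting address" means, the sketch below finds the lowest page-aligned address at or above a hint whose page color equals a requested color. The function ex_color_align, the 4 KiB page size and the power-of-two color count are all assumptions made for this example; this is not the allocation code in uvm_map.

```c
#include <assert.h>
#include <stdint.h>

#define EX_PAGE_SHIFT	12	/* assumed 4 KiB pages */

/*
 * Return the lowest page-aligned address >= hint whose page color is
 * `color`.  colormask is (number of colors - 1), with the number of
 * colors assumed to be a power of two.
 */
uintptr_t
ex_color_align(uintptr_t hint, unsigned int color, unsigned int colormask)
{
	const uintptr_t pagesz = (uintptr_t)1 << EX_PAGE_SHIFT;
	uintptr_t va;
	unsigned int cur;

	assert(color <= colormask);
	va = (hint + pagesz - 1) & ~(pagesz - 1);	/* round up to a page */
	cur = (unsigned int)(va >> EX_PAGE_SHIFT) & colormask;
	/* Advance by the number of pages needed to reach the color. */
	va += (uintptr_t)((color - cur) & colormask) << EX_PAGE_SHIFT;
	return va;
}
```

If the requested color is taken from the first user page being mapped, each kernel page ends up with the same color as the corresponding user page, which is the congruence the message refers to.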
 1.303 06-Aug-2011  rmind - Rework uvm_anfree() into uvm_anon_freelst(), which always drops the lock.
- Free anons in uvm_anon_freelst() without lock held.
- Mechanic sync to unused loaning code.
 1.302 30-Jul-2011  martin Make uvmspace_exec() deal with procs that have no vmspace (yet) at all.
Greatly simplifies the upcoming posix_spawn implementation.
 1.301 30-Jul-2011  martin Get rid of #ifdef __sparc__ in uvm code - as noted by cgd
back in 1996, now that we have __HAVE_CPU_VMSPACE_EXEC/cpu_vmspace_exec().
 1.300 05-Jul-2011  yamt - fix a use-after-free bug in uvm_km_free.
(after uvm_km_pgremove frees pages, the following pmap_remove touches them.)
- acquire the object lock for operations on pmap_kernel as it can actually be
raced with P->V operations. eg. pagedaemon.
 1.299 13-Jun-2011  rmind uvm_map_lock_entry: fix the order of locking. Spotted by yamt@.
Also, keep uvm_map_unlock_entry() symmetric.
 1.298 12-Jun-2011  rmind Welcome to 5.99.53! Merge rmind-uvmplock branch:

- Reorganize locking in UVM and provide extra serialisation for pmap(9).
New lock order: [vmpage-owner-lock] -> pmap-lock.

- Simplify locking in some pmap(9) modules by removing P->V locking.

- Use lock object on vmobjlock (and thus vnode_t::v_interlock) to share
the locks amongst UVM objects where necessary (tmpfs, layerfs, unionfs).

- Rewrite and optimise x86 TLB shootdown code, make it simpler and cleaner.
Add TLBSTATS option for x86 to collect statistics about TLB shootdowns.

- Unify /dev/mem et al in MI code and provide required locking (removes
kernel-lock on some ports). Also, avoid cache-aliasing issues.

Thanks to Andrew Doran and Joerg Sonnenberger, as their initial patches
formed the core changes of this branch.
 1.297 17-May-2011  mrg branches: 1.297.2;
move and rename the uvm history code out of uvm_stat to "kernhist".

rename "UVMHIST" option to enable the uvm histories.

TODO:
- make UVMHIST properly depend upon KERNHIST
- enable dynamic registration of histories. this is mostly just
allocating something in a bitmap, and is only for viewing multiple
histories in a merged form.


tested on amd64 and sparc64.
 1.296 08-Apr-2011  yamt comment
 1.295 02-Feb-2011  chuck update license clauses on my code to match the new-style BSD licenses.
based on second diff that rmind@ sent me.

no functional change with this commit.
 1.294 04-Jan-2011  matt branches: 1.294.2; 1.294.4;
Add a MD hook to indicate a change of vmspace due to exec. (This is useful
to update any cpu flag due to a change to/from a 64bit and a 32bit address
space). This can set the state needed for copyout/copyin before setregs
is invoked.
 1.293 24-Sep-2010  rmind Fixes/improvements to RB-tree implementation:
1. Fix inverted node order, so that negative value from comparison operator
would represent lower (left) node, and positive - higher (right) node.
2. Add an argument (i.e. "context"), passed to comparison operators.
3. Change rb_tree_insert_node() to return a node - either inserted one or
already existing one.
4. Amend the interface to manipulate the actual object, instead of the
rb_node (in a similar way as Patricia-tree interface does).
5. Update all RB-tree users accordingly.

XXX: Perhaps rename rb.h to rbtree.h, since cleaning-up..

1-3 address the PR/43488 by Jeremy Huddleston.

Passes RB-tree regression tests.
Reviewed by: matt@, christos@
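
Items 1 and 2 above (a negative result means "lower/left", a positive result means "higher/right", and the comparator receives a context argument) can be sketched with a comparator of the following shape. struct ex_entry and ex_compare_nodes are hypothetical names for this illustration, and the three-argument signature is an assumption modeled on the description rather than a copy of the declarations in rb.h.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical object stored in the tree, keyed by start address. */
struct ex_entry {
	uintptr_t start;
};

/*
 * Comparator taking a context pointer plus two objects.  A negative
 * return value orders `a` before (to the left of) `b`; a positive
 * value orders it after (to the right).
 */
int
ex_compare_nodes(void *ctx, const void *a, const void *b)
{
	const struct ex_entry *ea = a;
	const struct ex_entry *eb = b;

	(void)ctx;	/* unused in this sketch */
	assert(ea != NULL && eb != NULL);
	if (ea->start < eb->start)
		return -1;
	if (ea->start > eb->start)
		return 1;
	return 0;
}
```

Comparing explicitly, rather than returning a subtraction, avoids overflow when the keys are wider than int.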
 1.292 22-Jun-2010  rmind Keep the lock around pmap_update() where required. While fixing this
in ubc_fault(), rework logic to "remember" the last object of page and
reduce locking overhead, since in common case pages belong to one and
the same UVM object (but not always, therefore add a comment).

Unlocks before pmap_update(), on removal of mappings, might cause TLB
coherency issues, since on architectures like x86 and mips64 invalidation
IPIs are deferred to pmap_update(). Hence, VA space might be globally
visible before IPIs are sent or while they are still in-flight.

OK ad@.
 1.291 14-May-2010  cegger Move PMAP_KMPAGE to be used in pmap_kenter_pa flags argument.
'Looks good to me' gimpy@

Forgot to commit this piece in previous commit
 1.290 21-Feb-2010  drochner branches: 1.290.2;
rename the va0_disabled option and cpp conditional to "disable" as well,
for consistency, and document option and sysctl flag
 1.289 20-Feb-2010  drochner rename the new sysctl to "vm.user_va0_disable", for consistency
with the majority of existing sysctl flags, suggested by yamt
 1.288 18-Feb-2010  drochner Disable mapping of virtual address 0 by user programs per default.
This blocks an easy exploit of kernel bugs leading to dereference
of a NULL pointer on some architectures (eg i386).
The check can be disabled in various ways:
-by CPP definitions in machine/types.h (portmaster's choice)
-by a kernel config option USER_VA0_DISABLED_DEFAULT=0
-at runtime by sysctl vm.user_va0_disabled (cannot be cleared
at securelevel>0)
 1.287 08-Feb-2010  joerg Remove separate mb_map. The nmbclusters is computed at boot time based
on the amount of physical memory and limited by NMBCLUSTERS if present.
Architectures without direct mapping also limit it based on the kmem_map
size, which is used as backing store. On i386 and ARM, the maximum KVA
used for mbuf clusters is limited to 64MB by default.

The old default limits and limits based on GATEWAY have been removed.
key_registered_sb_max is hard-wired to a value derived from 2048
clusters.
 1.286 15-Dec-2009  matt branches: 1.286.2;
Use PRIxVADDR... (change a printf/panic -> panic)
 1.285 14-Dec-2009  matt Use PRIxVADDR ...
 1.284 07-Nov-2009  cegger Add a flags argument to pmap_kenter_pa(9).
Patch shown on tech-kern@ http://mail-index.netbsd.org/tech-kern/2009/11/04/msg006434.html
No objections.
 1.283 01-Nov-2009  uebayasi Consistently call amap / uobj layers as upper / lower, because UVM has only
those two layers by design. Approved by Chuck Cranor some time ago.
 1.282 06-Sep-2009  rmind uvmspace_unshare: #if 0-out this function. Q: perhaps remove?
AFAIK it was not used for 11 years.
 1.281 19-Aug-2009  matt In uvm_kmapent_alloc, Make sure entry is initialized.
Spotted by msaitoh.
 1.280 18-Aug-2009  thorpej Move uvm_object-related DDB hooks into uvm_object.c. Put all of the
uvm_map-related DDB stuff in one spot in the file.
 1.279 18-Aug-2009  thorpej Move uvm_page-related DDB hooks into uvm_page.c.
 1.278 13-Aug-2009  matt Fix KASSERT() failure reported by Geoff Wing.
 1.277 10-Aug-2009  matt Compare vaddr_t against 0, not NULL.
 1.276 09-Aug-2009  matt If PMAP_MAP_POOLPAGE is defined, use it to map kernel map entries. This
avoids TLB pollution on those platforms that define it.
 1.275 01-Aug-2009  yamt - uvm_map_extract: update map->size correctly for !UVM_EXTRACT_CONTIG.
- uvm_map_extract: panic on zero-sized entries.
- make uvm_map_replace static.
 1.274 01-Aug-2009  yamt don't call uvm_map_check with map unlocked.
 1.273 01-Aug-2009  yamt _uvm_tree_sanity: fix an assertion.
 1.272 01-Aug-2009  yamt _uvm_map_sanity: fix a race which causes "stale hint".
 1.271 10-Jun-2009  yamt on MADV_WILLNEED, start prefetching backing object's pages.
 1.270 03-May-2009  pooka Include some debug print routines if DEBUGPRINT is defined. This
way they can be included without having to include DDB.
(arguably all print routines should be behind #ifdef DEBUGPRINT
and options DDB should define that macro, but I'll tackle that later)
 1.269 13-Jan-2009  yamt branches: 1.269.2;
vm_map_locked_p: add comments
 1.268 20-Dec-2008  ad Move a couple of calls to pmap_update().
 1.267 17-Dec-2008  cegger kill MALLOC and FREE macros.
 1.266 16-Dec-2008  christos replace bitmask_snprintf(9) with snprintb(3)
 1.265 13-Dec-2008  ad It's easier for kernel reserve pages to be consumed because the pagedaemon
serves as less of a barrier these days. Restrict provision of kernel reserve
pages to kmem and one of these cases:

- doing a NOWAIT allocation
- caller is a realtime thread
- caller is a kernel thread
- explicitly requested, for example by the pmap
 1.264 01-Dec-2008  ad PR port-amd64/32816 amd64 can not load lkms

Change some assertions to partially allow for VM_MAP_IS_KERNEL(map) where
map is outside the range of kernel_map.
 1.263 29-Jul-2008  matt branches: 1.263.2; 1.263.4;
Make uvm_map.? use <sys/rb.h> instead of <sys/tree.h>. Change the
ambiguous members ownspace/space to gap/maxgap. Add some evcnt for
evaluation of lookups using tree/list. Drop threshold of using
tree for lookups from > 30 to > 15.

Bump kernel version to 4.99.71
 1.262 16-Jul-2008  matt Add PMAP_KMPAGE flag for pmap_kenter_pa. This allows pmaps to know that
the page being entered is being used for the kernel memory allocator. Such pages
should have no references and don't need bookkeeping.
 1.261 11-Jul-2008  skrll English improvement in comments.

"seems good to me :)" from yamt.
 1.260 06-Jun-2008  ad branches: 1.260.2; 1.260.4;
Back out previous.
 1.259 06-Jun-2008  ad Wrap an expensive check in DIAGNOSTIC.
 1.258 04-Jun-2008  ad - Switch off the map evcnts by default.
- SAVE_HINT() doesn't need to be atomic.
 1.257 04-Jun-2008  ad vm_page: put TAILQ_ENTRY into a union with LIST_ENTRY, so we can use both.
 1.256 02-Jun-2008  ad Don't needlessly acquire v_interlock.
 1.255 31-May-2008  ad Missing cv_destroy().
 1.254 27-Apr-2008  ad branches: 1.254.2; 1.254.4;
Disable preemption while swapping pmap.
 1.253 26-Apr-2008  yamt fix a locking botch. PR/38415 from Wolfgang Solfrank.
 1.252 04-Mar-2008  yamt branches: 1.252.2;
fix "stale map" assertions. PR/38153 from Sarton O'Brien.
 1.251 23-Feb-2008  chris Add some more missing pmap_update()s following pmap_kremove()s.
 1.250 18-Jan-2008  yamt branches: 1.250.2; 1.250.6;
push pmap_clear_reference calls into pdpolicy code, where reference bits
actually matter.
 1.249 08-Jan-2008  yamt simplify locking and remove vm_map_upgrade/downgrade.
this fixes a deadlock due to read-lock recursion of map->lock.
 1.248 02-Jan-2008  ad Merge vmlocking2 to head.
 1.247 13-Dec-2007  yamt add ddb "whatis" command. inspired from solaris ::whatis dcmd.
 1.246 26-Nov-2007  xtraeme branches: 1.246.2; 1.246.4; 1.246.6;
Make this build without LOCKDEBUG (the if statement that uses
LOCKDEBUG_MEM_CHECK).
 1.245 26-Nov-2007  yamt uvm_map_extract: for UVM_EXTRACT_QREF, mark entries UVM_MAP_NOMERGE.
 1.244 26-Nov-2007  yamt uvm_unmap1: LOCKDEBUG_MEM_CHECK for kernel_map.
 1.243 15-Oct-2007  yamt branches: 1.243.4;
uvm_map_reserve: don't ignore alignment. fixes mremap.
 1.242 12-Oct-2007  skrll Don't restrict the offset when allocating a map entry for in-kernel map -
use UVM_UNKNOWN_OFFSET in the call to uvm_map_prepare.

This fixes a '"panic: malloc: out of space in kmem_map" when it's not
really' testcase of mine, and one reported to me by chuq. This is likely
to fix PR/35587 as well.

Looks/seems fine to me from chuq and yamt. Thanks.
 1.241 10-Oct-2007  ad Merge from vmlocking:

- Split vnode::v_flag into three fields, depending on field locking.
- simple_lock -> kmutex in a few places.
- Fix some simple locking problems.
 1.240 20-Aug-2007  ad branches: 1.240.2; 1.240.4;
Also initialize map->lock for INTRSAFE maps.
 1.239 20-Aug-2007  ad uvmspace_free: destroy locks.
 1.238 21-Jul-2007  ad branches: 1.238.4; 1.238.6;
Merge unobtrusive locking changes from the vmlocking branch.
 1.237 09-Jul-2007  ad branches: 1.237.2;
Merge some of the less invasive changes from the vmlocking branch:

- kthread, callout, devsw API changes
- select()/poll() improvements
- miscellaneous MT safety improvements
 1.236 12-Mar-2007  ad branches: 1.236.2;
Pass an ipl argument to pool_init/POOL_INIT to be used when initializing
the pool's lock.
 1.235 04-Mar-2007  christos branches: 1.235.2;
Kill caddr_t; there will be some MI fallout, but it will be fixed shortly.
 1.234 22-Feb-2007  thorpej TRUE -> true, FALSE -> false
 1.233 21-Feb-2007  thorpej Replace the Mach-derived boolean_t type with the C99 bool type. A
future commit will replace use of TRUE and FALSE with true and false.
 1.232 01-Nov-2006  yamt branches: 1.232.2; 1.232.4; 1.232.6;
remove some __unused from function parameters.
 1.231 26-Oct-2006  uwe uvm_page_printall: With new PQ_* flags pg->pqflags no longer fits and
makes the output of "show all pages" ragged. Widen the field to 4 chars.
 1.230 12-Oct-2006  christos - sprinkle __unused on function decls.
- fix a couple of unused bugs
- no more -Wno-unused for i386
 1.229 16-Sep-2006  yamt branches: 1.229.2;
revert a change which was unintentionally slipped in via yamt-pdpolicy branch.
 1.228 15-Sep-2006  yamt merge yamt-pdpolicy branch.
- separate page replacement policy from the rest of kernel
- implement an alternative replacement policy
 1.227 25-Jun-2006  yamt branches: 1.227.4;
make amap use kmem_alloc, rather than malloc.
(ie. make it use kernel_map, rather than kmem_map.)
kmem_map is more restricted than kernel_map,
and there's no point for amap to use it.
 1.226 25-May-2006  yamt branches: 1.226.2;
move wait points for kva from upper layers to vm_map. PR/33185 #1.

XXX there is a concern about interaction with kva fragmentation.
see: http://mail-index.NetBSD.org/tech-kern/2006/05/11/0000.html
 1.225 20-May-2006  elad Better implementation of PaX MPROTECT, after looking some more into the
code and not trying to use temporary solutions.

Lots of comments and help from YAMAMOTO Takashi, also thanks to the PaX
author for being quick to recognize that something fishy's going on. :)

Hook up in mmap/vmcmd rather than (ugh!) uvm_map_protect().

Next time I suggest to commit a temporary solution just revoke my
commit bit.
 1.224 16-May-2006  elad branches: 1.224.2;
Introduce PaX MPROTECT -- mprotect(2) restrictions used to strengthen
W^X mappings.

Disabled by default.

First proposed in:

http://mail-index.netbsd.org/tech-security/2005/12/18/0000.html

More information in:

http://pax.grsecurity.net/docs/mprotect.txt

Read relevant parts of options(4) and sysctl(3) before using!

Lots of thanks to the PaX author and Matt Thomas.
 1.223 14-May-2006  elad integrate kauth.
 1.222 14-May-2006  yamt - rename uvm_tree_sanity to uvm_map_check and add some
(non tree related) checks.
- remove treesanity_label. instead, just panic if any corruption is detected.
 1.221 14-May-2006  yamt - uvm_mapent_trymerge: don't forget to update hints.
- clear_hints: new function.
- uvm_map_replace: use clear_hints. no functional change.
- add some assertions.
 1.220 14-May-2006  yamt update first_free correctly.
 1.219 03-May-2006  yamt uvm_km_suballoc: consider kva overhead of "kmapent".
fixes PR/31275 (me) and PR/32287 (Christian Biere).
 1.218 21-Apr-2006  yamt - share some code between uvm_map_clip_end and uvm_map_clip_start.
- add a map entry sanity-check function, uvm_mapent_check().

discussed on source-changes@.
 1.217 13-Apr-2006  christos Coverity CID 762: Protect against NULL dereferencing entry->object.uvm_obj
like we do a few lines before. Maybe all the tests should be changed
to UVM_ET_ISOBJ(), or the macro should do it internally?
 1.216 15-Mar-2006  drochner branches: 1.216.2;
-clean up the interface to uvm_fault: the "fault type" didn't serve
any purpose (done by a macro, so we don't save any cycles for now)
-kill vm_fault_t; it is not needed for real faults, and for simulated
faults (wiring) it can be replaced by UVM internal flags
-remove <uvm/uvm_fault.h> from uvm_extern.h again
 1.215 01-Mar-2006  yamt branches: 1.215.2; 1.215.4;
merge yamt-uio_vmspace branch.

- use vmspace rather than proc or lwp where appropriate.
the latter is more natural to specify an address space.
(and less likely to be abused for random purposes.)
- fix a swdmover race.
 1.214 22-Feb-2006  bjh21 Include page ownership information in the output of the DDB "show all pages"
command if UVM_PAGE_TRKOWN is enabled.
 1.213 19-Feb-2006  bjh21 Add a "show all pages" command to DDB which prints one line per physical
page in the system. Useful for getting some idea where all your memory's
gone, at least on a sufficiently small system.
 1.212 15-Feb-2006  yamt - amap_copy: take a "flags" argument instead of booleans.
- add AMAP_COPY_NOMERGE flag, and use it for uvm_map_extract.
PR/32806 from Julio M. Merino Vidal.
 1.211 11-Feb-2006  yamt remove the following options. no objections on tech-kern@.

UVM_PAGER_INLINE
UVM_AMAP_INLINE
UVM_PAGE_INLINE
UVM_MAP_INLINE
 1.210 21-Jan-2006  yamt branches: 1.210.2; 1.210.4;
implement compat_linux mremap.
 1.209 21-Jan-2006  yamt uvm_map_replace: remove a wrong comment.
 1.208 15-Jan-2006  yamt make some debug statistics evcnt.
 1.207 08-Jan-2006  yamt clean up uvm_map evcnt code. no functional changes.
 1.206 24-Dec-2005  perry branches: 1.206.2;
Remove leading __ from __(const|inline|signed|volatile) -- it is obsolete.
 1.205 11-Dec-2005  christos merge ktrace-lwp.
 1.204 28-Jun-2005  thorpej branches: 1.204.2;
Clean up the cpp macro used to say "we're compiling this specific C file".
 1.203 28-Jun-2005  thorpej Clean up the use of __inline in this file. In particular, don't inline
really big chunks of code. This saves almost 2.5K on a GENERIC i386
kernel, and has the added benefit of not polluting the I$ so much.
 1.202 13-Jun-2005  jmc Change signature of uvm_kmapent_map definition to __INLINE to match prototype
 1.201 10-Jun-2005  dsl If we are building a small kernel [1], don't inline all these functions.
Saves over 2k and lets i386 rescue_tiny build again.
[1] if MALLOC_NOINLINE is defined - not ideal but...
 1.200 02-Jun-2005  matt When writing coredumps, don't write zero uninstantiated demand-zero pages.
Also, with ELF core dumps, trim trailing zeroes from sections. These two
changes can shrink coredumps by over 50% in size.
 1.199 29-May-2005  christos avoid shadow variables.
remove unneeded casts.
 1.198 22-May-2005  yamt uvm_kmapent_free: add missing vm_map_lock/unlock.
 1.197 18-May-2005  yamt uvm_mapent_trymerge: adjust object offset when necessary.
 1.196 18-May-2005  yamt redo the previous (uvm_map.c rev.1.195) correctly.
 1.195 17-May-2005  yamt uvm_mapent_trymerge: add missing checks.
 1.194 17-May-2005  yamt (try to) merge map entries in fault handler.
 1.193 17-May-2005  yamt revert uvm_map.c rev.1.190 in favor of merging in fault handler.
 1.192 11-May-2005  yamt allocate anons on-demand, rather than reserving static amount of
them on boot/swapon.
 1.191 05-May-2005  yamt - amap_extend: don't extend amap beyond UVM_AMAP_LARGE.
- uvm_map_enter: if we fail to extend amap, just give up merging instead of
bailing out immediately.
 1.190 29-Apr-2005  yamt uvm_map_enter: don't bother to defer amap allocation if there's a mergeable
existing entry. although there're merits and demerits, i think it benefits
common cases.
 1.189 28-Apr-2005  yamt uvm_map: don't leak a preallocated map entry on error.
 1.188 07-Apr-2005  dbj use voff_t instead of vaddr_t to hold file offset passed to pgo_put
 1.187 01-Apr-2005  yamt merge yamt-km branch.
- don't use managed mappings/backing objects for wired memory allocations.
save some resources like pv_entry. also fix (most of) PR/27030.
- simplify kernel memory management API.
- simplify pmap bootstrap of some ports.
- some related cleanups.
 1.186 28-Feb-2005  chs branches: 1.186.2;
add back rev. 1.29 of vm/vm_map.c, which was apparently lost in the UVM merge:
msync(MS_INVALIDATE) should fail if any part of the region is wired.
 1.185 26-Feb-2005  perry nuke trailing whitespace
 1.184 11-Feb-2005  chs use vm_map_{min,max}() instead of dereferencing the vm_map pointer directly.
define and use vm_map_set{min,max}() for modifying these values.
remove the {min,max}_offset aliases for these vm_map fields to be more
namespace-friendly. PR 26475.
 1.183 23-Jan-2005  chs branches: 1.183.2;
pmap_wired_count() is now available on all platforms,
remove the code for the case where it's not defined.
 1.182 17-Jan-2005  atatat Convert the PMAP_PREFER() macro from two arguments (offset and hint)
to four (adding size and direction).

In order for topdown uvm to be an option on ports using PMAP_PREFER,
they will need to "prefer" lower addresses if topdown is being used.
Additionally, at least one port also needs to know the size.
 1.181 14-Jan-2005  yamt branches: 1.181.2;
don't use uvm_kmapent_alloc for non-intrsafe kernel submaps
(namely exec_map and phys_map) because:
- normal vmmpepl is fine for them.
- some of them are tightly sized. eg. size of exec_map on vax is just NCARGS.

should fix vax boot failure reported by Johnny Billquist on current-users@.
 1.180 13-Jan-2005  yamt in uvm_unmap_remove, always wakeup va waiters if any.
uvm_km_free_wakeup is now a synonym of uvm_km_free.
 1.179 12-Jan-2005  yamt don't reserve (uvm_mapent_reserve) entries for malloc/pool backends
because it isn't necessary or safe.
reported and tested by Denis Lagno. PR/28897.
 1.178 03-Jan-2005  yamt reapply uvm_map.c rev.1.156 (use a zero-sized array instead of
c99 flexible array member) for ports which still use gcc 2.95.
from Havard Eidnes.
 1.177 01-Jan-2005  yamt uvm_unmap_remove: debug check to ensure that
unmapped regions don't have any remaining page mappings.
 1.176 01-Jan-2005  yamt don't merge incompatible map entries. eg. private and shared.
 1.175 01-Jan-2005  yamt introduce vm_map_kernel, a subclass of vm_map, and
move some kernel-only members of vm_map to it.
 1.174 01-Jan-2005  yamt for in-kernel maps,
- allocate kva for vm_map_entry from the map itself and
remove the static limit, MAX_KMAPENT.
- keep merged entries for later splitting to fix allocate-to-free problem.
PR/24039.
 1.173 25-Sep-2004  yamt uvm_map_printit:
- print wired_count if available.
- fix a printf format.
 1.172 19-May-2004  he Move variable declaration up before the code. Fixes compile error
for vax, and also conforms better to KNF.
 1.171 04-May-2004  pk Since a `vmspace' always includes a `vm_map' we can re-use vm_map's
reference count lock to also protect the vmspace's reference count.
 1.170 03-May-2004  petrov Revert default uvm counters, rename UVMMAP_COUNTERS to UVMMAP_NOCOUNTERS.
 1.169 01-May-2004  petrov Replace uvm counters with evcnt, initialize them through __link_set (from Matt Thomas),
disable counters by default and add configuration option UVMMAP_COUNTERS.
 1.168 27-Apr-2004  junyoung Fix typo in comments.
 1.167 27-Apr-2004  junyoung FINDSPACE_FIXED -> UVM_FLAG_FIXED in comment.
 1.166 25-Apr-2004  simonb Initialise (most) pools from a link set instead of explicit calls
to pool_init. Untouched pools are ones that are either in arch-specific
code, or aren't initialised during initial system startup.

Convert struct session, ucred and lockf to pools.
 1.165 30-Mar-2004  yamt uvm_map_findspace: don't return unaligned address if alignment is specified.
discussed on tech-kern@.
 1.164 24-Mar-2004  junyoung branches: 1.164.2;
Drop trailing spaces.
 1.163 17-Mar-2004  mycroft Something I posted to tech-kern a long time ago...
Slightly simplify uvm_map_extract() by eliminating "oldstart".
 1.162 11-Mar-2004  pooka Reflect dropping mappings in map_size.
Avoids panic on DIAGNOSTIC kernels.

ok by chs
 1.161 10-Feb-2004  matt Back out the changes in
http://mail-index.netbsd.org/source-changes/2004/01/29/0027.html
since they don't really fix the problem.

Incorporate one fix: Mark uvm_map_entry's that were created with
UVM_FLAG_NOMERGE so that they will not be used as future merge
candidates.
 1.160 09-Feb-2004  yamt - borrow vmspace0 in uvm_proc_exit instead of uvmspace_free.
the latter is not an appropriate place to do so and it broke vfork.
- deactivate pmap before calling cpu_exit() to keep a balance of
pmap_activate/deactivate.
 1.159 07-Feb-2004  yamt introduce a new patchable variable, uvm_debug_check_rbtree,
which is zero by default.
perform rbtree sanity checks only when it isn't zero
because the check is very heavy weight especially when
there're many entries.
 1.158 07-Feb-2004  yamt don't deactivate pmap in exit1 because we'll touch the pmap later.
instead, borrow vmspace0 immediately before destroying the pmap
in uvmspace_free.
 1.157 07-Feb-2004  yamt uvm_kmapent_alloc:
in the case that there are no cached entries,
if kmem_map is already up, allocate an entry from it
so that we won't try to vm_map_lock recursively.
XXX assuming usage pattern of kmem_map.
 1.156 02-Feb-2004  he Since the playstation2 port still uses a variant of gcc 2.95.2,
change to use a zero-sized array instead of c99 flexible array
member in a struct.

OK'ed by yamt.
 1.155 30-Jan-2004  yamt remove wrong assertions.
sparc's alloc_cpuinfo_global_va() partially unmaps kva range in kernel_map.

noted by Juergen Hannken-Illjes on current-users@.
 1.154 29-Jan-2004  yamt some English fixes from Soren Jacobsen.
 1.153 29-Jan-2004  yamt - split uvm_map() into two functions for the following:
- for in-kernel maps, disable map entry merging so that
unmap operations won't block. (workaround for PR/24039)
- for in-kernel maps, allocate kva for vm_map_entry from
the map itself and eliminate MAX_KMAPENT and
uvm_map_entry_kmem_pool.
 1.152 19-Dec-2003  simonb Unindent a code block that doesn't need to be indented.
 1.151 13-Nov-2003  chs two changes to improve scalability:

(1) split the single list of pages allocated to a pool into three lists:
completely full, partially full, and completely empty.
there is no longer any need to traverse any list looking for a
certain type of page.

(2) replace the 8-element hash table for out-of-page page headers
with a splay tree.

these two changes (together with the recent enhancements to the wait code)
give us linear scaling for a fork+exit microbenchmark.
 1.150 06-Nov-2003  yamt fix wrong assertions.
they can be false due to alignment requirements (and PMAP_PREFER).
 1.149 05-Nov-2003  yamt don't move hint backward.
 1.148 05-Nov-2003  yamt - fix a reversed comparison.
- fix "nextgap" case.
- make sure we don't get addresses behind hint.
- deal with integer wraparounds better.
- assertions.
 1.147 02-Nov-2003  yamt fix a wrong assertion. pointed by Christian Limpach.
 1.146 01-Nov-2003  yamt - update uvm_map::size in fewer places.
- add related assertions.
 1.145 01-Nov-2003  yamt commit rest of the previous (rbtree).

(i should check .rej files before commit, sorry)
 1.144 01-Nov-2003  yamt track map entries and free spaces using red-black tree
to improve scalability of operations on the map.

originally done by Niels Provos for OpenBSD.
tweaked for NetBSD by me with some advice from enami tsugutomo.
discussed on tech-kern@ and tech-perform@.
 1.143 25-Oct-2003  junyoung KNF.
 1.142 09-Oct-2003  enami Fix indent.
 1.141 09-Oct-2003  atatat When pulling back an amap to cover the new allocation along with the
previous entry, don't add the size to the extension -- it's already
been added to the end of the previous entry.
 1.140 02-Oct-2003  enami Rewrite uvm_map_findspace() to improve readability and to fix a bug that
it may return space already in use as free space under some condition.
The symptom of the bug is that exec fails if stack is unlimited on
topdown VM kernel.
 1.139 01-Oct-2003  enami Some whitespace fixes.
 1.138 01-Oct-2003  enami ansi'fy.
 1.137 26-Aug-2003  yamt use VM_PAGE_TO_PHYS macro instead of using phys_addr directly.
 1.136 09-Apr-2003  thorpej branches: 1.136.2;
In uvm_map_clean(), only call pgo_put if the object has one.
From Quentin Garnier <quatriemek.com!netbsd>.
 1.135 02-Mar-2003  matt In uvm_map_space, if the current entry is above the new space use the
previous entry. (not if the current entry starts at the end of the new
space; that case doesn't take into account if the new space had a specified
alignment).
 1.134 02-Mar-2003  matt When finding an aligned block, we need to truncate in topdown, not roundup.
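A minimal sketch of the rounding direction described in rev. 1.134, assuming a power-of-two alignment. The helper names (align_up, align_down, align_hint) are hypothetical and are not taken from uvm_map.c: a bottom-up search may only move a candidate address upward to the next boundary, while a top-down search has to truncate it downward so it stays below the starting point.

```c
#include <stdint.h>

/* Hypothetical helpers; `align` must be a nonzero power of two. */
static inline uintptr_t
align_up(uintptr_t addr, uintptr_t align)
{
	/* Round up to the next multiple of align. */
	return (addr + align - 1) & ~(align - 1);
}

static inline uintptr_t
align_down(uintptr_t addr, uintptr_t align)
{
	/* Truncate to the previous multiple of align. */
	return addr & ~(align - 1);
}

/*
 * Pick an aligned start for a candidate address.  A bottom-up search
 * rounds up; a top-down search truncates instead.
 */
static inline uintptr_t
align_hint(uintptr_t addr, uintptr_t align, int topdown)
{
	return topdown ? align_down(addr, align) : align_up(addr, align);
}
```

With a 0x1000 alignment, 0x12345 becomes 0x13000 bottom-up and 0x12000 top-down; an address that is already aligned is left alone in both directions.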
 1.133 23-Feb-2003  simonb Remove assigned-to but not used variable.
 1.132 21-Feb-2003  matt fix a tpyo in a comment.
 1.131 20-Feb-2003  atatat Introduce "top down" memory management for mmap()ed allocations. This
means that the dynamic linker gets mapped in at the top of available
user virtual memory (typically just below the stack), shared libraries
get mapped downwards from that point, and calls to mmap() that don't
specify a preferred address will get mapped in below those.

This means that the heap and the mmap()ed allocations will grow
towards each other, allowing one or the other to grow larger than
before. Previously, the heap was limited to MAXDSIZ by the placement
of the dynamic linker (and the process's rlimits) and the space
available to mmap was hobbled by this reservation.

This is currently only enabled via an *option* for the i386 platform
(though other platforms are expected to follow). Add "options
USE_TOPDOWN_VM" to your kernel config file, rerun config, and rebuild
your kernel to take advantage of this.

Note that the pmap_prefer() interface has not yet been modified to
play nicely with this, so those platforms require a bit more work
(most notably the sparc) before they can use this new memory
arrangement.

This change also introduces a VM_DEFAULT_ADDRESS() macro that picks
the appropriate default address based on the size of the allocation or
the size of the process's text segment accordingly. Several drivers
and the SYSV SHM address assignment were changed to use this instead
of each one picking their own "default".
 1.130 01-Feb-2003  thorpej Add extensible malloc types, adapted from FreeBSD. This turns
malloc types into a structure, a pointer to which is passed around,
instead of an int constant. Allow the limit to be adjusted when the
malloc type is defined, or with a function call, as suggested by
Jonathan Stone.
 1.129 21-Jan-2003  christos finally: step 5: disable a KASSERT() if we are doing_shutdown.
now sync from ddb should work as badly as before the nathanw_sa merge.
 1.128 18-Jan-2003  thorpej Merge the nathanw_sa branch.
 1.127 11-Dec-2002  thorpej UVM_KMF_NOWAIT -> UVM_FLAG_NOWAIT
 1.126 30-Nov-2002  bouyer Change uvm_km_kmemalloc() to accept flag UVM_KMF_NOWAIT and pass it to
uvm_map(). Change uvm_map() to honor UVM_KMF_NOWAIT. For this, change
amap_extend() to take a flags parameter instead of just boolean for
direction, and introduce AMAP_EXTEND_FORWARDS and AMAP_EXTEND_NOWAIT flags
(AMAP_EXTEND_BACKWARDS is still defined as 0x0, to keep the code easier to
read).
Add a flag parameter to uvm_mapent_alloc().
This solves a problem where a pool_get(PR_NOWAIT) could trigger a pool_get(PR_WAITOK)
in uvm_mapent_alloc().
Thanks to Chuck Silvers, enami tsugutomo, Andrew Brown and Jason R Thorpe
for feedback.
 1.125 14-Nov-2002  atatat Implement backwards extension of amaps. There are three cases to deal
with:

Case #1 -- adjust offset: The slot offset in the aref can be
decremented to cover the required size addition.

Case #2 -- move pages and adjust offset: The slot offset is not large
enough, but the amap contains enough inactive space *after* the mapped
pages to make up the difference, so active slots are slid to the "end"
of the amap, and the slot offset is, again, adjusted to cover the
required size addition. This optimizes for hitting case #1 again on
the next small extension.

Case #3 -- reallocate, move pages, and adjust offset: There is not
enough inactive space in the amap, so the arrays are reallocated, and
the active pages are copied again to the "end" of the amap, and the
slot offset is adjusted to cover the required size. This also
optimizes for hitting case #1 on the next backwards extension.

This provides the missing piece in the "forward extension of
vm_map_entries" logic, so the merge failure counters have been
removed.

Not many applications will make any use of this at this time (except
for jvms and perhaps gcc3), but a "top-down" memory allocator will use
it extensively.
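The three cases in rev. 1.125 can be read as a decision on how much unused slot space sits before and after the active slots. The following is a sketch of that decision only, under the assumption that the amap is described by a slot offset, an active length, and a total capacity; the function and parameter names are hypothetical and the actual page moving and reallocation are not shown.

```c
#include <stddef.h>

enum back_extend_case {
	BACK_ADJUST_OFFSET = 1,	/* case #1: decrement the slot offset */
	BACK_SLIDE_PAGES = 2,	/* case #2: slide active slots to the end */
	BACK_REALLOCATE = 3	/* case #3: reallocate, then copy to the end */
};

/*
 * slotoff: unused slots before the active range
 * slotlen: slots in active use
 * maxslot: total slots allocated (assumed >= slotoff + slotlen)
 * slotadd: slots needed in front of the active range
 */
static enum back_extend_case
classify_back_extend(size_t slotoff, size_t slotlen, size_t maxslot,
    size_t slotadd)
{
	size_t tail_free = maxslot - (slotoff + slotlen);

	if (slotadd <= slotoff)
		return BACK_ADJUST_OFFSET;
	if (slotadd <= slotoff + tail_free)
		return BACK_SLIDE_PAGES;
	return BACK_REALLOCATE;
}
```

Because cases #2 and #3 leave the active slots at the end of the array, all remaining free space ends up in front of them, which is why the next small backwards extension tends to hit case #1.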
 1.124 02-Nov-2002  perry /*CONTCOND*/ while (0)'ed macros
 1.123 24-Oct-2002  atatat In the case of a double amap_extend() (during a forward merge after a
back merge), don't abort the allocation if the second extend fails,
just abort the forward merge and finish the allocation.

Code reviewed by thorpej.
 1.122 24-Oct-2002  atatat Call amap_extend() a second time in the case of a bimerge (both
backwards and forwards) if the previous entry was backed by an amap.

Fixes pr kern/18789, where netscape 7 + a java applet actually manage
to incur forward and bimerges in userspace.

Code reviewed by fvdl and thorpej.
 1.121 18-Oct-2002  atatat Add an implementation of forward merging of new map entries. Most new
allocations can be merged either forwards or backwards, meaning no new
entries will be added to the list, and some can even be merged in both
directions, resulting in a surplus entry.

This code typically reduces the number of map entries in the
kernel_map by an order of magnitude or more. It also makes possible
recovery from the pathological case of "5000 processes created and
then killed", which leaves behind a large number of map entries.

The only forward merge case not covered is the instance of an amap
that has to be extended backwards (WIP). Note that this only affects
processes, not the kernel (the kernel doesn't use amaps), and that
merge opportunities like this come up *very* rarely, if at all. Eg,
after being up for eight days, I see only three failures in this
regard, and even those are most likely due to programs I'm developing
to exercise this case.

Code reviewed by thorpej, matt, christos, mrg, chuq, chuck, perry,
tls, and probably others. I'd like to thank my mother, the Hollywood
Foreign Press...
 1.120 22-Sep-2002  chs add a new flag VM_MAP_DYING, which is set before we start
tearing down a vm_map. use this to skip the pmap_update()
at the end of all the removes, which allows pmaps to optimize
pmap tear-down. also, use the new pmap_remove_all() hook to
let the pmap implementation know what we're up to.
 1.119 15-Sep-2002  chs add a new km flag UVM_KMF_CANFAIL, which causes uvm_km_kmemalloc() to
return failure if swap is full and there are no free physical pages.
have malloc() use this flag if M_CANFAIL is passed to it.
use M_CANFAIL to allow amap_extend() to fail when memory is scarce.
this should prevent most of the remaining hangs in low-memory situations.
 1.118 08-Mar-2002  thorpej branches: 1.118.2; 1.118.8;
Pool deals fairly well with physical memory shortage, but it doesn't
deal with shortages of the VM maps where the backing pages are mapped
(usually kmem_map). Try to deal with this:

* Group all information about the backend allocator for a pool in a
separate structure. The pool references this structure, rather than
the individual fields.
* Change the pool_init() API accordingly, and adjust all callers.
* Link all pools using the same backend allocator on a list.
* The backend allocator is responsible for waiting for physical memory
to become available, but will still fail if it cannot allocate KVA
space for the pages. If this happens, carefully drain all pools using
the same backend allocator, so that some KVA space can be freed.
* Change pool_reclaim() to indicate if it actually succeeded in freeing
some pages, and use that information to make draining easier and more
efficient.
* Get rid of PR_URGENT. There was only one use of it, and it could be
dealt with by the caller.

From art@openbsd.org.
 1.117 31-Dec-2001  chs introduce a new UVM fault type, VM_FAULT_WIREMAX. this is different
from VM_FAULT_WIRE in that when the pages being wired are faulted in,
the simulated fault is at the maximum protection allowed for the mapping
instead of the current protection. use this in uvm_map_pageable{,_all}()
to fix the problem where writing via ptrace() to shared libraries that
are also mapped with wired mappings in another process causes a
diagnostic panic when the wired mapping is removed.

this is a really obscure problem so it deserves some more explanation.
ptrace() writing to another process ends up down in uvm_map_extract(),
which for MAP_PRIVATE mappings (such as shared libraries) will cause
the amap to be copied or created. then the amap is made shared
(ie. the AMAP_SHARED flag is set) between the kernel and the ptrace()d
process so that the kernel can modify pages in the amap and have the
ptrace()d process see the changes. then when the page being modified
is actually faulted on, the object page (from the shared library vnode)
is copied to a new anon page and inserted into the shared amap.
to make all the processes sharing the amap actually see the new anon
page instead of the vnode page that was there before, we need to
invalidate all the pmap-level mappings of the vnode page in the pmaps
of the processes sharing the amap, but we don't have a good way of
doing this. the amap doesn't keep track of the vm_maps which map it.
so all we can do at this point is to remove all the mappings of the
page with pmap_page_protect(), but this has the unfortunate side-effect
of removing wired mappings as well. removing wired mappings with
pmap_page_protect() is a legitimate operation, it can happen when a file
with a wired mapping is truncated. so the pmap has no way of knowing
whether a request to remove a wired mapping is normal or when it's due to
this weird situation. so the pmap has to remove the wired mapping.
the process being ptrace()d goes away and life continues. then,
much later when we go to unwire or remove the wired vm_map mapping,
we discover that the pmap mapping has been removed when it should
still be there, and we panic.

so where did we go wrong? the problem is that we don't have any way
to update just the pmap mappings that need to be updated in this
scenario. we could invent a mechanism to do this, but that is much
more complicated than this change and it doesn't seem like the right
way to go in the long run either.

the real underlying problem here is that wired pmap mappings just
aren't a good concept. one of the original properties of the pmap
design was supposed to be that all the information in the pmap could
be thrown away at any time and the VM system could regenerate it all
through fault processing, but wired pmap mappings don't allow that.
a better design for UVM would not require wired pmap mappings,
and Chuck C. and I are talking about this, but it won't be done
anytime soon, so this change will do for now.

this change has the effect of causing MAP_PRIVATE mappings to be
copied to anonymous memory when they are mlock()d, so that uvm_fault()
doesn't need to copy these pages later when called from ptrace(), thus
avoiding the call to pmap_page_protect() and the panic that results
from this when the mlock()d region is unlocked or freed. note that
this change doesn't help the case where the wired mapping is MAP_SHARED.

discussed at great length with Chuck Cranor.
fixes PRs 10363, 12554, 12604, 13041, 13487, 14580 and 14853.
 1.116 31-Dec-2001  chs in uvm_map_clean(), add PGO_CLEANIT to the flags passed to an object's pager.
we need to make sure that vnode pages are written to disk at least once,
otherwise processes could gain access to whatever data was previously stored
in disk blocks which are freshly allocated to a file.
 1.115 31-Dec-2001  chs fix locking for loaning. in general we should be looking at the page's
uobject and uanon pointers rather than at the PQ_ANON flag to determine
which lock to hold, since PQ_ANON can be clear even when the anon's lock
is the one which we should hold (if the page was loaned from an object
and then freed by the object).
 1.114 10-Nov-2001  lukem add RCSIDs, and in some cases, slightly cleanup #include order
 1.113 06-Nov-2001  chs don't call pmap_copy() from uvmspace_fork().
a new process is very likely to call execve() immediately after fork(),
so most of the time copying the pmap mappings is wasted effort.
 1.112 30-Oct-2001  thorpej uvm_map_protect(): Don't allow VM_PROT_EXECUTE to be set on entries
(either the current protection or the max protection) that reference
vnodes associated with a file system mounted with the NOEXEC option.

uvm_mmap(): Don't allow PROT_EXEC mappings to be established of vnodes
which are associated with a file system mounted with the NOEXEC option.
 1.111 30-Oct-2001  thorpej Correct a comment.
 1.110 30-Oct-2001  thorpej - Add a new vnode flag VEXECMAP, which indicates that a vnode has
executable mappings. Stop overloading VTEXT for this purpose (VTEXT
also has another meaning).
- Rename vn_marktext() to vn_markexec(), and use it when executable
mappings of a vnode are established.
- In places where we want to set VTEXT, set it in v_flag directly, rather
than making a function call to do this (it no longer makes sense to
use a function call, since we no longer overload VTEXT with VEXECMAP's
meaning).

VEXECMAP suggested by Chuq Silvers.
 1.109 29-Oct-2001  thorpej uvm_mmap(): If a vnode mapping is established with PROT_EXEC, mark the
vnode as VTEXT.

uvm_map_protect(): When VM_PROT_EXECUTE is added to a VA range, mark
all the vnodes mapped by the range as VTEXT.
 1.108 23-Sep-2001  chs branches: 1.108.2;
make pmap_resident_count() non-optional.
 1.107 21-Sep-2001  chs add an assert.
 1.106 15-Sep-2001  chs a whole bunch of changes to improve performance and robustness under load:

- remove special treatment of pager_map mappings in pmaps. this is
required now, since I've removed the globals that expose the address range.
pager_map now uses pmap_kenter_pa() instead of pmap_enter(), so there's
no longer any need to special-case it.
- eliminate struct uvm_vnode by moving its fields into struct vnode.
- rewrite the pageout path. the pager is now responsible for handling the
high-level requests instead of only getting control after a bunch of work
has already been done on its behalf. this will allow us to UBCify LFS,
which needs tighter control over its pages than other filesystems do.
writing a page to disk no longer requires making it read-only, which
allows us to write wired pages without causing all kinds of havoc.
- use a new PG_PAGEOUT flag to indicate that a page should be freed
on behalf of the pagedaemon when it's unlocked. this flag is very similar
to PG_RELEASED, but unlike PG_RELEASED, PG_PAGEOUT can be cleared if the
pageout fails due to eg. an indirect-block buffer being locked.
this allows us to remove the "version" field from struct vm_page,
and together with shrinking "loan_count" from 32 bits to 16,
struct vm_page is now 4 bytes smaller.
- no longer use PG_RELEASED for swap-backed pages. if the page is busy
because it's being paged out, we can't release the swap slot to be
reallocated until that write is complete, but unlike with vnodes we
don't keep a count of in-progress writes so there's no good way to
know when the write is done. instead, when we need to free a busy
swap-backed page, just sleep until we can get it busy ourselves.
- implement a fast-path for extending writes which allows us to avoid
zeroing new pages. this substantially reduces cpu usage.
- encapsulate the data used by the genfs code in a struct genfs_node,
which must be the first element of the filesystem-specific vnode data
for filesystems which use genfs_{get,put}pages().
- eliminate many of the UVM pagerops, since they aren't needed anymore
now that the pager "put" operation is a higher-level operation.
- enhance the genfs code to allow NFS to use the genfs_{get,put}pages
instead of a modified copy.
- clean up struct vnode by removing all the fields that used to be used by
the vfs_cluster.c code (which we don't use anymore with UBC).
- remove kmem_object and mb_object since they were useless.
instead of allocating pages to these objects, we now just allocate
pages with no object. such pages are mapped in the kernel until they
are freed, so we can use the mapping to find the page to free it.
this allows us to remove splvm() protection in several places.

The sum of all these changes improves write throughput on my
decstation 5000/200 to within 1% of the rate of NetBSD 1.5
and reduces the elapsed time for "make release" of a NetBSD 1.5
source tree on my 128MB pc to 10% less than a 1.5 kernel took.
 1.105 10-Sep-2001  chris Update pmap_update to now take the updated pmap as an argument.
This will allow improvements to the pmaps so that they can more easily defer expensive operations, eg tlb/cache flush, til the last possible moment.

Currently this is a no-op on most platforms, so they should see no difference.

Reviewed by Jason.
 1.104 09-Sep-2001  chs create a new pool for map entries, allocated from kmem_map instead of
kernel_map. use this instead of the static map entries when allocating
map entries for kernel_map. this greatly reduces the number of static
map entries used and should eliminate the problems with running out.
 1.103 07-Sep-2001  lukem branches: 1.103.2;
let user know current value of MAX_KMAPENT in panic
 1.102 20-Aug-2001  wiz "wierd" is weird.
 1.101 16-Aug-2001  chs user maps are always pageable.
 1.100 22-Jul-2001  wiz seperate -> separate
 1.99 02-Jun-2001  chs branches: 1.99.2;
replace vm_map{,_entry}_t with struct vm_map{,_entry} *.
 1.98 25-May-2001  chs remove trailing whitespace.
 1.97 22-May-2001  ross Merge the swap-backed and object-backed inactive lists.
 1.96 29-Apr-2001  thorpej Implement page coloring, using a round-robin bucket selection
algorithm (Solaris calls this "Bin Hopping").

This implementation currently relies on MD code to define a
constant defining the number of buckets. This will change
reasonably soon (MD code will be able to dynamically size
the bucket array).
 1.95 24-Apr-2001  thorpej Sprinkle pmap_update() calls after calls to:
- pmap_enter()
- pmap_remove()
- pmap_protect()
- pmap_kenter_pa()
- pmap_kremove()
as described in pmap(9).

These calls are relatively conservative. It may be possible to
optimize these a little more.
 1.94 15-Mar-2001  chs eliminate the KERN_* error codes in favor of the traditional E* codes.
the mapping is:

KERN_SUCCESS 0
KERN_INVALID_ADDRESS EFAULT
KERN_PROTECTION_FAILURE EACCES
KERN_NO_SPACE ENOMEM
KERN_INVALID_ARGUMENT EINVAL
KERN_FAILURE various, mostly turn into KASSERTs
KERN_RESOURCE_SHORTAGE ENOMEM
KERN_NOT_RECEIVER <unused>
KERN_NO_ACCESS <unused>
KERN_PAGES_LOCKED <unused>
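The table in rev. 1.94 can be written as a lookup, shown here as a sketch. The enum below is an illustrative stand-in for the retired codes (its numeric values are arbitrary, not the historical ones), and old_kern_to_errno() is a hypothetical name; the commit replaced the codes at each call site rather than adding a translation function.

```c
#include <errno.h>

/* Illustrative stand-ins for the retired codes; values are arbitrary. */
enum old_kern_code {
	OLD_KERN_SUCCESS,
	OLD_KERN_INVALID_ADDRESS,
	OLD_KERN_PROTECTION_FAILURE,
	OLD_KERN_NO_SPACE,
	OLD_KERN_INVALID_ARGUMENT,
	OLD_KERN_FAILURE,
	OLD_KERN_RESOURCE_SHORTAGE,
	OLD_KERN_NOT_RECEIVER,
	OLD_KERN_NO_ACCESS,
	OLD_KERN_PAGES_LOCKED
};

/* Returns the E* code from the table, or -1 where it lists none. */
static int
old_kern_to_errno(enum old_kern_code code)
{
	switch (code) {
	case OLD_KERN_SUCCESS:
		return 0;
	case OLD_KERN_INVALID_ADDRESS:
		return EFAULT;
	case OLD_KERN_PROTECTION_FAILURE:
		return EACCES;
	case OLD_KERN_NO_SPACE:
	case OLD_KERN_RESOURCE_SHORTAGE:
		return ENOMEM;
	case OLD_KERN_INVALID_ARGUMENT:
		return EINVAL;
	default:
		/* KERN_FAILURE ("various") and the unused codes. */
		return -1;
	}
}
```

Note that two distinct old codes collapse into ENOMEM, so the mapping cannot be reversed.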
 1.93 11-Feb-2001  eeh branches: 1.93.2;
When recycling a vm_map, resize it to the new process address space limits.
 1.92 10-Feb-2001  thorpej Don't uvm_deallocate() the address space in exit1(). The address
space is already torn down in uvmspace_free() when the vmspace
reference count reaches 0. Move the shmexit() call into uvmspace_free().

Note that there is a beneficial side-effect of deferring the unmap
to uvmspace_free() -- on systems where TLB invalidations are
particularly expensive, the unmapping of the address space won't
have to cause TLB invalidations; uvmspace_free() is going to be
run in a context other than the exiting process's, so the "pmap is
active" test will evaluate to FALSE in the pmap module.
 1.91 06-Feb-2001  eeh Specify a process' address space limits for uvmspace_exec().
 1.90 05-Feb-2001  chs in uvm_map_clean(), fix the case where the start offset is within the last
entry in the map. the old code would walk around the end of the linked list,
through the header entry, and keep going from the first map entry until it
found a gap in the map, at which point it would return an error. if the map
had no gaps then it would loop forever. reported by k-abe@cs.utah.edu.
while I'm here, clean up this function a bit.

also, use MIN() instead of min(), since the latter takes arguments of
type "int" but we're passing it values of type "vaddr_t", which can be
a larger size.
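The min()/MIN() point in rev. 1.90 comes down to where the narrowing happens. The sketch below assumes a 32-bit int and uses uint64_t as a stand-in for a wide vaddr_t; MIN_GENERIC, survives_int and clamp_end are hypothetical names. A function taking int converts its arguments before comparing, so a wide address can lose its upper bits, while a macro compares the values in their own type.

```c
#include <stdint.h>

/* Type-generic macro: the comparison happens in the arguments' own type. */
#define MIN_GENERIC(a, b) (((a) < (b)) ? (a) : (b))

/*
 * Reports whether a value would pass through an int parameter unchanged.
 * Assumes a 32-bit int; the out-of-range conversion itself is
 * implementation-defined, but the round trip cannot restore lost bits.
 */
static int
survives_int(uint64_t v)
{
	return (uint64_t)(int)v == v;
}

/* Clamp an end address to a limit without narrowing either operand. */
static uint64_t
clamp_end(uint64_t end, uint64_t limit)
{
	return MIN_GENERIC(end, limit);
}
```

An address such as 0x100000001 does not survive the trip through int, so an int-typed min() would compare the wrong values; clamp_end() keeps the full width.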
 1.89 28-Jan-2001  thorpej Page scanner improvements, behavior is actually a bit more like
Mach VM's now. Specific changes:
- Pages now need not have all of their mappings removed before being
put on the inactive list. They only need to have the "referenced"
attribute cleared. This makes putting pages onto the inactive list
much more efficient. In order to eliminate redundant clearings of
"referenced", callers of uvm_pagedeactivate() must now do this
themselves.
- When checking the "modified" attribute for a page (for clearing
PG_CLEAN), make sure to only do it if PG_CLEAN is currently set on
the page (saves a potentially expensive pmap operation).
- When scanning the inactive list, if a page is referenced, reactivate
it (this part was actually added in uvm_pdaemon.c,v 1.27). This
works properly now that pages on the inactive list are allowed to
have mappings.
- When scanning the inactive list and considering a page for freeing,
remove all mappings, and then check the "modified" attribute if the
page is marked PG_CLEAN.
- When scanning the active list, if the page was referenced since its
last sweep by the scanner, don't deactivate it. (This part was
actually added in uvm_pdaemon.c,v 1.28.)

These changes greatly improve interactive performance during
moderate to high memory and I/O load.
 1.88 14-Jan-2001  thorpej splimp() -> splvm()
 1.87 13-Dec-2000  enami Use a single const char array instead of over 200 string constants.
 1.86 27-Nov-2000  chs Initial integration of the Unified Buffer Cache project.
 1.85 25-Nov-2000  chs lots of cleanup:
use queue.h macros and KASSERT().
address amap offsets in pages instead of bytes.
make amap_ref() and amap_unref() take an amap, offset and length
instead of a vm_map_entry_t.
improve whitespace and comments.
 1.84 16-Oct-2000  thorpej Back out rev. 1.83 -- it's causing problems with some pmap
implementations, so we'll have to spend a little more time
working on the problem.
 1.83 11-Oct-2000  thorpej - uvmspace_share(): If p2 has a vmspace already, make sure to deactivate
it and free it as appropriate. Activate p2's new address space once
it references p1's.
- uvm_fork(): Make sure the child's vmspace is NULL before calling
uvmspace_share() (the child doesn't have one already in this case).

These changes do not change the behavior for the current use of
uvmspace_share() (vfork(2)), but make it possible for an already
running process (such as a kernel thread) to properly attach to
another process's address space.
 1.82 11-Oct-2000  thorpej - Change SAVE_HINT() to take a "check" value. This value is compared
to the contents of the hint in the map, and the hint saved in the
map only if the two values match. When an unconditional save is
required, the "check" value passed should be map->hint (and the
compiler will optimize the test away). When deleting a map entry,
the new SAVE_HINT() will only change the hint if the entry being
deleted was the hint value (thus preserving any meaningful hint
that may have been there previously, rather than stomping on it).
- Add a missing hint update when deleting the map entry in
uvm_map_entry_unlink(). This is the fix for kern/11125, from
ITOH Yasufumi <itohy@netbsd.org>.
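The conditional save described in rev. 1.82 can be sketched as a compare-then-store. The struct layouts and the save_hint() function below are simplified, hypothetical stand-ins (the real SAVE_HINT() is a macro operating on struct vm_map, and any locking around the hint is omitted here).

```c
#include <stddef.h>

/* Simplified stand-ins for a map entry and a map with a lookup hint. */
struct entry {
	int id;
};

struct map {
	struct entry *hint;
};

/* Store `value` as the hint only if the current hint equals `check`. */
static void
save_hint(struct map *m, struct entry *check, struct entry *value)
{
	if (m->hint == check)
		m->hint = value;
}
```

Passing the map's own hint as `check` gives an unconditional save, since the comparison is trivially true. When unlinking an entry, passing that entry as `check` replaces the hint only if it pointed at the entry being removed, leaving any other hint untouched.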
 1.81 13-Sep-2000  thorpej Add an align argument to uvm_map() and some callers of that
routine. Works similarly to pmap_prefer(), but allows callers
to specify a minimum power-of-two alignment of the region.
How we ever got along without this for so long is beyond me.
 1.80 01-Aug-2000  wiz Rename VM_INHERIT_* to MAP_INHERIT_* and move them to sys/sys/mman.h as
discussed on tech-kern.
Retire sys/uvm/uvm_inherit.h, update man page for minherit(2).
 1.79 27-Jun-2000  mrg remove include of <vm/vm.h>
 1.78 26-Jun-2000  mrg remove/move more mach vm header files:

<vm/pglist.h> -> <uvm/uvm_pglist.h>
<vm/vm_inherit.h> -> <uvm/uvm_inherit.h>
<vm/vm_kern.h> -> into <uvm/uvm_extern.h>
<vm/vm_object.h> -> nothing
<vm/vm_pager.h> -> into <uvm/uvm_pager.h>

also includes a bunch of <vm/vm_page.h> include removals (due to redundancy
with <vm/vm.h>), and a scattering of other similar headers.
 1.77 13-Jun-2000  chs branches: 1.77.2;
initialize aref.ar_pageoff even if there's no amap.
 1.76 05-Jun-2000  pk Change previous to use `vm_map_min(dstmap)' instead of hard-coding
VM_MIN_KERNEL_ADDRESS.
 1.75 02-Jun-2000  pk Let uvm_map_extract() set the lower bound on the kernel address range
itself, instead of having its callers do that.
 1.74 19-May-2000  thorpej branches: 1.74.2;
Clean up some indentation lossage in uvm_map_extract().
 1.73 24-Apr-2000  thorpej Changes necessary to implement pre-zero'ing of pages in the idle loop:
- Make page free lists have two actual queues: known-zero pages and
pages with unknown contents.
- Implement uvm_pageidlezero(). This function attempts to zero up to
the target number of pages until the target has been reached (currently
target is `all free pages') or until whichqs becomes non-zero (indicating
that a process is ready to run).
- Define a new hook for the pmap module for pre-zero'ing pages. This is
used to zero the pages using uncached access. This allows us to zero
as many pages as we want without polluting the cache.

In order to use this feature, each platform must add the appropriate
glue in their idle loop.
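
A rough sketch of the idle-loop zeroing described in 1.73, assuming a
single free list split into a known-zero queue and an unknown-contents
queue. The names idle_zero(), struct freelist and SKETCH_PAGE_SIZE are
hypothetical, and plain memset() stands in for the pmap hook that zeroes
pages with uncached access.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

#define SKETCH_PAGE_SIZE 64	/* tiny page size, for illustration only */

struct page {
	struct page *next;
	unsigned char data[SKETCH_PAGE_SIZE];
};

struct freelist {
	struct page *zeros;	/* pages known to be zero-filled */
	struct page *unknown;	/* pages with unknown contents */
};

/*
 * Move pages from the unknown queue to the zeros queue, zeroing each,
 * until no unknown pages remain or *runnable becomes non-zero
 * (a process is ready to run). Returns the number of pages zeroed.
 */
static size_t
idle_zero(struct freelist *fl, const volatile int *runnable)
{
	size_t n = 0;

	assert(fl != NULL && runnable != NULL);
	while (fl->unknown != NULL && *runnable == 0) {
		struct page *pg = fl->unknown;

		fl->unknown = pg->next;
		memset(pg->data, 0, sizeof(pg->data));
		pg->next = fl->zeros;
		fl->zeros = pg;
		n++;
	}
	return n;
}
```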
 1.72 16-Apr-2000  chs undo rev 1.13, which is to say, don't block interrupts while deactivating
one pmap and activating another. this isn't actually necessary (since
pmap_activate() and pmap_deactivate() affect only user-level mappings,
which cannot be accessed from interrupts anyway), and pmap_activate()
is very slow on old sun4c sparcs so we can't block interrupts for this long.
this fixes PR 8322.
 1.71 10-Apr-2000  chs sparc -> __sparc__
print lock status in uvm_object_printit().
 1.70 26-Mar-2000  kleink Merge parts of chs-ubc2 into the trunk:
Add a new type voff_t (defined as a synonym for off_t) to describe offsets
into uvm objects, and update the appropriate interfaces to use it, the
most visible effect being the ability to mmap() file offsets beyond
the range of a vaddr_t.

Originally by Chuck Silvers; blame me for problems caused by merging this
into non-UBC.
 1.69 12-Sep-1999  chs branches: 1.69.2;
eliminate the PMAP_NEW option by making it required for all ports.
ports which previously had no support for PMAP_NEW now implement
the pmap_k* interfaces as wrappers around the non-k versions.
 1.68 21-Aug-1999  thorpej When handling the MADV_FREE case, if the amap or aobj has more than
one reference, go through the deactivate path; the page may actually
be in use by another process.

Fixes kern/8239.
 1.67 03-Aug-1999  thorpej Fix the error recovery in uvm_map_pageable_all().
 1.66 19-Jul-1999  thorpej Fix PR #8023 from Bernd Ernesti: when MADV_FREE'ing a region which spanned
more than one VM map entry, a typo caused amap_unadd() to attempt to
remove anons from the wrong amap. Fix that typo.
 1.65 18-Jul-1999  thorpej Rework uvm_map_protect():
- Fix some locking bugs; a couple of places would return an error condition
without unlocking the map.
- Deal with maps marked WIREFUTURE; if making an entry VM_PROT_NONE ->
anything else, and it is not already marked as wired, wire it.
 1.64 17-Jul-1999  thorpej Add a set of "lockflags", which can control the locking behavior
of some functions. Use these flags in uvm_map_pageable() to determine
if the map is locked on entry (replaces an already present boolean_t
argument `islocked'), and if the function should return with the map
still locked.
 1.63 07-Jul-1999  thorpej Fix a thinko which could cause a NULL pointer deref, in the PGO_FREE
case.
 1.62 07-Jul-1999  thorpej In the PGO_FREE case of uvm_map_clean()'s amap cleaning, skip wired
pages.

XXX This should be handled better in the future, probably by marking the
XXX page as released, and making uvm_pageunwire() free the page when
XXX the wire count on a released page reaches zero.
 1.61 07-Jul-1999  thorpej Add some more meat to madvise(2):
* Implement MADV_DONTNEED: deactivate pages in the specified range,
semantics similar to Solaris's MADV_DONTNEED.
* Add MADV_FREE: free pages and swap resources associated with the
specified range, causing the range to be reloaded from backing
store (vnodes) or zero-fill (anonymous), semantics like FreeBSD's
MADV_FREE and like Digital UNIX's MADV_DONTNEED (isn't it SO GREAT
that madvise(2) isn't standardized!?)

As part of this, move the non-map-modifying advice handling out of
uvm_map_advise(), and into sys_madvise().

As another part, implement general amap cleaning in uvm_map_clean(), and
change uvm_map_clean() to only push dirty pages to disk if PGO_CLEANIT
is set in its flags (and update sys___msync13() accordingly). XXX Add
a patchable global "amap_clean_works", defaulting to 1, which can disable
the amap cleaning code, just in case problems are unearthed; this gives
a developer/user a quick way to recover and send a bug report (e.g. boot
into DDB and change the value).

XXX Still need to implement a real uao_flush().

XXX Need to update the manual page.

With these changes, rebuilding libc will automatically cause the new
malloc(3) to use MADV_FREE to actually release pages and swap resources
when it decides that can be done.
 1.60 01-Jul-1999  thorpej Fix a corner case locking error, which could lead to map corruption in
SMP environments. See comments in <vm/vm_map.h> for details.
 1.59 18-Jun-1999  thorpej Add the guts of mlockall(MCL_FUTURE). This requires that a process's
"memlock" resource limit be passed to uvm_mmap(). Update all calls accordingly.
 1.58 17-Jun-1999  thorpej The i386 and pc532 pmaps are officially fixed.
 1.57 16-Jun-1999  thorpej * Rename uvm_fault_unwire() to uvm_fault_unwire_locked(), and require that
the map be at least read-locked to call this function. This requirement
will be taken advantage of in a future commit.
* Write a uvm_fault_unwire() wrapper which read-locks the map and calls
uvm_fault_unwire_locked().
* Update the comments describing the locking constraints of uvm_fault_wire()
and uvm_fault_unwire().
 1.56 16-Jun-1999  thorpej Modify uvm_map_pageable() and uvm_map_pageable_all() to follow POSIX 1003.1b
semantics. That is, regardless of the number of mlock/mlockall calls,
an munlock/munlockall actually unlocks the region (i.e. sets wiring count
to 0).

Add a comment describing why uvm_map_pageable() should not be used for
transient page wirings (e.g. for physio) -- note, it's currently only
(ab)used in this way by a few pieces of code which are known to be
broken, i.e. the Amiga and Atari pmaps, and i386 and pc532 if PMAP_NEW is
not used. The i386 GDT code uses uvm_map_pageable(), but in a safe
way, and could be trivially converted to use uvm_fault_wire() instead.
 1.55 16-Jun-1999  thorpej Add a macro to test if a map entry is wired.
 1.54 15-Jun-1999  thorpej Several changes, developed and tested concurrently:
* Provide POSIX 1003.1b mlockall(2) and munlockall(2) system calls.
MCL_CURRENT is presently implemented. MCL_FUTURE is not fully
implemented. Also, the same one-unlock-for-every-lock caveat
currently applies here as it does to mlock(2). This will be
addressed in a future commit.
* Provide the mincore(2) system call, with the same semantics as
Solaris.
* Clean up the error recovery in uvm_map_pageable().
* Fix a bug where a process would hang if attempting to mlock a
zero-fill region where none of the pages in that region are resident.
[ This fix has been submitted for inclusion in 1.4.1 ]
 1.53 07-Jun-1999  thorpej Print the maps flags in "show map" from DDB.
 1.52 02-Jun-1999  thorpej Simplify the last even more; We downgraded to a shared (read) lock, so
setting recursive has no effect! The kernel lock manager doesn't allow
an exclusive recursion into a shared lock. This situation must simply
be avoided. The only place where this might be a problem is the (ab)use
of uvm_map_pageable() in the Utah-derived pmaps for m68k (they should
either toss the iffy scheme they use completely, or use something like
uvm_fault_wire()).

In addition, once we have looped over uvm_fault_wire(), only upgrade to
an exclusive (write) lock if we need to modify the map again (i.e.
wiring a page failed).
 1.51 02-Jun-1999  thorpej Clean up the locking mess in uvm_map_pageable() a little... Most importantly,
don't unlock a kernel map (!!!) and then relock it later; a recursive lock,
as is used in the user map case, is fine. Also, don't change map entries
while only holding a read lock on the map. Instead, if we fail to wire
a page, clear recursive locking, and upgrade back to a write lock before
dropping the wiring count on the remaining map entries.
 1.50 31-May-1999  mrg unlock the map for unknown arguments to uvm_map_advise. from Soren S. Jorvang in PR kern/7681
 1.49 28-May-1999  thorpej A little spring cleaning in the unwire case of uvm_map_pageable().
 1.48 28-May-1999  thorpej Make uvm_fault_unwire() take a vm_map_t, rather than a pmap_t, for
consistency. Use this opportunity for checking for intrsafe map use
in this routine (which is illegal).
 1.47 28-May-1999  thorpej Make "intrsafe" maps locked only by exclusive spin locks, never sleep
locks (and thus, never shared locks). Move the "set/clear recursive"
functions to uvm_map.c, which is the only place they're used (and
they should go away anyhow). Delete some unused cruft.
 1.46 26-May-1999  thorpej Upon further investigation, in uvm_map_pageable(), entry->protection is the
right access_type to pass to uvm_fault_wire(). This way, if the entry has
VM_PROT_WRITE, and the entry is marked COW, the copy will happen immediately
in uvm_fault(), as if the access were performed.
 1.45 26-May-1999  thorpej Change the vm_map's "entries_pageable" member to a r/o flags member, which
has PAGEABLE and INTRSAFE flags. PAGEABLE now really means "pageable",
not "allocate vm_map_entry's from non-static pool", so update all map
creations to reflect that. INTRSAFE maps are maps that are used in
interrupt context (e.g. kmem_map, mb_map), and thus use the static
map entry pool (XXX as does kernel_map, for now). This will eventually
change how these maps are locked, as well.
 1.44 26-May-1999  thorpej In uvm_map_pageable(), pass VM_PROT_NONE as access type to uvm_fault_wire()
for now. XXX This needs to be reexamined.
 1.43 25-May-1999  thorpej Define a new kernel object type, "intrsafe", which are used for objects
which can be used in an interrupt context. Use pmap_kenter*() and
pmap_kremove() only for mappings owned by these objects.

Fixes some locking protocol issues related to MP support, and eliminates
all of the pmap_enter vs. pmap_kremove inconsistencies.
 1.42 25-May-1999  thorpej Macro'ize the test for "object is a kernel object".
 1.41 23-May-1999  mrg implement madvise() for MADV_{NORMAL,RANDOM,SEQUENTIAL}, others are not yet done.
 1.40 20-May-1999  thorpej Make a slight modification of pmap_growkernel() -- it now returns the
end of the mappable kernel virtual address space. Previously, it would
get called more often than necessary, because the caller only knew what
was requested.

Also, export uvm_maxkaddr so that uvm_pageboot_alloc() can grow the
kernel pmap if necessary, as well. Note that pmap_growkernel() must
now be able to handle being called before pmap_init().
 1.39 12-May-1999  thorpej Add an optional pmap hook, pmap_fork(), to be called at the end of
uvmspace_fork().

pmap_fork() is used to "fork a pmap", that is copy data from one pmap
to the other that is NOT related to actual mappings in the pmap, but is
otherwise logically coupled to the address space.
 1.38 03-May-1999  mrg remove now-wrong comments. formatting nits.
 1.37 19-Apr-1999  chs in uvm_map_extract(), handle the case where the map entry being extracted
is large enough to cause the end address of the new entry to overflow.
 1.36 28-Mar-1999  mycroft branches: 1.36.2;
Only turn off VM_PROT_WRITE for COW pages; not VM_PROT_EXECUTE.
 1.35 25-Mar-1999  mrg remove now >1 year old pre-release message.
 1.34 24-Jan-1999  chuck cleanup/reorg:
- break anon related functions out of uvm_amap.c and put them in their own
file (uvm_anon.c). includes break up uvm_anon_init into an amap and
an anon init function
- ensure that only functions within the amap module access amap structure
fields (add macros to amap api as needed)
 1.33 15-Nov-1998  chuck remove bogus permission check in uvm_map_clean(). fixes mmap/msync
problem discussed/reported by jonathan and Andreas Wrede <andreas@planix.com>.
 1.32 24-Oct-1998  mrg branches: 1.32.2;
KNF a missing bit. remove register.
 1.31 19-Oct-1998  tron Defopt SYSVMSG, SYSVSEM and SYSVSHM.
 1.30 18-Oct-1998  chs shift by PAGE_SHIFT instead of multiplying or dividing by PAGE_SIZE.
 1.29 11-Oct-1998  chuck remove unused share map code from UVM:
- replace map checks with submap checks
- get rid of unused 'mainonly' arg in uvm_unmap/uvm_unmap_remove, simplify
code. update all calls to reflect this.
- don't worry about unmapping or changing the protection of shared share
map mappings (is_main_map no longer used).
- remove unused uvm_map_sharemapcopy() function from fork code.
 1.28 31-Aug-1998  thorpej Back out previous; I should have instrumented the benefit of this one
first.
 1.27 31-Aug-1998  thorpej Use the pool allocator and the "nointr" pool page allocator for vm_map's.
 1.26 31-Aug-1998  thorpej Use the pool allocator and the "nointr" pool page allocator for dynamically
allocated vm_map_entry's.
 1.25 31-Aug-1998  thorpej Use the pool allocator and the "nointr" pool page allocator for vmspace
structures.
 1.24 13-Aug-1998  eeh Merge paddr_t changes into the main branch.
 1.23 09-Aug-1998  perry bzero->memset, bcopy->memcpy, bcmp->memcmp
 1.22 08-Jul-1998  thorpej branches: 1.22.2;
Add support for multiple memory free lists. There is at least one
default free list, and 0 - N additional free lists, in order of descending
priority.

A new page allocation function, uvm_pagealloc_strat(), has been added,
providing three page allocation strategies:

- normal: high -> low priority free list walk, taking the
page off the first free list that has one.

- only: attempt to allocate a page only from the specified free
list, failing if that free list has none available.

- fallback: if `only' fails, fall back on `normal'.

uvm_pagealloc(...) is provided for normal use (and is a synonym for
uvm_pagealloc_strat(..., UVM_PGA_STRAT_NORMAL, 0); the free list argument
is ignored for the `normal' case).

uvm_page_physload() now specifies which free list the pages will be
loaded onto. This means that some platforms which have multiple physical
memory segments may define additional vm_physsegs if they wish to break
individual physical segments into differing priorities.

Machine-dependent code must define _at least_ the following constants
in <machine/vmparam.h>:

VM_NFREELIST: the number of free lists the system will have

VM_FREELIST_DEFAULT: the default freelist (should always be 0,
but is defined in machdep code so that it's with all of the
other free list-related constants).

Additional free list names may be defined by machine-dependent code, but
they will only be used by machine-dependent code (e.g. for loading the
vm_physsegs).
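
The three allocation strategies listed in 1.22 could be sketched as
follows. This is an illustration only: pagealloc_strat(), take_page(), the
STRAT_* names and SKETCH_NFREELIST are hypothetical, each free list is
reduced to a page counter, and the assumption that a lower index means
higher priority is the sketch's own.

```c
#include <assert.h>

#define SKETCH_NFREELIST 2	/* hypothetical; stands in for VM_NFREELIST */

enum sketch_strat { STRAT_NORMAL, STRAT_ONLY, STRAT_FALLBACK };

/* Take one page from list lst if it has any; return 1 on success. */
static int
take_page(int free_count[SKETCH_NFREELIST], int lst)
{
	if (free_count[lst] == 0)
		return 0;
	free_count[lst]--;
	return 1;
}

/*
 * Return the index of the free list a page was taken from, or -1 if
 * the allocation failed. free_list is ignored for STRAT_NORMAL.
 */
static int
pagealloc_strat(int free_count[SKETCH_NFREELIST], enum sketch_strat strat,
    int free_list)
{
	int lst;

	if (strat != STRAT_NORMAL) {
		assert(free_list >= 0 && free_list < SKETCH_NFREELIST);
		if (take_page(free_count, free_list))
			return free_list;
		if (strat == STRAT_ONLY)
			return -1;
		/* STRAT_FALLBACK: fall back on the normal walk. */
	}
	/* Normal: walk the lists from high to low priority. */
	for (lst = 0; lst < SKETCH_NFREELIST; lst++)
		if (take_page(free_count, lst))
			return lst;
	return -1;
}
```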
 1.21 04-Jul-1998  jonathan defopt DDB.
 1.20 22-May-1998  chuck fix bug in uvm_map_extract, remove case. make sure we update the loop
variable before removing the entry from the map.
[bug was not causing problems because the remove case isn't currently
being used ...]
 1.19 09-May-1998  kleink Minor KNF.
 1.18 05-May-1998  kleink Remove inclusions of syscall (and syscall argument) related header files;
we don't need them here.
 1.17 25-Apr-1998  matthias port-pc532 now has pmap_new just like port-i386.
 1.16 30-Mar-1998  chuck have ddb show map print resident page count
 1.15 27-Mar-1998  thorpej Split uvmspace_alloc() into uvmspace_alloc() and uvmspace_init(). The latter
can be used for initializing a pre-allocated vmspace.
 1.14 19-Mar-1998  chuck rework the copy inheritance case of fork. the old way did not handle
the very rare case of shared mappings that have amap's attached in a
reasonable way -- this is not currently causing any problems, but i
fixed it anyway. update the comment in this section of code and also
be smarter about avoiding needless calls to pmap_protect().
 1.13 19-Mar-1998  thorpej Make the previous change `atomic'.
 1.12 19-Mar-1998  thorpej When unsharing or execing, deactivate the old vmspace before reassigning
and activating the new one. Pointed out by Chris Demetriou.
 1.11 17-Mar-1998  mrg oops, missed a bit of KNF here.
 1.10 09-Mar-1998  mrg KNF.
 1.9 01-Mar-1998  fvdl Merge with Lite2 + local changes
 1.8 24-Feb-1998  chuck be consistent about offsets in kernel objects. vm_map_min(kernel_map)
should always be the base [fixes problem on m68k detected by jason thorpe]

add comments to uvm_km.c explaining kernel memory management in more detail
 1.7 18-Feb-1998  drochner fix map range boundary check
 1.6 10-Feb-1998  mrg - add defopt's for UVM, UVMHIST and PMAP_NEW.
- remove unnecessary UVMHIST_DECL's.
 1.5 08-Feb-1998  mrg move pdhist initialisation to the same place as maphist. also, declare
the history buffers as "struct uvm_history_ent" to ensure proper
alignment (eg, alpha). this fixes a boottime panic when the pdhist was
used before it had been initialised.
 1.4 07-Feb-1998  mrg bzero the entire vmspace, like the old vm does. makes ps report sane values of VSZ for swapper/pagedaemon
 1.3 07-Feb-1998  mrg restore rcsids
 1.2 06-Feb-1998  thorpej RCS ID police.
 1.1 05-Feb-1998  mrg branches: 1.1.1;
Initial revision
 1.1.1.1 05-Feb-1998  mrg initial import of the new virtual memory system, UVM, into -current.

UVM was written by chuck cranor <chuck@maria.wustl.edu>, with some
minor portions derived from the old Mach code. i provided some help
getting swap and paging working, and other bug fixes/ideas. chuck
silvers <chuq@chuq.com> also provided some other fixes.

this is the UVM kernel code portion.


this will be KNF'd shortly. :-)
 1.22.2.2 12-Aug-1998  eeh Fix a debug printf if paddr_t == long long.
 1.22.2.1 30-Jul-1998  eeh Split vm_offset_t and vm_size_t into paddr_t, psize_t, vaddr_t, and vsize_t.
 1.32.2.3 30-May-1999  chs vm_page's blkno field is gone.
 1.32.2.2 25-Feb-1999  chs in uvm_page_printit(), print page flags symbolically too.
 1.32.2.1 09-Nov-1998  chs initial snapshot. lots left to do.
 1.36.2.5 14-Jan-2002  he Pull up revision 1.76 (via patch, requested by chs):
Let uvm_map_extract() set the lower bound on the kernel address
range itself, instead of having its callers do that. Fixes
PR#11972.
 1.36.2.4 30-Apr-2000  he Pull up revision 1.72 (requested by chs):
Undo revision 1.13: don't block interrupts while deactivating
one pmap and activating another, since these only affect user-
level mappings which cannot be accessed from interrupt context.
This fixes Sparc zstty overflows reported in PR#8322, since pmap
operations are slow on old sun4c sparcs.
 1.36.2.3 18-Jun-1999  perry patch from thorpej: fixes bug in mlock() of anonymous memory
 1.36.2.2 18-Jun-1999  perry pullup 1.39->1.40 (thorpej): fix the 1G RAM bug
 1.36.2.1 19-Apr-1999  perry branches: 1.36.2.1.2; 1.36.2.1.4;
pullup 1.36->1.37 as requested by chuq -- fixes botch mmapping large regions.
 1.36.2.1.4.1 30-Nov-1999  itojun bring in latest KAME (as of 19991130, KAME/NetBSD141) into kame branch
just for reference purposes.
This commit includes 1.4 -> 1.4.1 sync for kame branch.

The branch does not compile at all (due to the lack of ALTQ and some other
source code). Please do not try to modify the branch, this is just for
reference purposes.

synchronization to latest KAME will take place on HEAD branch soon.
 1.36.2.1.2.5 09-Aug-1999  chs create a new type "voff_t" for uvm_object offsets
and define it to be "off_t". also, remove pgo_asyncget().
 1.36.2.1.2.4 02-Aug-1999  thorpej Update from trunk.
 1.36.2.1.2.3 01-Jul-1999  thorpej Sync w/ -current.
 1.36.2.1.2.2 21-Jun-1999  thorpej Sync w/ -current.
 1.36.2.1.2.1 07-Jun-1999  chs merge everything from chs-ubc branch.
 1.69.2.6 27-Mar-2001  bouyer Sync with HEAD.
 1.69.2.5 11-Feb-2001  bouyer Sync with HEAD.
 1.69.2.4 18-Jan-2001  bouyer Sync with head (for UBC+NFS fixes, mostly).
 1.69.2.3 13-Dec-2000  bouyer Sync with HEAD (for UBC fixes).
 1.69.2.2 08-Dec-2000  bouyer Sync with HEAD.
 1.69.2.1 20-Nov-2000  bouyer Update thorpej_scsipi to -current as of a month ago
 1.74.2.1 22-Jun-2000  minoura Sync w/ netbsd-1-5-base.
 1.77.2.3 08-Oct-2001  he Pull up revision 1.104 (via patch, requested by chuq):
Create a new pool for map entries, allocated from kmem_map instead
of kernel_map. Use this instead of the static map entries when
allocating map entries for kernel_map. This greatly reduces the
number of static map entries used, and should eliminate the
problems with running out.
 1.77.2.2 16-Oct-2000  tv Pullup 1.82 [thorpej]:
- Change SAVE_HINT() to take a "check" value. This value is compared
to the contents of the hint in the map, and the hint saved in the
map only if the two values match. When an unconditional save is
required, the "check" value passed should be map->hint (and the
compiler will optimize the test away). When deleting a map entry,
the new SAVE_HINT() will only change the hint if the entry being
deleted was the hint value (thus preserving any meaningful hint
that may have been there previously, rather than stomping on it).
- Add a missing hint update when deleting the map entry in
uvm_map_entry_unlink(). This is the fix for kern/11125, from
ITOH Yasufumi <itohy@netbsd.org>.
 1.77.2.1 06-Aug-2000  fvdl Pull up version 1.80 (VM_INHERIT -> MAP_INHERIT).
 1.93.2.18 11-Dec-2002  thorpej Sync with HEAD.
 1.93.2.17 11-Dec-2002  thorpej Sync with HEAD.
 1.93.2.16 11-Nov-2002  nathanw Catch up to -current
 1.93.2.15 18-Oct-2002  nathanw Catch up to -current.
 1.93.2.14 17-Sep-2002  nathanw Catch up to -current.
 1.93.2.13 12-Jul-2002  nathanw No longer need to pull in lwp.h; proc.h pulls it in for us.
 1.93.2.12 24-Jun-2002  nathanw Curproc->curlwp renaming.

Change uses of "curproc->l_proc" back to "curproc", which is more like the
original use. Bare uses of "curproc" are now "curlwp".

"curproc" is now #defined in proc.h as ((curlwp) ? (curlwp)->l_proc : NULL)
so that it is always safe to reference curproc (*de*referencing curproc
is another story, but that's always been true).
 1.93.2.11 17-Apr-2002  nathanw Catch up to -current.
 1.93.2.10 01-Apr-2002  nathanw Catch up to -current.
(CVS: It's not just a program. It's an adventure!)
 1.93.2.9 08-Jan-2002  nathanw Catch up to -current.
 1.93.2.8 15-Nov-2001  pk The sparc `kill_user_windows()' special case now takes a `struct lwp *'.
 1.93.2.7 14-Nov-2001  nathanw Catch up to -current.
 1.93.2.6 26-Sep-2001  nathanw Catch up to -current.
Again.
 1.93.2.5 21-Sep-2001  nathanw Catch up to -current.
 1.93.2.4 24-Aug-2001  nathanw Catch up with -current.
 1.93.2.3 21-Jun-2001  nathanw Catch up to -current.
 1.93.2.2 09-Apr-2001  nathanw Catch up with -current.
 1.93.2.1 05-Mar-2001  nathanw Initial commit of scheduler activations and lightweight process support.
 1.99.2.6 10-Oct-2002  jdolecek sync kqueue with -current; this includes merge of gehenna-devsw branch,
merge of i386 MP branch, and part of autoconf rototil work
 1.99.2.5 16-Mar-2002  jdolecek Catch up with -current.
 1.99.2.4 10-Jan-2002  thorpej Sync kqueue branch with -current.
 1.99.2.3 13-Sep-2001  thorpej Update the kqueue branch to HEAD.
 1.99.2.2 25-Aug-2001  thorpej Merge Aug 24 -current into the kqueue branch.
 1.99.2.1 03-Aug-2001  lukem update to -current
 1.103.2.2 01-Oct-2001  fvdl Catch up with -current.
 1.103.2.1 07-Sep-2001  fvdl file uvm_map.c was added on branch thorpej-devvp on 2001-10-01 12:48:42 +0000
 1.108.2.1 12-Nov-2001  thorpej Sync the thorpej-mips-cache branch with -current.
 1.118.8.1 02-Jun-2003  tron Pull up revision 1.119 (requested by skrll):
add a new km flag UVM_KMF_CANFAIL, which causes uvm_km_kmemalloc() to
return failure if swap is full and there are no free physical pages.
have malloc() use this flag if M_CANFAIL is passed to it.
use M_CANFAIL to allow amap_extend() to fail when memory is scarce.
this should prevent most of the remaining hangs in low-memory situations.
 1.118.2.1 12-Mar-2002  thorpej Make kentry_lock a spin mutex at IPL_VM, and rename it to kentry_mutex.
 1.136.2.10 10-Nov-2005  skrll Sync with HEAD. Here we go again...
 1.136.2.9 01-Apr-2005  skrll Sync with HEAD.
 1.136.2.8 04-Mar-2005  skrll Sync with HEAD.

Hi Perry!
 1.136.2.7 15-Feb-2005  skrll Sync with HEAD.
 1.136.2.6 24-Jan-2005  skrll Sync with HEAD.
 1.136.2.5 17-Jan-2005  skrll Sync with HEAD.
 1.136.2.4 19-Oct-2004  skrll Sync with HEAD
 1.136.2.3 21-Sep-2004  skrll Fix the sync with head I botched.
 1.136.2.2 18-Sep-2004  skrll Sync with HEAD.
 1.136.2.1 03-Aug-2004  skrll Sync with HEAD
 1.164.2.3 09-May-2004  jdc branches: 1.164.2.3.2;
Pull up revision 1.170 (requested by petrov in ticket #270)

Revert default uvm counters, rename UVMMAP_COUNTERS to UVMMAP_NOCOUNTERS.
 1.164.2.2 09-May-2004  jdc Pull up revision 1.169 (requested by petrov in ticket #269)

Replace uvm counters with evcnt, initialize them through __link_set (from Matt Thomas),
disable counters by default and add configuration option UVMMAP_COUNTERS.
 1.164.2.1 31-Mar-2004  tron Pull up revision 1.165 (requested by yamt in ticket #23):
uvm_map_findspace: don't return unaligned address if alignment is specified.
discussed on tech-kern@.
 1.164.2.3.2.2 11-May-2005  riz Pull up revision 1.188 (requested by dbj in ticket #1409):
use voff_t instead of vaddr_t to hold file offset passed to pgo_put
 1.164.2.3.2.1 06-Apr-2005  he Pull up revision 1.176 (via patch, requested by yamt in ticket #1061):
Don't merge incompatible map entries, e.g. private and shared.
 1.181.2.1 29-Apr-2005  kent sync with -current
 1.183.2.3 19-Mar-2005  yamt sync with head. xen and whitespace. xen part is not finished.
 1.183.2.2 12-Feb-2005  yamt sync with head.
 1.183.2.1 25-Jan-2005  yamt - don't use uvm_object or managed mappings for wired allocations.
(eg. malloc(9))
- simplify uvm_km_* apis.
 1.186.2.2 01-May-2005  tron Pull up revision 1.189 (requested by oster in ticket #223):
uvm_map: don't leak a preallocated map entry on error.
 1.186.2.1 08-Apr-2005  tron Pull up revision 1.188 (requested by dbj in ticket #123):
use voff_t instead of vaddr_t to hold file offset passed to pgo_put
 1.204.2.9 17-Mar-2008  yamt sync with head.
 1.204.2.8 27-Feb-2008  yamt sync with head.
 1.204.2.7 21-Jan-2008  yamt sync with head
 1.204.2.6 07-Dec-2007  yamt sync with head
 1.204.2.5 27-Oct-2007  yamt sync with head.
 1.204.2.4 03-Sep-2007  yamt sync with head.
 1.204.2.3 26-Feb-2007  yamt sync with head.
 1.204.2.2 30-Dec-2006  yamt sync with head.
 1.204.2.1 21-Jun-2006  yamt sync with head.
 1.206.2.5 01-Mar-2006  yamt sync with head.
 1.206.2.4 18-Feb-2006  yamt sync with head.
 1.206.2.3 01-Feb-2006  yamt sync with head.
 1.206.2.2 15-Jan-2006  yamt sync with head.
 1.206.2.1 31-Dec-2005  yamt - add a function to add a reference to a vmspace.
- add a macro to check if a vmspace belongs to kernel.
 1.210.4.2 01-Jun-2006  kardel Sync with head.
 1.210.4.1 22-Apr-2006  simonb Sync with head.
 1.210.2.1 09-Sep-2006  rpaulo sync with head
 1.215.4.2 11-May-2006  elad sync with head
 1.215.4.1 19-Apr-2006  elad oops - *really* sync to head this time.
 1.215.2.5 15-Sep-2006  yamt make UVM_KICK_PDAEMON() a real function and stop including
uvm_pdpolicy.h from uvm.h. this also fixes build of pmap(1).
 1.215.2.4 26-Jun-2006  yamt sync with head.
 1.215.2.3 24-May-2006  yamt sync with head.
 1.215.2.2 01-Apr-2006  yamt sync with head.
 1.215.2.1 05-Mar-2006  yamt separate page replacement policy from the rest of kernel.
 1.216.2.1 24-May-2006  tron Merge 2006-05-24 NetBSD-current into the "peter-altq" branch.
 1.224.2.1 19-Jun-2006  chap Sync with head.
 1.226.2.1 13-Jul-2006  gdamore Merge from HEAD.
 1.227.4.1 18-Nov-2006  ad Sync with head.
 1.229.2.2 10-Dec-2006  yamt sync with head.
 1.229.2.1 22-Oct-2006  yamt sync with head
 1.232.6.1 29-Oct-2007  wrstuden Catch up with 4.0 RC3
 1.232.4.3 24-Mar-2007  yamt sync with head.
 1.232.4.2 12-Mar-2007  rmind Sync with HEAD.
 1.232.4.1 27-Feb-2007  yamt - sync with head.
- move sched_changepri back to kern_synch.c as it doesn't know PPQ anymore.
 1.232.2.1 12-Oct-2007  riz Pull up following revision(s) (requested by skrll in ticket #928):
sys/uvm/uvm_map.c: revision 1.242
Don't restrict the offset when allocating a map entry for in-kernel map -
use UVM_UNKNOWN_OFFSET in the call to uvm_map_prepare.
This fixes a '"panic: malloc: out of space in kmem_map" when it's not
really' testcase of mine, and one reported to me by chuq. This is likely
to fix PR/35587 as well.
Looks/seems fine to me from chuq and yamt. Thanks.
 1.235.2.10 10-Nov-2007  yamt uvm_map_prepare: ignore UVM_FLAG_TRYLOCK for intrsafe maps.
otherwise pool_cache_get(NOWAIT) often fails even if there are plenty of
free memory and causes eg. "no pv entries available" panic on x86.
 1.235.2.9 23-Oct-2007  ad Sync with head.
 1.235.2.8 01-Sep-2007  ad Use pool_cache for allocating a few more types of objects.
 1.235.2.7 20-Aug-2007  ad Sync with HEAD.
 1.235.2.6 29-Jul-2007  ad Destroy the map's locks before it's freed.
 1.235.2.5 13-Apr-2007  ad - Fix a (new) bug where vget tries to acquire freed vnodes' interlocks.
- Minor locking fixes.
 1.235.2.4 05-Apr-2007  ad - Put a per-LWP lock around swapin / swapout.
- Replace use of lockmgr().
- Minor locking fixes and assertions.
- uvm_map.h no longer pulls in proc.h, etc.
- Use kpause where appropriate.
 1.235.2.3 21-Mar-2007  ad - Replace more simple_locks, and fix up in a few places.
- Use condition variables.
- LOCK_ASSERT -> KASSERT.
 1.235.2.2 13-Mar-2007  ad Pull in the initial set of changes for the vmlocking branch.
 1.235.2.1 13-Mar-2007  ad Sync with head.
 1.236.2.1 11-Jul-2007  mjf Sync with head.
 1.237.2.2 03-Sep-2007  skrll Sync with HEAD.
 1.237.2.1 15-Aug-2007  skrll Sync with HEAD.
 1.238.6.2 21-Jul-2007  ad Merge unobtrusive locking changes from the vmlocking branch.
 1.238.6.1 21-Jul-2007  ad file uvm_map.c was added on branch matt-mips64 on 2007-07-21 19:21:55 +0000
 1.238.4.3 27-Nov-2007  joerg Sync with HEAD. amd64 Xen support needs testing.
 1.238.4.2 26-Oct-2007  joerg Sync with HEAD.

Follow the merge of pmap.c on i386 and amd64 and move
pmap_init_tmp_pgtbl into arch/x86/x86/pmap.c. Modify the ACPI wakeup
code to restore CR4 before jumping back into kernel space as the large
page option might cover that.
 1.238.4.1 03-Sep-2007  jmcneill Sync with HEAD.
 1.240.4.2 18-Oct-2007  yamt sync with head.
 1.240.4.1 14-Oct-2007  yamt sync with head.
 1.240.2.3 23-Mar-2008  matt sync with HEAD
 1.240.2.2 09-Jan-2008  matt sync with HEAD
 1.240.2.1 06-Nov-2007  matt sync with HEAD
 1.243.4.3 18-Feb-2008  mjf Sync with HEAD.
 1.243.4.2 27-Dec-2007  mjf Sync with HEAD.
 1.243.4.1 08-Dec-2007  mjf Sync with HEAD.
 1.246.6.4 19-Jan-2008  bouyer Sync with HEAD
 1.246.6.3 08-Jan-2008  bouyer Sync with HEAD
 1.246.6.2 02-Jan-2008  bouyer Sync with HEAD
 1.246.6.1 13-Dec-2007  bouyer Sync with HEAD
 1.246.4.3 13-Dec-2007  yamt sync with head.
 1.246.4.2 10-Dec-2007  yamt uvm_map_checkprot: a kludge for mem(4).
 1.246.4.1 10-Dec-2007  yamt - separate kernel va allocation (kernel_va_arena) from
in-kernel fault handling (kernel_map).
- add vmem bootstrap code. vmem doesn't rely on malloc anymore.
- make kmem_alloc interrupt-safe.
- kill kmem_map. make malloc a wrapper of kmem_alloc.
 1.246.2.6 02-Jan-2008  ad Fix merge error.
 1.246.2.5 28-Dec-2007  ad - Move remaining map locking functions into uvm_map.c. They depend on proc.h.
- Lock vm_map_kernel::vmk_merged_entries with the map's own lock. There was
a race where a thread legitimately expects to find cached entries, but can
find none because they have not been freed yet.
 1.246.2.4 27-Dec-2007  ad Be more paranoid with vm_map_lock/vm_map_unbusy.
 1.246.2.3 26-Dec-2007  ad Sync with head.
 1.246.2.2 21-Dec-2007  ad Kill vm_map::hint_lock.
 1.246.2.1 04-Dec-2007  ad Pull the vmlocking changes into a new branch.
 1.250.6.5 17-Jan-2009  mjf Sync with HEAD.
 1.250.6.4 28-Sep-2008  mjf Sync with HEAD.
 1.250.6.3 05-Jun-2008  mjf Sync with HEAD.

Also fix build.
 1.250.6.2 02-Jun-2008  mjf Sync with HEAD.
 1.250.6.1 03-Apr-2008  mjf Sync with HEAD.
 1.250.2.1 24-Mar-2008  keiichi sync with head.
 1.252.2.3 17-Jun-2008  yamt sync with head.
 1.252.2.2 04-Jun-2008  yamt sync with head
 1.252.2.1 18-May-2008  yamt sync with head.
 1.254.4.2 18-Sep-2008  wrstuden Sync with wrstuden-revivesa-base-2.
 1.254.4.1 23-Jun-2008  wrstuden Sync w/ -current. 34 merge conflicts to follow.
 1.254.2.7 09-Oct-2010  yamt sync with head
 1.254.2.6 11-Aug-2010  yamt sync with head.
 1.254.2.5 11-Mar-2010  yamt sync with head
 1.254.2.4 16-Sep-2009  yamt sync with head
 1.254.2.3 19-Aug-2009  yamt sync with head.
 1.254.2.2 20-Jun-2009  yamt sync with head
 1.254.2.1 04-May-2009  yamt sync with head.
 1.260.4.2 13-Dec-2008  haad Update haad-dm branch to haad-dm-base2.
 1.260.4.1 19-Oct-2008  haad Sync with HEAD.
 1.260.2.2 31-Jul-2008  simonb Sync with head.
 1.260.2.1 18-Jul-2008  simonb Sync with head.
 1.263.4.5 22-Aug-2012  bouyer Pull up following revision(s) (requested by chs in ticket #1790):
sys/uvm/uvm_map.c: revision 1.319
avoid leaking a uvm_object reference when merging a new map entry
with the entries on both sides. fixes PR 46807.
 1.263.4.4 21-Nov-2010  riz Pull up following revision(s) (requested by rmind in ticket #1421):
sys/uvm/uvm_bio.c: revision 1.70
sys/uvm/uvm_map.c: revision 1.292
sys/uvm/uvm_pager.c: revision 1.98
sys/uvm/uvm_fault.c: revision 1.175
sys/uvm/uvm_bio.c: revision 1.69
ubc_fault: split-off code part handling a single page into ubc_fault_page().
Keep the lock around pmap_update() where required. While fixing this
in ubc_fault(), rework logic to "remember" the last object of page and
reduce locking overhead, since in common case pages belong to one and
the same UVM object (but not always, therefore add a comment).
Unlocks before pmap_update(), on removal of mappings, might cause TLB
coherency issues, since on architectures like x86 and mips64 invalidation
IPIs are deferred to pmap_update(). Hence, VA space might be globally
visible before IPIs are sent or while they are still in-flight.
OK ad@.
 1.263.4.3 19-Apr-2009  snj branches: 1.263.4.3.4;
Pull up following revision(s) (requested by mrg in ticket #708):
sys/uvm/uvm_km.c: revision 1.102
sys/uvm/uvm_km.h: revision 1.18
sys/uvm/uvm_map.c: revision 1.264
PR port-amd64/32816 amd64 can not load lkms
Change some assertions to partially allow for VM_MAP_IS_KERNEL(map) where
map is outside the range of kernel_map.
 1.263.4.2 02-Feb-2009  snj Pull up following revision(s) (requested by ad in ticket #354):
sys/uvm/uvm_fault.c: revision 1.126
sys/uvm/uvm_map.c: revision 1.268
Move a couple of calls to pmap_update().
 1.263.4.1 27-Dec-2008  snj Pull up following revision(s) (requested by bouyer in ticket #211):
sys/uvm/uvm_km.c: revision 1.103
sys/uvm/uvm_map.c: revision 1.265
sys/uvm/uvm_page.c: revision 1.141
It's easier for kernel reserve pages to be consumed because the pagedaemon
serves as less of a barrier these days. Restrict provision of kernel reserve
pages to kmem and one of these cases:
- doing a NOWAIT allocation
- caller is a realtime thread
- caller is a kernel thread
- explicitly requested, for example by the pmap
 1.263.4.3.4.9 07-May-2012  matt Use uvm_km_pagefree to free a kmap entry page.
 1.263.4.3.4.8 29-Feb-2012  matt Improve UVM_PAGE_TRKOWN.
Add more asserts to uvm_page.
 1.263.4.3.4.7 09-Feb-2012  matt Major changes to uvm.
Support multiple collections (groups) of free pages and run the page
reclamation algorithm on each group independently.
 1.263.4.3.4.6 03-Jun-2011  matt Restore $NetBSD$
 1.263.4.3.4.5 03-Jun-2011  matt Rework page free lists to be sorted by color first rather than free_list.
Kept per color PGFL_* counter in each page free list.
Minor cleanups.
 1.263.4.3.4.4 25-May-2011  matt Make uvm_map recognize UVM_FLAG_COLORMATCH which tells uvm_map that the
'align' argument specifies the starting color of the KVA range to be returned.

When calling uvm_km_alloc with UVM_KMF_VAONLY, also specify the starting
color of the kva range returned (UVM_KMF_COLORMATCH) and pass those to
uvm_map.

In uvm_pglistalloc, make sure the pages being returned have sequentially
advancing colors (so they can be mapped in a contiguous address range).
Add a few missing UVM_FLAG_COLORMATCH flags to uvm_pagealloc calls.

Make the socket and pipe loan color-safe.

Make the mips pmap enforce strict page color (color(VA) == color(PA)).
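
The color rules in this entry can be illustrated with a small sketch. It is not taken from the branch: the ex_* names, the 4 KiB page size and the count of 8 colors are assumptions chosen for illustration, and a real port derives the color count from its cache geometry.

```c
#include <stdbool.h>
#include <stdint.h>

/* Assumed values for illustration only. */
#define EX_PAGE_SHIFT 12u                 /* 4 KiB pages */
#define EX_NCOLORS    8u                  /* number of page colors, a power of two */
#define EX_COLORMASK  (EX_NCOLORS - 1u)

/* Color of an address: the low bits of its page frame number. */
static unsigned
ex_color(uintptr_t addr)
{
	return (unsigned)((addr >> EX_PAGE_SHIFT) & EX_COLORMASK);
}

/* Strict coloring: a mapping is acceptable only if color(VA) == color(PA). */
static bool
ex_color_ok(uintptr_t va, uintptr_t pa)
{
	return ex_color(va) == ex_color(pa);
}

/*
 * Sequentially advancing colors: page i of a run must have color
 * (color of page 0 + i) modulo the number of colors, so the run can be
 * mapped at consecutive virtual pages without breaking the strict rule.
 */
static bool
ex_colors_sequential(const uintptr_t *pa, unsigned npages)
{
	for (unsigned i = 1; i < npages; i++) {
		if (ex_color(pa[i]) != ((ex_color(pa[0]) + i) & EX_COLORMASK))
			return false;
	}
	return true;
}
```

Under these assumptions, physical pages at 0x6000, 0x17000 and 0x8000 have colors 6, 7 and 0, so they form a sequential run even though they are not physically contiguous.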
 1.263.4.3.4.3 19-Aug-2010  matt Use __HAVE_CPU_VMSPACE_EXEC instead of a mips-specific #ifdef.
 1.263.4.3.4.2 18-Aug-2010  matt Add a hook so that MD code can handle the change in address space limits
when an exec happens.
Add a routine to turn on/off UX when an address space changes due to an exec
(N32 execing a N64 for instance).
 1.263.4.3.4.1 23-Aug-2009  matt PRIxVADDR, PRIdVSIZE, PRIxVSIZE, or PRIxPADDR as appropriate.
Use __intXX_t or __uintXX_t as appropriate in <mips/types.h>
 1.263.2.1 19-Jan-2009  skrll Sync with HEAD.
 1.269.2.2 23-Jul-2009  jym Sync with HEAD.
 1.269.2.1 13-May-2009  jym Sync with HEAD.

Commit is split, to avoid a "too many arguments" protocol error.
 1.286.2.3 22-Oct-2010  uebayasi Sync with HEAD (-D20101022).
 1.286.2.2 17-Aug-2010  uebayasi Sync with HEAD.
 1.286.2.1 30-Apr-2010  uebayasi Sync with HEAD.
 1.290.2.7 31-May-2011  rmind sync with head
 1.290.2.6 21-Apr-2011  rmind sync with head
 1.290.2.5 05-Mar-2011  rmind sync with head
 1.290.2.4 03-Jul-2010  rmind sync with head
 1.290.2.3 30-May-2010  rmind sync with head
 1.290.2.2 17-Mar-2010  rmind Reorganise UVM locking to protect P->V state and serialise pmap(9)
operations on the same page(s) by always locking their owner. Hence
lock order: "vmpage"-lock -> pmap-lock.

Patch, proposed on tech-kern@, from Andrew Doran.
 1.290.2.1 16-Mar-2010  rmind Change struct uvm_object::vmobjlock to be dynamically allocated with
mutex_obj_alloc(). It allows us to share the locks among UVM objects.
 1.294.4.1 08-Feb-2011  bouyer Sync with HEAD
 1.294.2.1 06-Jun-2011  jruoho Sync with HEAD.
 1.297.2.1 23-Jun-2011  cherry Catchup with rmind-uvmplock merge.
 1.305.2.5 22-May-2014  yamt sync with head.

for a reference, the tree before this commit was tagged
as yamt-pagecache-tag8.

this commit was split into small chunks to avoid
a limitation of cvs. ("Protocol error: too many arguments")
 1.305.2.4 16-Jan-2013  yamt sync with (a bit old) head
 1.305.2.3 30-Oct-2012  yamt sync with head
 1.305.2.2 17-Apr-2012  yamt sync with head
 1.305.2.1 26-Dec-2011  yamt - use O->A loan to serve read(2). based on a patch from Chuck Silvers
- associated O->A loan fixes.
 1.306.2.4 29-Apr-2012  mrg sync to latest -current.
 1.306.2.3 05-Apr-2012  mrg sync to latest -current.
 1.306.2.2 24-Feb-2012  mrg sync to -current.
 1.306.2.1 18-Feb-2012  mrg merge to -current.
 1.313.2.4 07-Sep-2012  riz Pull up following revision(s) (requested by para in ticket #547):
sys/uvm/uvm_map.c: revision 1.320
sys/uvm/uvm_map.c: revision 1.321
sys/uvm/uvm_map.c: revision 1.322
sys/uvm/uvm_km.c: revision 1.130
sys/uvm/uvm_km.c: revision 1.131
sys/uvm/uvm_km.c: revision 1.132
sys/uvm/uvm_km.c: revision 1.133
sys/uvm/uvm_km.c: revision 1.134
sys/uvm/uvm_km.c: revision 1.135
sys/uvm/uvm_km.c: revision 1.129
Fix a bug where the kernel was never grown to accommodate the kmem VA space
since that happens before the kernel_map is set.
Don't try to grow the entire kmem space but just do as needed in
uvm_km_kmem_alloc
Shut up gcc printf warning.
Cleanup comment. Change panic to KASSERTMSG.
Use kernel_map->misc_lock to make sure we don't call pmap_growkernel
concurrently and possibly mess up uvm_maxkaddr.
Switch to a spin lock (uvm_kentry_lock) which, fortunately, was
sitting there unused.
Remove locking since it isn't needed. As soon as the 2nd
uvm_map_entry in kernel_map is created, uvm_map_prepare will call
pmap_growkernel and the pmap_growkernel call in uvm_km_mem_alloc will
never be called again.
call pmap_growkernel once after the kmem_arena is created
to make the pmap cover its address space
assert on the growth in uvm_km_kmem_alloc
for the 3rd uvm_map_entry uvm_map_prepare will grow the kernel,
but we might call into uvm_km_kmem_alloc through imports to
the kmem_meta_arena earlier
while here guard uvm_km_va_starved_p from kmem_arena not yet created
thanks for tracking this down to everyone involved
 1.313.2.3 18-Aug-2012  riz branches: 1.313.2.3.2;
Pull up following revision(s) (requested by chs in ticket #508):
sys/uvm/uvm_map.c: revision 1.319
avoid leaking a uvm_object reference when merging a new map entry
with the entries on both sides. fixes PR 46807.
 1.313.2.2 12-Apr-2012  riz Pull up following revision(s) (requested by martin in ticket #175):
sys/kern/kern_exit.c: revision 1.238
tests/lib/libc/gen/posix_spawn/t_fileactions.c: revision 1.4
tests/lib/libc/gen/posix_spawn/t_fileactions.c: revision 1.5
sys/uvm/uvm_extern.h: revision 1.183
lib/libc/gen/posix_spawn_fileactions.c: revision 1.2
sys/kern/kern_exec.c: revision 1.348
sys/kern/kern_exec.c: revision 1.349
sys/compat/netbsd32/syscalls.master: revision 1.95
sys/uvm/uvm_glue.c: revision 1.159
sys/uvm/uvm_map.c: revision 1.317
sys/compat/netbsd32/netbsd32.h: revision 1.95
sys/kern/exec_elf.c: revision 1.38
sys/sys/spawn.h: revision 1.2
sys/sys/exec.h: revision 1.135
sys/compat/netbsd32/netbsd32_execve.c: revision 1.34
Rework posix_spawn locking and memory management:
- always provide a vmspace for the new proc, initially borrowing from proc0
(this part fixes PR 46286)
- increase parallelism between parent and child if arguments allow this,
avoiding a potential deadlock on exec_lock
- add a new flag for userland to request old (lockstepped) behaviour for
better error reporting
- adapt test cases to the previous two and add a new variant to test the
diagnostics flag
- fix a few memory (and lock) leaks
- provide netbsd32 compat
Fix asynchronous posix_spawn child exit status (and test for it).
 1.313.2.1 22-Feb-2012  riz Pull up following revision(s) (requested by bouyer in ticket #29):
sys/arch/xen/x86/x86_xpmap.c: revision 1.39
sys/arch/xen/include/hypervisor.h: revision 1.37
sys/arch/xen/include/intr.h: revision 1.34
sys/arch/xen/x86/xen_ipi.c: revision 1.10
sys/arch/x86/x86/cpu.c: revision 1.97
sys/arch/x86/include/cpu.h: revision 1.48
sys/uvm/uvm_map.c: revision 1.315
sys/arch/x86/x86/pmap.c: revision 1.165
sys/arch/xen/x86/cpu.c: revision 1.81
sys/arch/x86/x86/pmap.c: revision 1.167
sys/arch/xen/x86/cpu.c: revision 1.82
sys/arch/x86/x86/pmap.c: revision 1.168
sys/arch/xen/x86/xen_pmap.c: revision 1.17
sys/uvm/uvm_km.c: revision 1.122
sys/uvm/uvm_kmguard.c: revision 1.10
sys/arch/x86/include/pmap.h: revision 1.50
Apply patch proposed in PR port-xen/45975 (this does not solve the exact
problem reported here but is part of the solution):
xen_kpm_sync() is not working as expected,
leading to races between CPUs.
1) the check (xpq_cpu != &x86_curcpu) is always false because we
have different x86_curcpu symbols with different addresses in the kernel.
Fortunately, all addresses disassemble to the same code.
Because of this we always use the code intended for bootstrap, which doesn't
use cross-calls or lock.
2) once 1) above is fixed, xen_kpm_sync() will use xcalls to sync other CPUs,
which causes it to sleep and pmap.c doesn't like that. It triggers this
KASSERT() in pmap_unmap_ptes():
KASSERT(pmap->pm_ncsw == curlwp->l_ncsw);
3) pmap->pm_cpus is not safe for the purpose of xen_kpm_sync(), which
needs to know on which CPU a pmap is loaded *now*:
pmap->pm_cpus is cleared before cpu_load_pmap() is called to switch
to a new pmap, leaving a window where a pmap is still in a CPU's
ci_kpm_pdir but not in pm_cpus. As a virtual CPU may be preempted
by the hypervisor at any time, it can be large enough to let another
CPU free the PTP and reuse it as a normal page.
To fix 2), avoid cross-calls and IPIs completely, and instead
use a mutex to update all CPU's ci_kpm_pdir from the local CPU.
It's safe because we just need to update the table page, a tlbflush IPI will
happen later. As a side effect, we don't need a different code for bootstrap,
fixing 1). The mutex added to struct cpu needs a small headers reorganisation.
to fix 3), introduce a pm_xen_ptp_cpus which is updated from
cpu_pmap_load(), with the ci_kpm_mtx mutex held. Checking it with
ci_kpm_mtx held will avoid overwriting the wrong pmap's ci_kpm_pdir.
While there I removed the unused pmap_is_active() function;
and added some more details to DIAGNOSTIC panics.
When using uvm_km_pgremove_intrsafe() make sure mappings are removed
before returning the pages to the free pool. Otherwise, under Xen,
a page which still has a writable mapping could be allocated for
a PDP by another CPU and the hypervisor would refuse it (this is
PR port-xen/45975).
For this, move the pmap_kremove() calls inside uvm_km_pgremove_intrsafe(),
and do pmap_kremove()/uvm_pagefree() in batch of (at most) 16 entries
(as suggested by Chuck Silvers on tech-kern@, see also
http://mail-index.netbsd.org/tech-kern/2012/02/17/msg012727.html and
followups).
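
The ordering described above (remove the mappings of a batch first, then free its pages, at most 16 at a time) can be sketched as follows. This is only an illustration: ex_unmap_range() and ex_free_page() are hypothetical stand-ins that just count calls, not pmap_kremove() or uvm_pagefree(), and the real function works on addresses and page structures rather than indices.

```c
#include <stddef.h>

#define EX_BATCH 16	/* batch size named in the commit message */

/* Hypothetical counters standing in for the unmap and free steps. */
static size_t ex_unmap_calls;
static size_t ex_pages_freed;

/* Stand-in for removing the kernel mappings of n consecutive pages. */
static void
ex_unmap_range(size_t first, size_t n)
{
	(void)first;
	(void)n;
	ex_unmap_calls++;
}

/* Stand-in for returning one page to the free pool. */
static void
ex_free_page(size_t idx)
{
	(void)idx;
	ex_pages_freed++;
}

static void
ex_remove_pages(size_t npages)
{
	size_t done = 0;

	while (done < npages) {
		size_t n = npages - done;

		if (n > EX_BATCH)
			n = EX_BATCH;

		/* Remove the mappings of the whole batch first ... */
		ex_unmap_range(done, n);

		/* ... and only then hand the pages back to the free pool. */
		for (size_t i = 0; i < n; i++)
			ex_free_page(done + i);

		done += n;
	}
}
```

In this sketch, removing 40 pages takes three unmap calls (16, 16 and 8 pages), and no page is freed while it is still mapped.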
Avoid early use of xen_kpm_sync(); locks are not available at this time.
Don't call cpu_init() twice.
Makes LOCKDEBUG kernels boot again
Revert pmap_pte_flush() -> xpq_flush_queue() in previous.
 1.313.2.3.2.1 01-Nov-2012  matt sync with netbsd-6-0-RELEASE.
 1.322.2.4 03-Dec-2017  jdolecek update from HEAD
 1.322.2.3 20-Aug-2014  tls Rebase to HEAD as of a few days ago.
 1.322.2.2 20-Nov-2012  tls Resync to 2012-11-19 00:00:00 UTC
 1.322.2.1 12-Sep-2012  tls Initial snapshot of work to eliminate 64K MAXPHYS. Basically works for
physio (I/O to raw devices); needs more doing to get it going with the
filesystems, but it shouldn't damage data.

All work's been done on amd64 so far. Not hard to add support to other
ports. If others want to pitch in, one very helpful thing would be to
sort out when and how IDE disks can do 128K or larger transfers, and
adjust the various PCI IDE (or at least ahcisata) drivers and wd.c
accordingly -- it would make testing much easier. Another very helpful
thing would be to implement a smart minphys() for RAIDframe along the
lines detailed in the MAXPHYS-NOTES file.
 1.324.2.1 18-May-2014  rmind sync with head
 1.328.2.1 10-Aug-2014  tls Rebase.
 1.330.2.1 23-Jan-2015  martin Pull up following revision(s) (requested by chs in ticket #447):
sys/uvm/uvm_map.c: revision 1.332
skip busy anon pages in uvm_map_clean().
we shouldn't be messing with pages that someone else has busy,
and uvm_map_clean() is just advisory for amap mappings.
 1.331.2.8 28-Aug-2017  skrll Sync with HEAD
 1.331.2.7 05-Dec-2016  skrll Sync with HEAD
 1.331.2.6 05-Oct-2016  skrll Sync with HEAD
 1.331.2.5 09-Jul-2016  skrll Sync with HEAD
 1.331.2.4 29-May-2016  skrll Sync with HEAD
 1.331.2.3 27-Dec-2015  skrll Sync with HEAD (as of 26th Dec)
 1.331.2.2 22-Sep-2015  skrll Sync with HEAD
 1.331.2.1 06-Apr-2015  skrll Sync with HEAD
 1.340.2.2 20-Mar-2017  pgoyette Sync with HEAD
 1.340.2.1 07-Jan-2017  pgoyette Sync with HEAD. (Note that most of these changes are simply $NetBSD$
tag issues.)
 1.342.2.1 21-Apr-2017  bouyer Sync with HEAD
 1.343.4.1 11-May-2017  pgoyette Sync with HEAD
 1.351.2.6 04-Aug-2023  martin Pull up following revision(s) (requested by riastradh in ticket #1880):

sys/uvm/uvm_map.c: revision 1.403 (patch)

mmap(2): Avoid arithmetic overflow in search for free space.

PR kern/56900
 1.351.2.5 01-Apr-2023  martin Pull up following revision(s) (requested by riastradh in ticket #1816):

sys/uvm/uvm_map.c: revision 1.396

uvm(9): Fix mmap optimization for topdown case.

PR kern/51393
 1.351.2.4 01-Apr-2023  martin Pull up following revision(s) (requested by riastradh in ticket #1815):

sys/uvm/uvm_map.c: revision 1.395

uvm(9): Fix 19-year-old bug in assertion about mmap hint.

Previously this would _first_ remember the original hint, and _then_
clamp the hint to the VM map's range:

orig_hint = hint;
if (hint < vm_map_min(map)) { /* check ranges ... */
if (flags & UVM_FLAG_FIXED) {
UVMHIST_LOG(maphist,"<- VA below map range",0,0,0,0);
return (NULL);
}
hint = vm_map_min(map);
...
KASSERTMSG(!topdown || hint <= orig_hint, "hint: %#jx, orig_hint: %#jx",
(uintmax_t)hint, (uintmax_t)orig_hint);

Even if nothing else happens in the ellipsis, taking the branch
guarantees the assertion will fail in the topdown case.
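
A minimal sketch of the ordering problem described above. The names (struct ex_hints, ex_clamp_old, ex_clamp_new) are hypothetical and this is not the code from uvm_map.c. It assumes the assertion can be made to hold by clamping the hint before remembering it, which is one plausible reading of the fix rather than a quote of revision 1.395.

```c
#include <stdint.h>

typedef uint64_t ex_vaddr_t;

struct ex_hints {
	ex_vaddr_t hint;	/* hint after clamping to the map's range */
	ex_vaddr_t orig_hint;	/* value later compared against hint */
};

/* Old order: remember the hint first, then clamp it upward. */
static struct ex_hints
ex_clamp_old(ex_vaddr_t hint, ex_vaddr_t map_min)
{
	struct ex_hints r;

	r.orig_hint = hint;
	if (hint < map_min)
		hint = map_min;		/* hint is now above orig_hint */
	r.hint = hint;
	return r;
}

/* Assumed fixed order: clamp first, then remember the clamped value. */
static struct ex_hints
ex_clamp_new(ex_vaddr_t hint, ex_vaddr_t map_min)
{
	struct ex_hints r;

	if (hint < map_min)
		hint = map_min;
	r.orig_hint = hint;
	r.hint = hint;
	return r;
}
```

With a hint of 0x1000 and a map minimum of 0x10000, the old order yields hint = 0x10000 and orig_hint = 0x1000, so the topdown condition hint <= orig_hint is false as soon as the branch is taken. With the assumed new order both values are 0x10000 and the condition holds.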
 1.351.2.3 04-Aug-2019  martin Pull up following revision(s) (requested by maxv in ticket #1320):

sys/uvm/uvm_map.c: revision 1.361

Fix info leak: 'map_attrib' is not used in UVM, and contains uninitialized
heap garbage. Return zero. Maybe we should remove the field completely.
 1.351.2.2 02-Nov-2017  snj Pull up following revision(s) (requested by pgoyette in ticket #335):
share/man/man9/kernhist.9: 1.5-1.8
sys/arch/acorn26/acorn26/pmap.c: 1.39
sys/arch/arm/arm32/fault.c: 1.105 via patch
sys/arch/arm/arm32/pmap.c: 1.350, 1.359
sys/arch/arm/broadcom/bcm2835_bsc.c: 1.7
sys/arch/arm/omap/if_cpsw.c: 1.20
sys/arch/arm/omap/tiotg.c: 1.7
sys/arch/evbarm/conf/RPI2_INSTALL: 1.3
sys/dev/ic/sl811hs.c: 1.98
sys/dev/usb/ehci.c: 1.256
sys/dev/usb/if_axe.c: 1.83
sys/dev/usb/motg.c: 1.18
sys/dev/usb/ohci.c: 1.274
sys/dev/usb/ucom.c: 1.119
sys/dev/usb/uhci.c: 1.277
sys/dev/usb/uhub.c: 1.137
sys/dev/usb/umass.c: 1.160-1.162
sys/dev/usb/umass_quirks.c: 1.100
sys/dev/usb/umass_scsipi.c: 1.55
sys/dev/usb/usb.c: 1.168
sys/dev/usb/usb_mem.c: 1.70
sys/dev/usb/usb_subr.c: 1.221
sys/dev/usb/usbdi.c: 1.175
sys/dev/usb/usbdi_util.c: 1.67-1.70
sys/dev/usb/usbroothub.c: 1.3
sys/dev/usb/xhci.c: 1.75
sys/external/bsd/drm2/dist/drm/i915/i915_gem.c: 1.34
sys/kern/kern_history.c: 1.15
sys/kern/kern_xxx.c: 1.74
sys/kern/vfs_bio.c: 1.275-1.276
sys/miscfs/genfs/genfs_io.c: 1.71
sys/sys/kernhist.h: 1.21
sys/ufs/ffs/ffs_balloc.c: 1.63
sys/ufs/lfs/lfs_vfsops.c: 1.361
sys/ufs/lfs/ulfs_inode.c: 1.21
sys/ufs/lfs/ulfs_vnops.c: 1.52
sys/ufs/ufs/ufs_inode.c: 1.102
sys/ufs/ufs/ufs_vnops.c: 1.239
sys/uvm/pmap/pmap.c: 1.37-1.39
sys/uvm/pmap/pmap_tlb.c: 1.22
sys/uvm/uvm_amap.c: 1.108
sys/uvm/uvm_anon.c: 1.64
sys/uvm/uvm_aobj.c: 1.126
sys/uvm/uvm_bio.c: 1.91
sys/uvm/uvm_device.c: 1.66
sys/uvm/uvm_fault.c: 1.201
sys/uvm/uvm_km.c: 1.144
sys/uvm/uvm_loan.c: 1.85
sys/uvm/uvm_map.c: 1.353
sys/uvm/uvm_page.c: 1.194
sys/uvm/uvm_pager.c: 1.111
sys/uvm/uvm_pdaemon.c: 1.109
sys/uvm/uvm_swap.c: 1.175
sys/uvm/uvm_vnode.c: 1.103
usr.bin/vmstat/vmstat.c: 1.219
Reorder to test for null before null deref in debug code
--
Reorder to test for null before null deref in debug code
--
KNF
--
No need for '\n' in UVMHIST_LOG
--
normalise a BIOHIST log message
--
Update the kernhist(9) kernel history code to address issues identified
in PR kern/52639, as well as some general cleaning-up...
(As proposed on tech-kern@ with additional changes and enhancements.)
Details of changes:
* All history arguments are now stored as uintmax_t values[1], both in
the kernel and in the structures used for exporting the history data
to userland via sysctl(9). This avoids problems on some architectures
where passing a 64-bit (or larger) value to printf(3) can cause it to
process the value as multiple arguments. (This can be particularly
problematic when printf()'s format string is not a literal, since in
that case the compiler cannot know how large each argument should be.)
* Update the data structures used for exporting kernel history data to
include a version number as well as the length of history arguments.
* All [2] existing users of kernhist(9) have had their format strings
updated. Each format specifier now includes an explicit length
modifier 'j' to refer to numeric values of the size of uintmax_t.
* All [2] existing users of kernhist(9) have had their format strings
updated to replace uses of "%p" with "%#jx", and the pointer
arguments are now cast to (uintptr_t) before being subsequently cast
to (uintmax_t). This is needed to avoid compiler warnings about
casting "pointer to integer of a different size."
* All [2] existing users of kernhist(9) have had instances of "%s" or
"%c" format strings replaced with numeric formats; several instances
of mis-match between format string and argument list have been fixed.
* vmstat(1) has been modified to handle the new size of arguments in the
history data as exported by sysctl(9).
* vmstat(1) now provides a warning message if the history requested with
the -u option does not exist (previously, this condition was silently
ignored, with only a single blank line being printed).
* vmstat(1) now checks the version and argument length included in the
data exported via sysctl(9) and exits if they do not match the values
with which vmstat was built.
* The kernhist(9) man-page has been updated to note the additional
requirements imposed on the format strings, along with several other
minor changes and enhancements.
[1] It would have been possible to use an explicit length (for example,
uint64_t) for the history arguments. But that would require another
"rototill" of all the users in the future when we add support for an
architecture that supports a larger size. Also, the printf(3) format
specifiers for explicitly-sized values, such as "%"PRIu64, are much
more verbose (and less aesthetically appealing, IMHO) than simply
using "%ju".
[2] I've tried very hard to find "all [the] existing users of
kernhist(9)" but it is possible that I've missed some of them. I would
be glad to update any stragglers that anyone identifies.
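
The format-string rules listed above can be illustrated in plain userland C. This sketch uses snprintf(3) rather than the kernhist(9) macros, and the ex_* helper names are hypothetical. It only shows the 'j' length modifier and the double cast of a pointer through uintptr_t to uintmax_t.

```c
#include <stdint.h>
#include <stdio.h>

/*
 * Format a pointer the way the updated history calls do: "%#jx"
 * instead of "%p", with the pointer cast to uintptr_t first (an
 * integer of pointer size) and then widened to uintmax_t.
 */
static int
ex_format_ptr(char *buf, size_t len, const void *p)
{
	return snprintf(buf, len, "%#jx", (uintmax_t)(uintptr_t)p);
}

/* Format an ordinary count: widen to uintmax_t and use "%ju". */
static int
ex_format_count(char *buf, size_t len, unsigned long n)
{
	return snprintf(buf, len, "%ju", (uintmax_t)n);
}
```

Casting straight from a pointer to uintmax_t is what triggers the "pointer to integer of a different size" warning on platforms where the two differ in width; the intermediate uintptr_t cast avoids it.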
--
For some reason this single kernel seems to have outgrown its declared
size as a result of the kernhist(9) changes. Bump the size.
XXX The amount of increase may be excessive - anyone with more detailed
XXX knowledge please feel free to further adjust the value appropriately.
--
Missed one cast of pointer --> uintptr_t in previous kernhist(9) commit
--
And yet another one. :(
--
Use correct mark-up for NetBSD version.
--
More improvements in grammar and readability.
--
Remove a stray '"' (obvious typo) and add a couple of casts that are
probably needed.
--
And replace an instance of "%p" conversion with "%#jx"
--
Whitespace fix. Give Bl tag table a width. Fix Xr.
 1.351.2.1 01-Oct-2017  martin Pull up following revision(s) (requested by pgoyette in ticket #294):
sys/uvm/uvm_map.c: revision 1.352
Fix user-triggerable kernel crash as reported in PR kern/52573 (from
Bruno Haible).
 1.354.4.3 21-Apr-2020  martin Sync with HEAD
 1.354.4.2 13-Apr-2020  martin Mostly merge changes from HEAD up to 20200411
 1.354.4.1 10-Jun-2019  christos Sync with HEAD
 1.354.2.3 26-Dec-2018  pgoyette Sync with HEAD, resolve a few conflicts
 1.354.2.2 30-Sep-2018  pgoyette Sync with HEAD
 1.354.2.1 06-Sep-2018  pgoyette Sync with HEAD

Resolve a couple of conflicts (result of the uimin/uimax changes)
 1.362.2.7 09-May-2025  martin Pull up following revision(s) (requested by riastradh in ticket #1947):

sys/uvm/uvm_extern.h: revision 1.234 (via patch)
sys/kern/kern_exec.c: revision 1.528 (via patch)
sys/uvm/uvm_map.c: revision 1.427 (via patch)

posix_spawn(2): Allocate a new vmspace at process creation time.

This allocates a new vmspace for the process at the time the new
process is created, rather than sharing some other vmspace temporarily.

This eliminates any risk of anything bad happening due to temporary
sharing, since there isn't any sharing.

Resolves a race where:
1. we set up the child to share proc0.p_vmspace at first,
2. another process tries to read the new child's psstrings via
kern.proc_args.<childpid>.argv or similar with the child's
p_reflock held and gets stuck in a uvm fault loop because
proc0.p_vmspace doesn't have the child's psstrings address
(inherited from the parent) mapped,
3. the child is waiting for p_reflock before it can replace its
p_vmspace or psstrings.

By allocating the vmspace up front, with no mappings in it, we avoid
exposing the child in this scenario. Minor possible downside is that
sysctl kern.proc_args.<childpid>.argv might spuriously fail with
EFAULT during this time (rather than fail with EBUSY as it does if
p_reflock is held concurrently) but that's not a particularly big
deal.

Patch and first paragraph of commit message written by chs@; minor
tweaks to comments -- and any mistakes in the analysis -- by me.

PR kern/59037: deadlock in posix_spawn
PR kern/59175: posix_spawn hang, hanging other process too
 1.362.2.6 24-Aug-2024  martin Pull up following revision(s) (requested by riastradh in ticket #804):

sys/uvm/uvm_map.c: revision 1.423 (patch)
sys/uvm/uvm_map.c: revision 1.425 (patch)

uvm_map(9): Make sure search in the nearest gap is monotonic.

The algorithm, on a hint clamped to the VM bounds, works as follows
(assuming topdown VM):

1. Make sure the hint is aligned, by subtracting the remainder in
uvm_map_align_va.
2. If the hint is equal to the VM max, try the first free gap.
3. If the hint is not equal to the VM max, but is already in use, try
the next gap _below_ the entry covering hint.
4. If the hint is not equal to the VM max and is not already in use,
try gap between the entry below hint and the next entry after it,
above hint.

In the last case, `entry' is the one below hint, and `entry->next' is
the one above it. We would take
entry->next->start - length
as the next candidate hint.

However, this algorithm is supposed to be a monotonic search through
the address space, and we might wind up with something like:

[0x7defb000,0x7defc000) entry above hint (entry->next)
0x77895000 hint
[0x77894000,0x77895000) entry below hint (entry)

In this case, if length=0x1000, we would take
0x7defb000 - 0x1000 = 0x7defa000
as the next candidate hint, but this violates monotonicity of the
search.

Instead, take the _smallest_ of orig_hint or entry->next->start -
length, to avoid violating monotonicity, so hint <= orig_hint.
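
The clamp can be checked against the numbers in this message. The sketch below is not the uvm_map.c code: ex_next_hint_unclamped() and ex_next_hint_clamped() are hypothetical helpers that only reproduce the arithmetic for the topdown case.

```c
#include <stdint.h>

typedef uint64_t ex_vaddr_t;

/* Candidate hint before the change: entry->next->start - length. */
static ex_vaddr_t
ex_next_hint_unclamped(ex_vaddr_t next_start, ex_vaddr_t length)
{
	return next_start - length;
}

/*
 * Candidate hint after the change: the smaller of orig_hint and
 * entry->next->start - length, so a topdown search never moves
 * above the hint it started from.
 */
static ex_vaddr_t
ex_next_hint_clamped(ex_vaddr_t orig_hint, ex_vaddr_t next_start,
    ex_vaddr_t length)
{
	ex_vaddr_t cand = next_start - length;

	return cand < orig_hint ? cand : orig_hint;
}
```

With orig_hint = 0x77895000, entry->next->start = 0x7defb000 and length = 0x1000, the unclamped candidate is 0x7defa000, which lies above the original hint. The clamped candidate is 0x77895000, so hint <= orig_hint holds.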

I didn't commit this change before because it didn't seem to fix all
the manifestations of the problem, but we have more diagnostics now
so maybe we will find there is a _different_ violation of the same
invariants once this is committed -- and I'm pretty sure this change
is necessary to guarantee monotonicity in some cases (but I'm still
not sure why we're only hitting the problem on sh3).

PR kern/51254: uvm assertion "!topdown || hint <= orig_hint" failed
uvm_map(9): Apply the same orig_hint clamp again to the same entry.

Previous change dealt with case like length=0x1000 and:

[0x7defb000,0x7defc000) entry above orig_hint (entry->next)
0x77895000 orig_hint
[0x77894000,0x77895000) entry below orig_hint (entry)

by changing
entry->next->start - length
to
MIN(orig_hint, entry->next->start - length)

in order to enforce monotonicity of search.

In this case, if the tree search for a gap has failed, we retry with
a list search with exactly the same orig_hint and entry -- nothing
has changed them (only hint and tmp). So apply the same clamping.

PR kern/51254: uvm assertion "!topdown || hint <= orig_hint" failed
 1.362.2.5 01-Apr-2023  martin Pull up following revision(s) (requested by riastradh in ticket #1625):

sys/uvm/uvm_map.c: revision 1.403

mmap(2): Avoid arithmetic overflow in search for free space.

PR kern/56900
 1.362.2.4 01-Apr-2023  martin Pull up following revision(s) (requested by riastradh in ticket #1623):

sys/uvm/uvm_map.c: revision 1.396

uvm(9): Fix mmap optimization for topdown case.

PR kern/51393
 1.362.2.3 01-Apr-2023  martin Pull up following revision(s) (requested by riastradh in ticket #1622):

sys/uvm/uvm_map.c: revision 1.395

uvm(9): Fix 19-year-old bug in assertion about mmap hint.

Previously this would _first_ remember the original hint, and _then_
clamp the hint to the VM map's range:

orig_hint = hint;
if (hint < vm_map_min(map)) { /* check ranges ... */
if (flags & UVM_FLAG_FIXED) {
UVMHIST_LOG(maphist,"<- VA below map range",0,0,0,0);
return (NULL);
}
hint = vm_map_min(map);
...
KASSERTMSG(!topdown || hint <= orig_hint, "hint: %#jx, orig_hint: %#jx",
(uintmax_t)hint, (uintmax_t)orig_hint);

Even if nothing else happens in the ellipsis, taking the branch
guarantees the assertion will fail in the topdown case.
 1.362.2.2 01-Nov-2019  martin Additionally pull up the following revision for ticket #388:

sys/uvm/uvm_map.c 1.366

Fix previous; semantics of align argument of uvm_map() is different
when UVM_FLAG_COLORMATCH is specified.

Should fix PR kern/54669.
 1.362.2.1 01-Nov-2019  martin Pull up following revision(s) (requested by rin in ticket #388):

sys/uvm/uvm_map.c: revision 1.365

PR kern/54395

- Align hint for virtual address at the beginning of uvm_map() if
required. Otherwise, it will be rounded up/down in an unexpected
way by uvm_map_space_avail(), which results in assertion failure.
Fix kernel panic when executing earm binary (8KB pages) on aarch64
(4KB pages), which relies on mmap(2) with MAP_ALIGNED flag.
- Use inline functions/macros consistently.
- Add some more KASSERT's.
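
Aligning the hint up front can be sketched as below. ex_align_va() is a hypothetical helper, not the kernel's uvm_map_align_va(). It assumes that a topdown map rounds the hint down by subtracting the remainder and that a bottom-up map rounds it up, and it ignores overflow at the top of the address space.

```c
#include <stdbool.h>
#include <stdint.h>

typedef uint64_t ex_vaddr_t;

/*
 * Align va to a multiple of align in the direction of the search:
 * down for a topdown map, up otherwise. An align of 0 means "no
 * alignment requested" in this sketch.
 */
static ex_vaddr_t
ex_align_va(ex_vaddr_t va, ex_vaddr_t align, bool topdown)
{
	ex_vaddr_t rem;

	if (align == 0)
		return va;

	rem = va % align;
	if (rem == 0)
		return va;		/* already aligned */

	return topdown ? va - rem : va + (align - rem);
}
```

For example, with 8KB alignment (0x2000) a hint of 0x5000 becomes 0x4000 in a topdown map and 0x6000 in a bottom-up one, so later rounding steps no longer move it.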

For more details, see the PR as well as discussion on tech-kern:
http://mail-index.netbsd.org/tech-kern/2019/10/27/msg025629.html
 1.370.2.2 29-Feb-2020  ad Sync with head.
 1.370.2.1 17-Jan-2020  ad Sync with head.
 1.377.2.1 20-Apr-2020  bouyer Sync with HEAD
 1.385.2.1 03-Apr-2021  thorpej Sync with HEAD.
 1.386.2.1 17-Apr-2021  thorpej Sync with HEAD.
 1.388.2.1 01-Aug-2021  thorpej Sync with HEAD.
 1.403.2.4 09-May-2025  martin Pull up following revision(s) (requested by riastradh in ticket #1109):

sys/uvm/uvm_extern.h: revision 1.234
sys/kern/kern_exec.c: revision 1.528
sys/uvm/uvm_map.c: revision 1.427

posix_spawn(2): Allocate a new vmspace at process creation time.

This allocates a new vmspace for the process at the time the new
process is created, rather than sharing some other vmspace temporarily.

This eliminates any risk of anything bad happening due to temporary
sharing, since there isn't any sharing.

Resolves a race where:
1. we set up the child to share proc0.p_vmspace at first,
2. another process tries to read the new child's psstrings via
kern.proc_args.<childpid>.argv or similar with the child's
p_reflock held and gets stuck in a uvm fault loop because
proc0.p_vmspace doesn't have the child's psstrings address
(inherited from the parent) mapped,
3. the child is waiting for p_reflock before it can replace its
p_vmspace or psstrings.

By allocating the vmspace up front, with no mappings in it, we avoid
exposing the child in this scenario. Minor possible downside is that
sysctl kern.proc_args.<childpid>.argv might spuriously fail with
EFAULT during this time (rather than fail with EBUSY as it does if
p_reflock is held concurrently) but that's not a particularly big
deal.

Patch and first paragraph of commit message written by chs@; minor
tweaks to comments -- and any mistakes in the analysis -- by me.

PR kern/59037: deadlock in posix_spawn
PR kern/59175: posix_spawn hang, hanging other process too
 1.403.2.3 24-Aug-2024  martin Pull up following revision(s) (requested by riastradh in ticket #804):

sys/uvm/uvm_map.c: revision 1.423
sys/uvm/uvm_map.c: revision 1.425

uvm_map(9): Make sure search in the nearest gap is monotonic.

The algorithm, on a hint clamped to the VM bounds, works as follows
(assuming topdown VM):

1. Make sure the hint is aligned, by subtracting the remainder in
uvm_map_align_va.
2. If the hint is equal to the VM max, try the first free gap.
3. If the hint is not equal to the VM max, but is already in use, try
the next gap _below_ the entry covering hint.
4. If the hint is not equal to the VM max and is not already in use,
try gap between the entry below hint and the next entry after it,
above hint.

In the last case, `entry' is the one below hint, and `entry->next' is
the one above it. We would take
entry->next->start - length
as the next candidate hint.

However, this algorithm is supposed to be a monotonic search through
the address space, and we might wind up with something like:

[0x7defb000,0x7defc000) entry above hint (entry->next)
0x77895000 hint
[0x77894000,0x77895000) entry below hint (entry)

In this case, if length=0x1000, we would take
0x7defb000 - 0x1000 = 0x7defa000
as the next candidate hint, but this violates monotonicity of the
search.

Instead, take the _smallest_ of orig_hint or entry->next->start -
length, to avoid violating monotonicity, so hint <= orig_hint.
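
As a minimal sketch of the clamp described above (not the code in
uvm_map.c; the helper name next_hint_topdown and the use of uintptr_t in
place of the kernel's address types are assumptions made for
illustration), the example addresses work out as follows:

```c
#include <assert.h>
#include <stdint.h>

#define DEMO_MIN(a, b)	((a) < (b) ? (a) : (b))

/*
 * Hypothetical helper: next candidate hint for a topdown search when
 * the hint lies in the free gap below the entry starting at next_start
 * (entry->next->start in the commit message).
 */
static uintptr_t
next_hint_topdown(uintptr_t orig_hint, uintptr_t next_start, uintptr_t length)
{
	/* Without DEMO_MIN this could move the hint above orig_hint. */
	uintptr_t hint = DEMO_MIN(orig_hint, next_start - length);

	/* Topdown half of the invariant from PR kern/51254. */
	assert(hint <= orig_hint);
	return hint;
}
```

With orig_hint=0x77895000, next_start=0x7defb000 and length=0x1000, the
unclamped candidate would be 0x7defa000; the clamped result stays at
0x77895000.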

I didn't commit this change before because it didn't seem to fix all
the manifestations of the problem, but we have more diagnostics now
so maybe we will find there is a _different_ violation of the same
invariants once this is committed -- and I'm pretty sure this change
is necessary to guarantee monotonicity in some cases (but I'm still
not sure why we're only hitting the problem on sh3).

PR kern/51254: uvm assertion "!topdown || hint <= orig_hint" failed
uvm_map(9): Apply the same orig_hint clamp again to the same entry.

Previous change dealt with case like length=0x1000 and:

[0x7defb000,0x7defc000) entry above orig_hint (entry->next)
0x77895000 orig_hint
[0x77894000,0x77895000) entry below orig_hint (entry)

by changing
entry->next->start - length
to
MIN(orig_hint, entry->next->start - length)

in order to enforce monotonicity of search.

In this case, if the tree search for a gap has failed, we retry with
a list search with exactly the same orig_hint and entry -- nothing
has changed them (only hint and tmp). So apply the same clamping.

PR kern/51254: uvm assertion "!topdown || hint <= orig_hint" failed
 1.403.2.2 22-Aug-2024  martin Pull up following revision(s) (requested by rin in ticket #783):

sys/uvm/uvm_map.c: revision 1.407
sys/uvm/uvm_map.c: revision 1.412
sys/uvm/uvm_map.c: revision 1.413

uvm_findspace(): For sh3, convert a KASSERTMSG(9) into printf(9)
XXX

Work around for PR kern/51254 until it gets fixed.

With this change, landisk survives full ATF with DIAGNOSTIC enabled.
uvm_map.c: Fix kassertmsg/printf newline mismatch in PR 51254 note.
uvm_findspace_invariants: don't repeat the message three times

The topdown and bottomup messages were exactly the same and sh3 printf
hack added the third copy. Restructure the code so that there's only
one message and make the message more obvious - the topdown condition
in the assertions was confusing b/c it's inverted (!topdown || ...
means it's the topdown map).
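
To illustrate why the "!topdown || ..." form reads as inverted, here is
a small sketch, not the kernel's uvm_findspace_invariants. The function
names are hypothetical, and the bottom-up condition (hint >= orig_hint)
is an assumption made by symmetry with the topdown one:

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical predicate with the same shape as the assertions. */
static bool
hint_is_monotonic(bool topdown, uintptr_t hint, uintptr_t orig_hint)
{
	/* "!topdown || X" means "if topdown, then X must hold". */
	bool topdown_ok = !topdown || hint <= orig_hint;
	/* Assumed mirror image for bottom-up maps. */
	bool bottomup_ok = topdown || hint >= orig_hint;

	return topdown_ok && bottomup_ok;
}

/* Hypothetical wrapper: one check, one message site. */
static void
check_hint(bool topdown, uintptr_t hint, uintptr_t orig_hint)
{
	assert(hint_is_monotonic(topdown, hint, orig_hint));
}
```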

PR 51254
 1.403.2.1 15-May-2023  martin Pull up following revision(s) (requested by chs in ticket #167):

sys/uvm/uvm_map.c: revision 1.406

uvm: avoid a deadlock in uvm_map_clean()

The locking order between map locks and page "busy" locks
is that the page "busy" lock comes first, but uvm_map_clean()
breaks this rule by holding a map locked (as reader) while
waiting for page "busy" locks.

If another thread is in the page-fault path holding a page
"busy" lock while waiting for the map lock (as a reader)
and at the same time a third thread is blocked waiting for
the map lock as a writer (which blocks the page-fault thread),
then these three threads will all deadlock with each other.

Fix this by marking the map "busy" (to block any modifications)
and unlocking the map lock before possibly waiting for any
page "busy" locks.

Martin Pieuchot reported that the same problem existed in OpenBSD;
he applied this fix there after several people tested it.

fixes PR 56952
 1.411.2.1 02-Aug-2025  perseant Sync with HEAD
 1.80 26-May-2020  kamil Catch up with the usage of struct vmspace::vm_refcnt

Use the dedicated reference counting routines.

Change the type of struct vmspace::vm_refcnt and struct vm_map::ref_count
to volatile.

Remove the unnecessary vm->vm_map.misc_lock locking in process_domem().

Reviewed by <ad>
 1.79 14-Mar-2020  ad - uvmspace_exec(), uvmspace_free(): if pmap_remove_all() returns true the
pmap is emptied. Pass UVM_FLAG_VAONLY when clearing out the map and avoid
needless extra work to tear down each mapping individually.

- uvm_map_lookup_entry(): remove the code to do a linear scan of map entries
for small maps, in preference to using the RB tree. It's questionable,
and I think the code is almost never triggered because the average number
of map entries has probably exceeded the hard-coded threshold for quite
some time.

- vm_map_entry: get it aligned on a cacheline boundary, and cluster fields
used during rbtree lookup at the beginning.
 1.78 23-Feb-2020  ad UVM locking changes, proposed on tech-kern:

- Change the lock on uvm_object, vm_amap and vm_anon to be a RW lock.
- Break v_interlock and vmobjlock apart. v_interlock remains a mutex.
- Do partial PV list locking in the x86 pmap. Others to follow later.
 1.77 12-Jan-2020  ad - uvm_unmap_remove(): need to call pmap_update() with the object still
locked, otherwise the page could gain a new identity and still be visible
via a stale mapping.

- Adjust reference counts with atomics.
 1.76 05-Jan-2020  para branches: 1.76.2;
remove unused predicate function

likely unused since kmem changes
 1.75 01-Aug-2019  riastradh Remove last trace of never-used map_attrib.
 1.74 18-May-2017  christos branches: 1.74.10;
more snprintb bits
 1.73 25-May-2016  christos branches: 1.73.8;
Introduce security.pax.mprotect.ptrace sysctl which can be used to bypass
mprotect settings so that debuggers can write to the text segment of traced
processes so that they can insert breakpoints. Turned off by default.
Ok: chuq (for now)
 1.72 29-Oct-2012  para branches: 1.72.14;
get rid of not used uvm_map flag (UVM_MAP_KMAPENT)
 1.71 19-Feb-2012  rmind branches: 1.71.2;
Remove VM_MAP_INTRSAFE and related code. Not used since the "kmem changes".
 1.70 27-Jan-2012  para extending vmem(9) to be able to allocate resources for its own needs.
simplifying uvm_map handling (no special kernel entries anymore no relocking)
make malloc(9) a thin wrapper around kmem(9)
(with private interface for interrupt safety reasons)

releng@ acknowledged
 1.69 21-Jan-2012  chs fix UVM_MAP_CLIP_* to only clip if the clip address is within the entry
(which would only not be true if the clip address is at one of the boundaries
of the entry). fixes PR 44788.
 1.68 20-Dec-2011  reinoud Ooops forgot the uvm_map.h
 1.67 12-Jun-2011  rmind branches: 1.67.2; 1.67.6;
Welcome to 5.99.53! Merge rmind-uvmplock branch:

- Reorganize locking in UVM and provide extra serialisation for pmap(9).
New lock order: [vmpage-owner-lock] -> pmap-lock.

- Simplify locking in some pmap(9) modules by removing P->V locking.

- Use lock object on vmobjlock (and thus vnode_t::v_interlock) to share
the locks amongst UVM objects where necessary (tmpfs, layerfs, unionfs).

- Rewrite and optimise x86 TLB shootdown code, make it simpler and cleaner.
Add TLBSTATS option for x86 to collect statistics about TLB shootdowns.

- Unify /dev/mem et al in MI code and provide required locking (removes
kernel-lock on some ports). Also, avoid cache-aliasing issues.

Thanks to Andrew Doran and Joerg Sonnenberger, as their initial patches
formed the core changes of this branch.
 1.66 02-Feb-2011  chuck branches: 1.66.2;
update license clauses on my code to match the new-style BSD licenses.
based on second diff that rmind@ sent me.

no functional change with this commit.
 1.65 25-Sep-2010  matt branches: 1.65.2; 1.65.4;
Rename rb.h to rbtree.h, as it is more appropriate (c.f. ptree.h). Also
helps find code that hasn't been updated to use the new rbtree API.
 1.64 01-Aug-2009  yamt branches: 1.64.2; 1.64.4;
- uvm_map_extract: update map->size correctly for !UVM_EXTRACT_CONTIG.
- uvm_map_extract: panic on zero-sized entries.
- make uvm_map_replace static.
 1.63 10-Jun-2009  yamt on MADV_WILLNEED, start prefetching backing object's pages.
 1.62 29-Jul-2008  matt branches: 1.62.8;
Make uvm_map.? use <sys/rb.h> instead of <sys/tree.h>. Change the
ambiguous members ownspace/space to gap/maxgap. Add some evcnt for
evaluation of lookups using tree/list. Drop threshold of using
tree for lookups from > 30 to > 15.

Bump kernel version to 4.99.71
 1.61 26-Apr-2008  yamt branches: 1.61.2; 1.61.4; 1.61.6; 1.61.8;
fix a locking botch. PR/38415 from Wolfgang Solfrank.
 1.60 08-Jan-2008  yamt branches: 1.60.6; 1.60.8;
simplify locking and remove vm_map_upgrade/downgrade.
this fixes a deadlock due to read-lock recursion of map->lock.
 1.59 02-Jan-2008  ad Merge vmlocking2 to head.
 1.58 22-Jul-2007  he branches: 1.58.6; 1.58.12; 1.58.14; 1.58.16; 1.58.18; 1.58.22;
When _KERNEL is defined, we have now grown a dependency on
<sys/proc.h>, since one of the inline functions now refers to curlwp.
Fix this by including <sys/proc.h> when _KERNEL is defined.
 1.57 21-Jul-2007  ad Merge unobtrusive locking changes from the vmlocking branch.
 1.56 22-Feb-2007  thorpej branches: 1.56.4; 1.56.12;
TRUE -> true, FALSE -> false
 1.55 21-Feb-2007  thorpej Replace the Mach-derived boolean_t type with the C99 bool type. A
future commit will replace use of TRUE and FALSE with true and false.
 1.54 25-May-2006  yamt branches: 1.54.12;
move wait points for kva from upper layers to vm_map. PR/33185 #1.

XXX there is a concern about interaction with kva fragmentation.
see: http://mail-index.NetBSD.org/tech-kern/2006/05/11/0000.html
 1.53 03-May-2006  yamt branches: 1.53.2;
uvm_km_suballoc: consider kva overhead of "kmapent".
fixes PR/31275 (me) and PR/32287 (Christian Biere).
 1.52 16-Feb-2006  perry branches: 1.52.2; 1.52.4; 1.52.6;
Change "inline" back to "__inline" in .h files -- C99 is still too
new, and some apps compile things in C89 mode. C89 keywords stay.

As per core@.
 1.51 11-Feb-2006  yamt remove the following options. no objections on tech-kern@.

UVM_PAGER_INLINE
UVM_AMAP_INLINE
UVM_PAGE_INLINE
UVM_MAP_INLINE
 1.50 21-Jan-2006  yamt branches: 1.50.2; 1.50.4;
implement compat_linux mremap.
 1.49 24-Dec-2005  perry branches: 1.49.2;
Remove leading __ from __(const|inline|signed|volatile) -- it is obsolete.
 1.48 11-Dec-2005  christos merge ktrace-lwp.
 1.47 17-May-2005  yamt branches: 1.47.2;
(try to) merge map entries in fault handler.
 1.46 01-Apr-2005  yamt merge yamt-km branch.
- don't use managed mappings/backing objects for wired memory allocations.
save some resources like pv_entry. also fix (most of) PR/27030.
- simplify kernel memory management API.
- simplify pmap bootstrap of some ports.
- some related cleanups.
 1.45 11-Feb-2005  chs use vm_map_{min,max}() instead of dereferencing the vm_map pointer directly.
define and use vm_map_set{min,max}() for modifying these values.
remove the {min,max}_offset aliases for these vm_map fields to be more
namespace-friendly. PR 26475.
 1.44 13-Jan-2005  yamt branches: 1.44.2; 1.44.4;
in uvm_unmap_remove, always wakeup va waiters if any.
uvm_km_free_wakeup is now a synonym of uvm_km_free.
 1.43 12-Jan-2005  yamt don't reserve (uvm_mapent_reserve) entries for malloc/pool backends
because it isn't necessary or safe.
reported and tested by Denis Lagno. PR/28897.
 1.42 01-Jan-2005  yamt in the case of !PMAP_MAP_POOLPAGE, gather pool backend allocations to
large chunks for kernel_map and kmem_map to ease kva fragmentation.
 1.41 01-Jan-2005  yamt introduce vm_map_kernel, a subclass of vm_map, and
move some kernel-only members of vm_map to it.
 1.40 01-Jan-2005  yamt for in-kernel maps,
- allocate kva for vm_map_entry from the map itself and
remove the static limit, MAX_KMAPENT.
- keep merged entries for later splitting to fix allocate-to-free problem.
PR/24039.
 1.39 10-Feb-2004  matt Back out the changes in
http://mail-index.netbsd.org/source-changes/2004/01/29/0027.html
since they don't really fix the problem.

Incorporate one fix: Mark uvm_map_entry's that were created with
UVM_FLAG_NOMERGE so that they will not be used as future merge
candidates.
 1.38 29-Jan-2004  yamt - split uvm_map() into two functions for the following.
- for in-kernel maps, disable map entry merging so that
unmap operations won't block. (workaround for PR/24039)
- for in-kernel maps, allocate kva for vm_map_entry from
the map itself and eliminate MAX_KMAPENT and
uvm_map_entry_kmem_pool.
 1.37 01-Nov-2003  yamt track map entries and free spaces using red-black tree
to improve scalability of operations on the map.

originally done by Niels Provos for OpenBSD.
tweaked for NetBSD by me with some advice from enami tsugutomo.
discussed on tech-kern@ and tech-perform@.
 1.36 01-Oct-2003  enami ansi'fy.
 1.35 10-Sep-2003  enami Swap where the vm map's max and min offset are stored so that they can be
used during map traversal.
 1.34 20-Feb-2003  atatat branches: 1.34.2;
Introduce "top down" memory management for mmap()ed allocations. This
means that the dynamic linker gets mapped in at the top of available
user virtual memory (typically just below the stack), shared libraries
get mapped downwards from that point, and calls to mmap() that don't
specify a preferred address will get mapped in below those.

This means that the heap and the mmap()ed allocations will grow
towards each other, allowing one or the other to grow larger than
before. Previously, the heap was limited to MAXDSIZ by the placement
of the dynamic linker (and the process's rlimits) and the space
available to mmap was hobbled by this reservation.

This is currently only enabled via an *option* for the i386 platform
(though other platforms are expected to follow). Add "options
USE_TOPDOWN_VM" to your kernel config file, rerun config, and rebuild
your kernel to take advantage of this.

Note that the pmap_prefer() interface has not yet been modified to
play nicely with this, so those platforms require a bit more work
(most notably the sparc) before they can use this new memory
arrangement.

This change also introduces a VM_DEFAULT_ADDRESS() macro that picks
the appropriate default address based on the size of the allocation or
the size of the process's text segment accordingly. Several drivers
and the SYSV SHM address assignment were changed to use this instead
of each one picking their own "default".
 1.33 02-Nov-2002  perry /*CONTCOND*/ while (0)'ed macros
 1.32 22-Sep-2002  chs add a new flag VM_MAP_DYING, which is set before we start
tearing down a vm_map. use this to skip the pmap_update()
at the end of all the removes, which allows pmaps to optimize
pmap tear-down. also, use the new pmap_remove_all() hook to
let the pmap implementation know what we're up to.
 1.31 03-Oct-2001  christos protect against traditional macro expansion.
 1.30 09-Sep-2001  chs create a new pool for map entries, allocated from kmem_map instead of
kernel_map. use this instead of the static map entries when allocating
map entries for kernel_map. this greatly reduces the number of static
map entries used and should eliminate the problems with running out.
 1.29 26-Jun-2001  thorpej branches: 1.29.2; 1.29.4;
Reduce some complexity in the fault path -- Rather than maintaining
an spl-protected "interrupt safe map" list, simply require that callers
of uvm_fault() never call us in interrupt context (MD code must make
the assertion), and check for interrupt-safe maps in uvmfault_lookup()
before we lock the map.
 1.28 02-Jun-2001  chs replace vm_map{,_entry}_t with struct vm_map{,_entry} *.
 1.27 26-May-2001  chs replace {simple_,}lock{_data,}_t with struct {simple,}lock {,*}.
 1.26 25-May-2001  chs remove trailing whitespace.
 1.25 15-Mar-2001  chs eliminate the KERN_* error codes in favor of the traditional E* codes.
the mapping is:

KERN_SUCCESS 0
KERN_INVALID_ADDRESS EFAULT
KERN_PROTECTION_FAILURE EACCES
KERN_NO_SPACE ENOMEM
KERN_INVALID_ARGUMENT EINVAL
KERN_FAILURE various, mostly turn into KASSERTs
KERN_RESOURCE_SHORTAGE ENOMEM
KERN_NOT_RECEIVER <unused>
KERN_NO_ACCESS <unused>
KERN_PAGES_LOCKED <unused>
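
The mapping above can be sketched as a lookup function. This is an
illustration only: the enum, its DEMO_ names and values, and the
function demo_kern_to_errno are hypothetical, and the real change
replaced the codes at each call site rather than translating them at
run time:

```c
#include <errno.h>

/* Stand-ins for the old codes; the numeric values are arbitrary. */
enum demo_kern_code {
	DEMO_KERN_SUCCESS,
	DEMO_KERN_INVALID_ADDRESS,
	DEMO_KERN_PROTECTION_FAILURE,
	DEMO_KERN_NO_SPACE,
	DEMO_KERN_INVALID_ARGUMENT,
	DEMO_KERN_RESOURCE_SHORTAGE,
	DEMO_KERN_FAILURE
};

/* Returns the errno value from the table, or -1 where none is listed. */
static int
demo_kern_to_errno(enum demo_kern_code code)
{
	switch (code) {
	case DEMO_KERN_SUCCESS:			return 0;
	case DEMO_KERN_INVALID_ADDRESS:		return EFAULT;
	case DEMO_KERN_PROTECTION_FAILURE:	return EACCES;
	case DEMO_KERN_NO_SPACE:		return ENOMEM;
	case DEMO_KERN_INVALID_ARGUMENT:	return EINVAL;
	case DEMO_KERN_RESOURCE_SHORTAGE:	return ENOMEM;
	default:
		/* KERN_FAILURE had no single equivalent ("various"). */
		return -1;
	}
}
```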
 1.24 18-Feb-2001  chs branches: 1.24.2;
clean up DIAGNOSTIC checks, use KASSERT().
 1.23 13-Dec-2000  enami Use single const char array instead of over 200 string constant.
 1.22 13-Sep-2000  thorpej Add an align argument to uvm_map() and some callers of that
routine. Works similarly to pmap_prefer(), but allows callers
to specify a minimum power-of-two alignment of the region.
How we ever got along without this for so long is beyond me.
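
For readers unfamiliar with power-of-two alignment, a minimal sketch of
the usual rounding arithmetic follows. The helper align_up is
hypothetical and is not taken from uvm_map(); it only shows why the
alignment is required to be a power of two:

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical helper: round va up to the next multiple of align. */
static uintptr_t
align_up(uintptr_t va, uintptr_t align)
{
	/* The mask trick below only works for nonzero powers of two. */
	assert(align != 0 && (align & (align - 1)) == 0);

	return (va + align - 1) & ~(align - 1);
}
```

For example, align_up(0x1001, 0x1000) yields 0x2000, while an address
that is already aligned is returned unchanged.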
 1.21 16-Aug-2000  thorpej Garbage-collect a constant that nothing uses.
 1.20 12-Aug-2000  sommerfeld Use ltsleep in a loop instead of simple_unlock/tsleep/goto try_again
 1.19 26-Jun-2000  mrg restore a dropped #ifdef _KERNEL
 1.18 26-Jun-2000  mrg <vm/vm_map.h> gets merged into <uvm/uvm_map.h>
 1.17 29-Mar-2000  simonb Remove redundant decl for uvmspace_fork() - it's in <uvm/uvm_extern.h>.
 1.16 26-Mar-2000  kleink Merge parts of chs-ubc2 into the trunk:
Add a new type voff_t (defined as a synonym for off_t) to describe offsets
into uvm objects, and update the appropriate interfaces to use it, the
most visible effect being the ability to mmap() file offsets beyond
the range of a vaddr_t.

Originally by Chuck Silvers; blame me for problems caused by merging this
into non-UBC.
 1.15 21-Jun-1999  thorpej branches: 1.15.2;
Protect prototypes, certain macros, and inlines from userland.
 1.14 26-May-1999  thorpej Change the vm_map's "entries_pageable" member to a r/o flags member, which
has PAGEABLE and INTRSAFE flags. PAGEABLE now really means "pageable",
not "allocate vm_map_entry's from non-static pool", so update all map
creations to reflect that. INTRSAFE maps are maps that are used in
interrupt context (e.g. kmem_map, mb_map), and thus use the static
map entry pool (XXX as does kernel_map, for now). This will eventually
change now these maps are locked, as well.
 1.13 23-May-1999  mrg implement madvise() for MADV_{NORMAL,RANDOM,SEQUENTIAL}, others are not yet done.
 1.12 20-May-1999  thorpej Make a slight modification of pmap_growkernel() -- it now returns the
end of the mappable kernel virtual address space. Previously, it would
get called more often than necessary, because the caller only knew what
was requested.

Also, export uvm_maxkaddr so that uvm_pageboot_alloc() can grow the
kernel pmap if necessary, as well. Note that pmap_growkernel() must
now be able to handle being called before pmap_init().
 1.11 25-Mar-1999  mrg branches: 1.11.2; 1.11.4; 1.11.6;
remove now >1 year old pre-release message.
 1.10 11-Oct-1998  chuck remove unused share map code from UVM:
- replace map checks with submap checks
- get rid of unused 'mainonly' arg in uvm_unmap/uvm_unmap_remove, simplify
code. update all calls to reflect this.
- don't worry about unmapping or changing the protection of shared share
map mappings (is_main_map no longer used).
- remove unused uvm_map_sharemapcopy() function from fork code.
 1.9 31-Aug-1998  thorpej Back out previous; I should have instrumented the benefit of this one
first.
 1.8 31-Aug-1998  thorpej Use the pool allocator and the "nointr" pool page allocator for vm_map's.
 1.7 13-Aug-1998  eeh Merge paddr_t changes into the main branch.
 1.6 10-Feb-1998  mrg branches: 1.6.2;
- add defopt's for UVM, UVMHIST and PMAP_NEW.
- remove unnecessary UVMHIST_DECL's.
 1.5 10-Feb-1998  perry add/cleanup multiple inclusion protection.
 1.4 07-Feb-1998  mrg restore rcsids
 1.3 07-Feb-1998  chs prototype for uvm_map_checkprot() moved to uvm_extern.h.
 1.2 06-Feb-1998  thorpej RCS ID police.
 1.1 05-Feb-1998  mrg branches: 1.1.1;
Initial revision
 1.1.1.1 05-Feb-1998  mrg initial import of the new virtual memory system, UVM, into -current.

UVM was written by chuck cranor <chuck@maria.wustl.edu>, with some
minor portions derived from the old Mach code. i provided some help
getting swap and paging working, and other bug fixes/ideas. chuck
silvers <chuq@chuq.com> also provided some other fixes.

this is the UVM kernel code portion.


this will be KNF'd shortly. :-)
 1.6.2.1 30-Jul-1998  eeh Split vm_offset_t and vm_size_t into paddr_t, psize_t, vaddr_t, and vsize_t.
 1.11.6.1 30-Nov-1999  itojun bring in latest KAME (as of 19991130, KAME/NetBSD141) into kame branch
just for reference purposes.
This commit includes 1.4 -> 1.4.1 sync for kame branch.

The branch does not compile at all (due to the lack of ALTQ and some other
source code). Please do not try to modify the branch, this is just for
reference purposes.

synchronization to latest KAME will take place on HEAD branch soon.
 1.11.4.3 09-Aug-1999  chs create a new type "voff_t" for uvm_object offsets
and define it to be "off_t". also, remove pgo_asyncget().
 1.11.4.2 01-Jul-1999  thorpej Sync w/ -current.
 1.11.4.1 21-Jun-1999  thorpej Sync w/ -current.
 1.11.2.1 18-Jun-1999  perry pullup 1.11->1.12 (thorpej): fix the 1G RAM bug
 1.15.2.4 27-Mar-2001  bouyer Sync with HEAD.
 1.15.2.3 12-Mar-2001  bouyer Sync with HEAD.
 1.15.2.2 13-Dec-2000  bouyer Sync with HEAD (for UBC fixes).
 1.15.2.1 20-Nov-2000  bouyer Update thorpej_scsipi to -current as of a month ago
 1.24.2.9 11-Nov-2002  nathanw Catch up to -current
 1.24.2.8 18-Oct-2002  nathanw Catch up to -current.
 1.24.2.7 12-Jul-2002  nathanw No longer need to pull in lwp.h; proc.h pulls it in for us.
 1.24.2.6 08-Oct-2001  nathanw Catch up to -current.
 1.24.2.5 21-Sep-2001  nathanw Catch up to -current.
 1.24.2.4 24-Aug-2001  nathanw Catch up with -current.
 1.24.2.3 21-Jun-2001  nathanw Catch up to -current.
 1.24.2.2 09-Apr-2001  nathanw Catch up with -current.
 1.24.2.1 05-Mar-2001  nathanw Initial commit of scheduler activations and lightweight process support.
 1.29.4.2 11-Oct-2001  fvdl Catch up with -current. Fix some bogons in the sparc64 kbd/ms
attach code. cd18xx conversion provided by mrg.
 1.29.4.1 01-Oct-2001  fvdl Catch up with -current.
 1.29.2.3 10-Oct-2002  jdolecek sync kqueue with -current; this includes merge of gehenna-devsw branch,
merge of i386 MP branch, and part of autoconf rototil work
 1.29.2.2 10-Jan-2002  thorpej Sync kqueue branch with -current.
 1.29.2.1 13-Sep-2001  thorpej Update the kqueue branch to HEAD.
 1.34.2.7 10-Nov-2005  skrll Sync with HEAD. Here we go again...
 1.34.2.6 01-Apr-2005  skrll Sync with HEAD.
 1.34.2.5 15-Feb-2005  skrll Sync with HEAD.
 1.34.2.4 17-Jan-2005  skrll Sync with HEAD.
 1.34.2.3 21-Sep-2004  skrll Fix the sync with head I botched.
 1.34.2.2 18-Sep-2004  skrll Sync with HEAD.
 1.34.2.1 03-Aug-2004  skrll Sync with HEAD
 1.44.4.2 12-Feb-2005  yamt sync with head.
 1.44.4.1 25-Jan-2005  yamt - don't use uvm_object or managed mappings for wired allocations.
(eg. malloc(9))
- simplify uvm_km_* apis.
 1.44.2.1 29-Apr-2005  kent sync with -current
 1.47.2.4 21-Jan-2008  yamt sync with head
 1.47.2.3 03-Sep-2007  yamt sync with head.
 1.47.2.2 26-Feb-2007  yamt sync with head.
 1.47.2.1 21-Jun-2006  yamt sync with head.
 1.49.2.2 18-Feb-2006  yamt sync with head.
 1.49.2.1 01-Feb-2006  yamt sync with head.
 1.50.4.2 01-Jun-2006  kardel Sync with head.
 1.50.4.1 22-Apr-2006  simonb Sync with head.
 1.50.2.1 09-Sep-2006  rpaulo sync with head
 1.52.6.1 24-May-2006  tron Merge 2006-05-24 NetBSD-current into the "peter-altq" branch.
 1.52.4.1 11-May-2006  elad sync with head
 1.52.2.2 26-Jun-2006  yamt sync with head.
 1.52.2.1 24-May-2006  yamt sync with head.
 1.53.2.1 19-Jun-2006  chap Sync with head.
 1.54.12.1 27-Feb-2007  yamt - sync with head.
- move sched_changepri back to kern_synch.c as it doesn't know PPQ anymore.
 1.56.12.1 15-Aug-2007  skrll Sync with HEAD.
 1.56.4.2 05-Apr-2007  ad - Put a per-LWP lock around swapin / swapout.
- Replace use of lockmgr().
- Minor locking fixes and assertions.
- uvm_map.h no longer pulls in proc.h, etc.
- Use kpause where appropriate.
 1.56.4.1 13-Mar-2007  ad Pull in the initial set of changes for the vmlocking branch.
 1.58.22.2 22-Jul-2007  he When _KERNEL is defined, we have now grown a dependency on
<sys/proc.h>, since one of the inline functions now refers to curlwp.
Fix this by including <sys/proc.h> when _KERNEL is defined.
 1.58.22.1 22-Jul-2007  he file uvm_map.h was added on branch matt-mips64 on 2007-07-22 21:07:48 +0000
 1.58.18.2 08-Jan-2008  bouyer Sync with HEAD
 1.58.18.1 02-Jan-2008  bouyer Sync with HEAD
 1.58.16.1 10-Dec-2007  yamt add a function to call pmap_growkernel if necessary. will be used by vmem.
 1.58.14.2 28-Dec-2007  ad - Move remaining map locking functions into uvm_map.c. They depend on proc.h.
- Lock vm_map_kernel::vmk_merged_entries with the map's own lock. There was
a race where a thread legitimately expects to find cached entries, but can
find none because they have not been freed yet.
 1.58.14.1 21-Dec-2007  ad Kill vm_map::hint_lock.
 1.58.12.1 18-Feb-2008  mjf Sync with HEAD.
 1.58.6.1 09-Jan-2008  matt sync with HEAD
 1.60.8.1 18-May-2008  yamt sync with head.
 1.60.6.2 28-Sep-2008  mjf Sync with HEAD.
 1.60.6.1 02-Jun-2008  mjf Sync with HEAD.
 1.61.8.1 19-Oct-2008  haad Sync with HEAD.
 1.61.6.1 31-Jul-2008  simonb Sync with head.
 1.61.4.1 18-Sep-2008  wrstuden Sync with wrstuden-revivesa-base-2.
 1.61.2.4 09-Oct-2010  yamt sync with head
 1.61.2.3 19-Aug-2009  yamt sync with head.
 1.61.2.2 20-Jun-2009  yamt sync with head
 1.61.2.1 04-May-2009  yamt sync with head.
 1.62.8.1 23-Jul-2009  jym Sync with HEAD.
 1.64.4.2 05-Mar-2011  rmind sync with head
 1.64.4.1 17-Mar-2010  rmind Reorganise UVM locking to protect P->V state and serialise pmap(9)
operations on the same page(s) by always locking their owner. Hence
lock order: "vmpage"-lock -> pmap-lock.

Patch, proposed on tech-kern@, from Andrew Doran.
 1.64.2.1 22-Oct-2010  uebayasi Sync with HEAD (-D20101022).
 1.65.4.1 08-Feb-2011  bouyer Sync with HEAD
 1.65.2.1 06-Jun-2011  jruoho Sync with HEAD.
 1.66.2.1 23-Jun-2011  cherry Catchup with rmind-uvmplock merge.
 1.67.6.2 24-Feb-2012  mrg sync to -current.
 1.67.6.1 18-Feb-2012  mrg merge to -current.
 1.67.2.2 30-Oct-2012  yamt sync with head
 1.67.2.1 17-Apr-2012  yamt sync with head
 1.71.2.2 03-Dec-2017  jdolecek update from HEAD
 1.71.2.1 20-Nov-2012  tls Resync to 2012-11-19 00:00:00 UTC
 1.72.14.2 28-Aug-2017  skrll Sync with HEAD
 1.72.14.1 29-May-2016  skrll Sync with HEAD
 1.73.8.1 19-May-2017  pgoyette Resolve conflicts from previous merge (all resulting from $NetBSD
keyword expansion)
 1.74.10.1 13-Apr-2020  martin Mostly merge changes from HEAD upto 20200411
 1.76.2.2 29-Feb-2020  ad Sync with head.
 1.76.2.1 17-Jan-2020  ad Sync with head.
 1.37 11-Feb-2006  yamt remove the following options. no objections on tech-kern@.

UVM_PAGER_INLINE
UVM_AMAP_INLINE
UVM_PAGE_INLINE
UVM_MAP_INLINE
 1.36 11-Dec-2005  christos branches: 1.36.2; 1.36.4; 1.36.6;
merge ktrace-lwp.
 1.35 28-Jun-2005  thorpej branches: 1.35.2;
Clean up the cpp macro used to say "we're compiling this specific C file".
 1.34 29-May-2005  christos avoid shadow variables.
remove unneeded casts.
 1.33 01-Apr-2005  yamt merge yamt-km branch.
- don't use managed mappings/backing objects for wired memory allocations.
save some resources like pv_entry. also fix (most of) PR/27030.
- simplify kernel memory management API.
- simplify pmap bootstrap of some ports.
- some related cleanups.
 1.32 11-Feb-2005  chs use vm_map_{min,max}() instead of dereferencing the vm_map pointer directly.
define and use vm_map_set{min,max}() for modifying these values.
remove the {min,max}_offset aliases for these vm_map fields to be more
namespace-friendly. PR 26475.
 1.31 12-Jan-2005  yamt branches: 1.31.2; 1.31.4;
don't reserve (uvm_mapent_reserve) entries for malloc/pool backends
because it isn't necessary or safe.
reported and tested by Denis Lagno. PR/28897.
 1.30 01-Jan-2005  yamt introduce vm_map_kernel, a subclass of vm_map, and
move some kernel-only members of vm_map to it.
 1.29 01-Jan-2005  yamt for in-kernel maps,
- allocate kva for vm_map_entry from the map itself and
remove the static limit, MAX_KMAPENT.
- keep merged entries for later splitting to fix allocate-to-free problem.
PR/24039.
 1.28 10-Feb-2004  matt Back out the changes in
http://mail-index.netbsd.org/source-changes/2004/01/29/0027.html
since they don't really fix the problem.

Incorporate one fix: Mark uvm_map_entry's that were created with
UVM_FLAG_NOMERGE so that they will not be used as future merge
candidates.
 1.27 29-Jan-2004  yamt - split uvm_map() into two functions for the following.
- for in-kernel maps, disable map entry merging so that
unmap operations won't block. (workaround for PR/24039)
- for in-kernel maps, allocate kva for vm_map_entry from
the map itself and eliminate MAX_KMAPENT and
uvm_map_entry_kmem_pool.
 1.26 01-Nov-2003  yamt track map entries and free spaces using red-black tree
to improve scalability of operations on the map.

originally done by Niels Provos for OpenBSD.
tweaked for NetBSD by me with some advice from enami tsugutomo.
discussed on tech-kern@ and tech-perform@.
 1.25 01-Oct-2003  enami ansi'fy.
 1.24 01-Dec-2002  matt branches: 1.24.6;
Reorder things so that with multiple inclusion protection that optional
definitions are outside the protection checks.
 1.23 22-Sep-2002  chs add a new flag VM_MAP_DYING, which is set before we start
tearing down a vm_map. use this to skip the pmap_update()
at the end of all the removes, which allows pmaps to optimize
pmap tear-down. also, use the new pmap_remove_all() hook to
let the pmap implementation know what we're up to.
 1.22 26-Jun-2001  thorpej branches: 1.22.2;
Reduce some complexity in the fault path -- Rather than maintaining
an spl-protected "interrupt safe map" list, simply require that callers
of uvm_fault() never call us in interrupt context (MD code must make
the assertion), and check for interrupt-safe maps in uvmfault_lookup()
before we lock the map.
 1.21 02-Jun-2001  chs replace vm_map{,_entry}_t with struct vm_map{,_entry} *.
 1.20 25-May-2001  chs remove trailing whitespace.
 1.19 15-Mar-2001  chs eliminate the KERN_* error codes in favor of the traditional E* codes.
the mapping is:

KERN_SUCCESS 0
KERN_INVALID_ADDRESS EFAULT
KERN_PROTECTION_FAILURE EACCES
KERN_NO_SPACE ENOMEM
KERN_INVALID_ARGUMENT EINVAL
KERN_FAILURE various, mostly turn into KASSERTs
KERN_RESOURCE_SHORTAGE ENOMEM
KERN_NOT_RECEIVER <unused>
KERN_NO_ACCESS <unused>
KERN_PAGES_LOCKED <unused>
 1.18 27-Nov-2000  chs branches: 1.18.2;
Initial integration of the Unified Buffer Cache project.
 1.17 08-May-2000  thorpej branches: 1.17.4;
uvm_map_setup(): We almost ever set up an interrupt-safe map, but we
set up quite a few regular ones (at every fork!), so put interrupt-
safe map setup in the slow path with a __predict_false().

uvm_map_reference(): __predict_false() the check for NULL map.
uvm_map_deallocate(): Likewise.
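
A rough userland sketch of the branch-hint pattern described above. It
assumes a GCC- or Clang-compatible compiler for __builtin_expect; the
macro DEMO_PREDICT_FALSE, struct demo_map and demo_map_reference are
hypothetical stand-ins, not the kernel's __predict_false() or
uvm_map_reference():

```c
#include <stddef.h>

/* Assumption: GCC/Clang builtin; tells the compiler exp is rarely true. */
#define DEMO_PREDICT_FALSE(exp)	__builtin_expect((exp) != 0, 0)

struct demo_map {
	int ref_count;
};

/* Take a reference; the NULL check is kept off the expected path. */
static void
demo_map_reference(struct demo_map *map)
{
	if (DEMO_PREDICT_FALSE(map == NULL))
		return;
	map->ref_count++;
}
```

The hint does not change behaviour; it only influences how the compiler
lays out the common and uncommon paths.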
 1.16 01-Jul-1999  thorpej branches: 1.16.2;
Fix a corner case locking error, which could lead to map corruption in
SMP environments. See comments in <vm/vm_map.h> for details.
 1.15 14-Jun-1999  thorpej Use a more descriptive wait message for VM map locks.
 1.14 04-Jun-1999  thorpej Keep interrupt-safe maps on an additional queue. In uvm_fault(), if we're
looking up a kernel address, check to see if the address is on this
"interrupt-safe" list. If so, return failure immediately. This prevents
a locking screw if a page fault is taken on an interrupt-safe map in or
out of interrupt context.
 1.13 26-May-1999  thorpej Change the vm_map's "entries_pageable" member to a r/o flags member, which
has PAGEABLE and INTRSAFE flags. PAGEABLE now really means "pageable",
not "allocate vm_map_entry's from non-static pool", so update all map
creations to reflect that. INTRSAFE maps are maps that are used in
interrupt context (e.g. kmem_map, mb_map), and thus use the static
map entry pool (XXX as does kernel_map, for now). This will eventually
change now these maps are locked, as well.
 1.12 03-May-1999  mrg remove now-wrong comment. formatting nit.
 1.11 25-Mar-1999  mrg branches: 1.11.4;
remove now >1 year old pre-release message.
 1.10 11-Oct-1998  chuck remove unused share map code from UVM:
- replace map checks with submap checks
- get rid of unused 'mainonly' arg in uvm_unmap/uvm_unmap_remove, simplify
code. update all calls to reflect this.
- don't worry about unmapping or changing the protection of shared share
map mappings (is_main_map no longer used).
- remove unused uvm_map_sharemapcopy() function from fork code.
 1.9 31-Aug-1998  thorpej Back out previous; I should have instrumented the benefit of this one
first.
 1.8 31-Aug-1998  thorpej Use the pool allocator and the "nointr" pool page allocator for vm_map's.
 1.7 13-Aug-1998  eeh Merge paddr_t changes into the main branch.
 1.6 09-Mar-1998  mrg branches: 1.6.2;
KNF.
 1.5 10-Feb-1998  mrg - add defopt's for UVM, UVMHIST and PMAP_NEW.
- remove unnecessary UVMHIST_DECL's.
 1.4 10-Feb-1998  perry add/cleanup multiple inclusion protection.
 1.3 07-Feb-1998  mrg restore rcsids
 1.2 06-Feb-1998  thorpej RCS ID police.
 1.1 05-Feb-1998  mrg branches: 1.1.1;
Initial revision
 1.1.1.1 05-Feb-1998  mrg initial import of the new virtual memory system, UVM, into -current.

UVM was written by chuck cranor <chuck@maria.wustl.edu>, with some
minor portions derived from the old Mach code. i provided some help
getting swap and paging working, and other bug fixes/ideas. chuck
silvers <chuq@chuq.com> also provided some other fixes.

this is the UVM kernel code portion.


this will be KNF'd shortly. :-)
 1.6.2.1 30-Jul-1998  eeh Split vm_offset_t and vm_size_t into paddr_t, psize_t, vaddr_t, and vsize_t.
 1.11.4.2 01-Jul-1999  thorpej Sync w/ -current.
 1.11.4.1 21-Jun-1999  thorpej Sync w/ -current.
 1.16.2.3 27-Mar-2001  bouyer Sync with HEAD.
 1.16.2.2 08-Dec-2000  bouyer Sync with HEAD.
 1.16.2.1 20-Nov-2000  bouyer Update thorpej_scsipi to -current as of a month ago
 1.17.4.2 05-Sep-2002  itojun pullup mistake - should have patched uvm_amap_i.h, not uvm_map_i.h
 1.17.4.1 04-Sep-2002  itojun pullup sys/uvm/uvm_amap_i.h 1.18 (matt)

>In amap_ref, only increment the amap's refcnt after we have established
>the ppref array. Otherwise, the newly ref'ed pages will be doubly
>counted and thus never freed because the pprefcnt can't fall to 0.
 1.18.2.5 11-Dec-2002  thorpej Sync with HEAD.
 1.18.2.4 18-Oct-2002  nathanw Catch up to -current.
 1.18.2.3 24-Aug-2001  nathanw Catch up with -current.
 1.18.2.2 21-Jun-2001  nathanw Catch up to -current.
 1.18.2.1 09-Apr-2001  nathanw Catch up with -current.
 1.22.2.1 10-Oct-2002  jdolecek sync kqueue with -current; this includes merge of gehenna-devsw branch,
merge of i386 MP branch, and part of autoconf rototil work
 1.24.6.7 10-Nov-2005  skrll Sync with HEAD. Here we go again...
 1.24.6.6 01-Apr-2005  skrll Sync with HEAD.
 1.24.6.5 15-Feb-2005  skrll Sync with HEAD.
 1.24.6.4 17-Jan-2005  skrll Sync with HEAD.
 1.24.6.3 21-Sep-2004  skrll Fix the sync with head I botched.
 1.24.6.2 18-Sep-2004  skrll Sync with HEAD.
 1.24.6.1 03-Aug-2004  skrll Sync with HEAD
 1.31.4.2 12-Feb-2005  yamt sync with head.
 1.31.4.1 25-Jan-2005  yamt - don't use uvm_object or managed mappings for wired allocations.
(eg. malloc(9))
- simplify uvm_km_* apis.
 1.31.2.1 29-Apr-2005  kent sync with -current
 1.35.2.1 21-Jun-2006  yamt sync with head.
 1.36.6.1 22-Apr-2006  simonb Sync with head.
 1.36.4.1 09-Sep-2006  rpaulo sync with head
 1.36.2.1 18-Feb-2006  yamt sync with head.
 1.80 14-Jun-2020  ad Remove PG_ZERO. It worked brilliantly on x86 machines from the mid-90s but
having spent an age experimenting with it over the last 6 months on various
machines and with different use cases it's always either break-even or a
slight net loss for me.
 1.79 11-Jun-2020  ad Counter tweaks:

- Don't need to count anonpages+filepages any more; clean+unknown+dirty for
each kind of page can be summed to get the totals.

- Track the number of free pages with a counter so that it's one less thing
for the allocator to do, which opens up further options there.

- Remove cpu_count_sync_one(). It has no users and doesn't save a whole lot.
For the cheap option, give cpu_count_sync() a boolean parameter indicating
that a cached value is okay, and rate limit the updates for cached values
to hz.
 1.78 11-Jun-2020  ad uvm_availmem(): give it a boolean argument to specify whether a recent
cached value will do, or if the very latest total must be fetched. It can
be called thousands of times a second and fetching the totals impacts not
only the calling LWP but other CPUs doing unrelated activity in the VM
system.
 1.77 23-May-2020  ad Move proc_lock into the data segment. It was dynamically allocated because
at the time we had mutex_obj_alloc() but not __cacheline_aligned.
 1.76 22-Mar-2020  ad Process concurrent page faults on individual uvm_objects / vm_amaps in
parallel, where the relevant pages are already in-core. Proposed on
tech-kern.

Temporarily disabled on MP architectures with __HAVE_UNLOCKED_PMAP until
adjustments are made to their pmaps.
 1.75 19-Mar-2020  ad sysctl_vm_uvmexp2(): some counters were needlessly truncated.
 1.74 15-Jan-2020  ad Merge from yamt-pagecache (after much testing):

- Reduce unnecessary page scan in putpages esp. when an object has a ton of
pages cached but only a few of them are dirty.

- Reduce the number of pmap operations by tracking page dirtiness more
precisely in uvm layer.
 1.73 31-Dec-2019  ad branches: 1.73.2;
Rename uvm_free() -> uvm_availmem().
 1.72 21-Dec-2019  ad Counter tweaks:

"zeroaborts" + "free" don't need to be per-CPU counters, and "bucketmiss"
wasn't used. Remove those and cluster by usage.
 1.71 21-Dec-2019  ad uvmexp.free -> uvm_free()
 1.70 16-Dec-2019  ad - Extend the per-CPU counters matt@ did to include all of the hot counters
in UVM, excluding uvmexp.free, which needs special treatment and will be
done with a separate commit. Cuts system time for a build by 20-25% on
a 48 CPU machine w/DIAGNOSTIC.

- Avoid 64-bit integer divide on every fault (for rnd_add_uint32).
 1.69 07-Jan-2019  jdolecek add sysctl to easily set ubc_direct

PR kern/53124
 1.68 03-Sep-2018  riastradh Rename min/max -> uimin/uimax for better honesty.

These functions are defined on unsigned int. The generic name
min/max should not silently truncate to 32 bits on 64-bit systems.
This is purely a name change -- no functional change intended.

HOWEVER! Some subsystems have

#define min(a, b) ((a) < (b) ? (a) : (b))
#define max(a, b) ((a) > (b) ? (a) : (b))

even though our standard name for that is MIN/MAX. Although these
may invite multiple evaluation bugs, these do _not_ cause integer
truncation.

To avoid `fixing' these cases, I first changed the name in libkern,
and then compile-tested every file where min/max occurred in order to
confirm that it failed -- and thus confirm that nothing shadowed
min/max -- before changing it.

I have left a handful of bootloaders that are too annoying to
compile-test, and some dead code:

cobalt ews4800mips hp300 hppa ia64 luna68k vax
acorn32/if_ie.c (not included in any kernels)
macppc/if_gm.c (superseded by gem(4))

It should be easy to fix the fallout once identified -- this way of
doing things fails safe, and the goal here, after all, is to _avoid_
silent integer truncations, not introduce them.

Maybe one day we can reintroduce min/max as type-generic things that
never silently truncate. But we should avoid doing that for a while,
so that existing code has a chance to be detected by the compiler for
conversion to uimin/uimax without changing the semantics until we can
properly audit it all. (Who knows, maybe in some cases integer
truncation is actually intended!)
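
The truncation hazard behind revision 1.68 can be sketched as follows. This assumes a platform where unsigned int is 32 bits wide; demo_uimin and DEMO_MIN are illustrative stand-ins, not the libkern definitions.

```c
#include <stdint.h>

/* A function defined on unsigned int, in the spirit of uimin(). */
static unsigned int
demo_uimin(unsigned int a, unsigned int b)
{
	return a < b ? a : b;
}

/*
 * A type-preserving macro in the spirit of MIN(). It may evaluate its
 * arguments more than once, but it does not narrow them.
 */
#define DEMO_MIN(a, b) ((a) < (b) ? (a) : (b))

static uint64_t
min_via_function(uint64_t a, uint64_t b)
{
	/* Both arguments are implicitly converted to unsigned int here. */
	return demo_uimin(a, b);
}

static uint64_t
min_via_macro(uint64_t a, uint64_t b)
{
	/* The comparison is done in the operands' own 64-bit type. */
	return DEMO_MIN(a, b);
}
```

With a = 0x100000001 and b = 5, the function-based version sees a as 1 after conversion and returns 1, while the macro returns 5. The rename makes the narrowing visible at the call site instead of hiding it behind a generic-looking name.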
 1.67 02-Dec-2017  mrg branches: 1.67.2; 1.67.4;
add two new members to uvmexp_sysctl{}: bootpages and poolpages.
bootpages is set to the pages allocated via uvm_pageboot_alloc().
poolpages is calculated from the list of pools nr_pages members.

this brings us closer to having a valid total of pages known by
the system, vs actual pages originally managed.

XXX: poolpages needs some handling for PR_RECURSIVE pools still.
 1.66 02-Jul-2017  joerg Export the guard size of the main thread via vm.guard_size. Add a
complementary writable sysctl for the initial guard size of threads
created via pthread_create. Let the existing attribute accessors do the
right thing. Raise the default guard size for threads to 64KB.
 1.65 01-Dec-2014  msaitoh branches: 1.65.10;
Sort in uvmexp_sysctl's order for readability. No functional change.
 1.64 01-Dec-2014  msaitoh Fix a bug that "vmstat -s" prints uvmexp.ncolors incorrectly.
 1.63 26-Feb-2014  martin branches: 1.63.4; 1.63.6;
Fix copy & pasto
 1.62 26-Feb-2014  matt Add vm.min_address and vm.max_address which return VM_MIN_ADDRESS and
VM_MAXUSER_ADDRESS.
 1.61 25-Feb-2014  pooka Ensure that the top level sysctl nodes (kern, vfs, net, ...) exist before
the sysctl link sets are processed, and remove redundancy.

Shaves >13kB off of an amd64 GENERIC, not to mention >1k duplicate
lines of code.
 1.60 02-Jun-2012  dsl branches: 1.60.2; 1.60.4;
Add some pre-processor magic to verify that the type of the data item
passed to sysctl_createv() actually matches the declared type for
the item itself.
In the places where the caller specifies a function and a structure
address (typically the 'softc') an explicit (void *) cast is now needed.
Fixes bugs in sys/dev/acpi/asus_acpi.c sys/dev/bluetooth/bcsp.c
sys/kern/vfs_bio.c sys/miscfs/syncfs/sync_subr.c and setting
AcpiGbl_EnableAmlDebugObject.
(mostly passing the address of a uint64_t when typed as CTLTYPE_INT).
I've test built quite a few kernels, but there may be some unfixed MD
fallout. Most likely passing &char[] to char *.
Also add CTLFLAG_UNSIGNED for unsigned decimals - not set yet.
 1.59 27-Jan-2012  para extending vmem(9) to be able to allocate resources for its own needs.
simplifying uvm_map handling (no special kernel entries anymore no relocking)
make malloc(9) a thin wrapper around kmem(9)
(with private interface for interrupt safety reasons)

releng@ acknowledged
 1.58 30-Dec-2011  christos prevent kernel from writing more than userland passed.
 1.57 13-Nov-2011  christos branches: 1.57.4;
if you are going to dereference a variable, check the variable itself, not
its cousin.
 1.56 02-Feb-2011  chuck branches: 1.56.4;
update license clauses on my code to match the new-style BSD licenses.
based on second diff that rmind@ sent me.

no functional change with this commit.
 1.55 20-Dec-2010  matt branches: 1.55.2; 1.55.4;
Move counting of faults, traps, intrs, soft[intr]s, syscalls, and nswtch
from uvmexp to per-cpu cpu_data and move them to 64bits. Remove unneeded
includes of <uvm/uvm_extern.h> and/or <uvm/uvm.h>.
 1.54 16-Nov-2010  enami Nowadays, comparing priority against PZERO doesn't make any sense.
Instead, see if a process waits uninterruptibly like ps does,
so that the second column (`b') of default vmstat output prints
some useful value (-t is still broken though).
 1.53 06-Nov-2010  uebayasi Include uvm/uvm.h because this is part of UVM.
 1.52 16-Apr-2010  rmind - Merge sched_pstats() and uvm_meter()/uvm_loadav(). Avoids double loop
through all LWPs and duplicate locking overhead.

- Move sched_pstats() from soft-interrupt context to process 0 main loop.
Avoids blocking effect on real-time threads. Mostly fixes PR/38792.

Note: it might be worth moving the loop above PRI_PGDAEMON. Also,
sched_pstats() might be cleaned-up slightly.
 1.51 11-Apr-2010  mrg now that CTLTYPE_BOOL actually works, use it to export vm_page_zero_enable
as vm.idlezero in a way that actually works on big endian systems.
 1.50 21-Oct-2009  rmind branches: 1.50.2; 1.50.4;
Remove uarea swap-out functionality:

- Addresses the issue described in PR/38828.
- Some simplification in threading and sleepq subsystems.
- Eliminates pmap_collect() and, as a side note, allows pmap optimisations.
- Eliminates XS_CTL_DATA_ONSTACK in scsipi code.
- Avoids few scans on LWP list and thus potentially long holds of proc_lock.
- Cuts ~1.5k lines of code. Reduces amd64 kernel size by ~4k.
- Removes __SWAP_BROKEN cases.

Tested on x86, mips, acorn32 (thanks <mpumford>) and partly tested on
acorn26 (thanks to <bjh21>).

Discussed on <tech-kern>, reviewed by <ad>.
 1.49 04-Jun-2008  ad branches: 1.49.8; 1.49.14; 1.49.16; 1.49.18;
- vm_page: put listq, pageq into a union alongside a LIST_ENTRY, so we can
use both types of list.

- Make page coloring and idle zero state per-CPU.

- Maintain per-CPU page freelists. When freeing, put pages onto the local
CPU's lists and the global lists. When allocating, prefer to take pages
from the local CPU. If none are available take from the global list as
done now. Proposed on tech-kern@.
 1.48 24-Apr-2008  ad branches: 1.48.2; 1.48.4;
Network protocol interrupts can now block on locks, so merge the globals
proclist_mutex and proclist_lock into a single adaptive mutex (proc_lock).
Implications:

- Inspecting process state requires thread context, so signals can no longer
be sent from a hardware interrupt handler. Signal activity must be
deferred to a soft interrupt or kthread.

- As the proc state locking is simplified, it's now safe to take exit()
and wait() out from under kernel_lock.

- The system spends less time at IPL_SCHED, and there is less lock activity.
 1.47 26-Feb-2007  yamt branches: 1.47.38; 1.47.40;
implement priority inheritance.
 1.46 17-Feb-2007  pavel Change the process/lwp flags seen by userland via sysctl back to the
P_*/L_* naming convention, and rename the in-kernel flags to avoid
conflict. (P_ -> PK_, L_ -> LW_ ). Add back the (now unused) LSDEAD
constant.

Restores source compatibility with pre-newlock2 tools like ps or top.

Reviewed by Andrew Doran.
 1.45 15-Feb-2007  ad branches: 1.45.2;
Fix load average calculation:

- Don't consider kernel threads when calculating the load average. Their
priorities are no longer adjusted by the scheduler, and their level of
activity is dependent upon running user processes.
- Change the (l->l_priority > PZERO) check in uvm_meter() to (l->l_flag &
L_SINTR). I think this check was originally intended to weed out
processes sleeping interruptibly.
 1.44 09-Feb-2007  ad Merge newlock2 to head.
 1.43 01-Nov-2006  yamt branches: 1.43.2; 1.43.4;
remove some __unused from function parameters.
 1.42 12-Oct-2006  dogcow even more __unused.
 1.41 15-Sep-2006  yamt branches: 1.41.2;
merge yamt-pdpolicy branch.
- separate page replacement policy from the rest of kernel
- implement an alternative replacement policy
 1.40 07-Jun-2006  kardel branches: 1.40.6;
merge FreeBSD timecounters from branch simonb-timecounters
- struct timeval time is gone
time.tv_sec -> time_second
- struct timeval mono_time is gone
mono_time.tv_sec -> time_uptime
- access to time via
{get,}{micro,nano,bin}time()
get* versions are fast but less precise
- support NTP nanokernel implementation (NTP API 4)
- further reading:
Timecounter Paper: http://phk.freebsd.dk/pubs/timecounter.pdf
NTP Nanokernel: http://www.eecis.udel.edu/~mills/ntp/html/kern.html
 1.39 21-Dec-2005  yamt branches: 1.39.4; 1.39.6; 1.39.8; 1.39.14;
whitespace in SYSCTL_DESCR.
 1.38 21-Dec-2005  yamt make length of inactive queue tunable by sysctl. (vm.inactivepct)
 1.37 11-Dec-2005  christos merge ktrace-lwp.
 1.36 09-Nov-2005  simonb Whitespace nit.
 1.35 27-Jun-2005  thorpej branches: 1.35.2;
Use ANSI function decls.
 1.34 15-May-2005  yamt remove anon related statistics which are no longer used.
 1.33 10-Oct-2004  yamt expose vm_page_zero_enable as vm.idlezero sysctl.
XXX assuming boolean_t == int.
 1.32 25-May-2004  atatat Sysctl descriptions under vm subtree
 1.31 24-Mar-2004  atatat branches: 1.31.2;
Tango on sysctl_createv() and flags. The flags have all been renamed,
and sysctl_createv() now uses more arguments.
 1.30 24-Mar-2004  junyoung - Nuke __P().
- Drop trailing spaces.
 1.29 11-Jan-2004  yamt sysctl_vm_updateminmax: fix swapped filemin and execmin.
the problem reported by Vesbula on current-users@.
 1.28 07-Dec-2003  tsutsui Allow sysctl(8) to update vm.{anon,exec,file}{min,max}.

XXX needs sysctl(9) man page to confirm this change..
 1.27 04-Dec-2003  atatat Dynamic sysctl.

Gone are the old kern_sysctl(), cpu_sysctl(), hw_sysctl(),
vfs_sysctl(), etc, routines, along with sysctl_int() et al. Now all
nodes are registered with the tree, and nodes can be added (or
removed) easily, and I/O to and from the tree is handled generically.

Since the nodes are registered with the tree, the mapping from name to
number (and back again) can now be discovered, instead of having to be
hard coded. Adding new nodes to the tree is likewise much simpler --
the new infrastructure handles almost all the work for simple types,
and just about anything else can be done with a small helper function.

All existing nodes are where they were before (numerically speaking),
so all existing consumers of sysctl information should notice no
difference.

PS - I'm sorry, but there's a distinct lack of documentation at the
moment. I'm working on sysctl(3/8/9) right now, and I promise to
watch out for buses.
 1.26 29-Jun-2003  fvdl branches: 1.26.2;
Back out the lwp/ktrace changes. They contained a lot of collateral damage,
and need to be examined and discussed more.
 1.25 28-Jun-2003  darrenr Pass lwp pointers throughout the kernel, as required, so that the lwpid can
be inserted into ktrace records. The general change has been to replace
"struct proc *" with "struct lwp *" in various function prototypes, pass
the lwp through and use l_proc to get the process pointer when needed.

Bump the kernel rev up to 1.6V
 1.24 18-Jan-2003  thorpej Merge the nathanw_sa branch.
 1.23 09-Dec-2001  chs add {anon,file,exec}max as a upper bound on the amount of memory that
will be allocated for the respective usage types when there is contention
for memory.

replace "vnode" and "vtext" with "file" and "exec" in uvmexp field names
and sysctl names.
 1.22 10-Nov-2001  lukem add RCSIDs, and in some cases, slightly cleanup #include order
 1.21 14-Jul-2001  matt branches: 1.21.4;
Add support for kern.maxphys, vm.maxslp, vm.uspace (the latter two for ps).
 1.20 02-Jun-2001  chs branches: 1.20.2;
replace vm_map{,_entry}_t with struct vm_map{,_entry} *.
 1.19 25-May-2001  chs remove trailing whitespace.
 1.18 29-Apr-2001  thorpej Implement page coloring, using a round-robin bucket selection
algorithm (Solaris calls this "Bin Hopping").

This implementation currently relies on MD code to define a
constant defining the number of buckets. This will change
reasonably soon (MD code will be able to dynamically size
the bucket array).
 1.17 09-Mar-2001  chs add UBC memory-usage balancing. we track the number of pages in use for
each of the basic types (anonymous data, executable image, cached files)
and prevent the pagedaemon from reusing a given page if that would reduce
the count of that type of page below a sysctl-settable minimum threshold.
the thresholds are controlled via three new sysctl tunables:
vm.anonmin, vm.vnodemin, and vm.vtextmin. these tunables are the
percentages of pageable memory reserved for each usage, and we do not allow
the sum of the minimums to be more than 95% so that there's always some
memory that can be reused.
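
The 95% rule described in revision 1.17 above can be sketched as a simple validation step. The structure, function name and return convention are hypothetical; only the three tunable names and the 95% limit come from the log message, and the per-value 0..100 range check is an added assumption.

```c
/* Hypothetical container for the three minimum percentages. */
struct demo_minpct {
	int anonmin;	/* vm.anonmin */
	int vnodemin;	/* vm.vnodemin */
	int vtextmin;	/* vm.vtextmin */
};

#define DEMO_MINPCT_SUM_LIMIT 95

/*
 * Accept new minimums only if their sum leaves some memory reusable.
 * Returns 0 on success, -1 if the request is rejected (current values
 * are then left untouched).
 */
static int
demo_set_minpct(struct demo_minpct *cur, int anonmin, int vnodemin,
    int vtextmin)
{
	/* Assumption: each value is a percentage in 0..100. */
	if (anonmin < 0 || anonmin > 100 ||
	    vnodemin < 0 || vnodemin > 100 ||
	    vtextmin < 0 || vtextmin > 100)
		return -1;

	if (anonmin + vnodemin + vtextmin > DEMO_MINPCT_SUM_LIMIT)
		return -1;

	cur->anonmin = anonmin;
	cur->vnodemin = vnodemin;
	cur->vtextmin = vtextmin;
	return 0;
}
```

For example, 40 + 40 + 15 is accepted because it sums to exactly 95, while 40 + 40 + 16 is rejected.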
 1.16 30-Nov-2000  simonb branches: 1.16.2;
Move uvm_pgcnt_vnode and uvm_pgcnt_anon into uvmexp (as vnodepages and
anonpages), and add vtextpages which is currently unused but will be
used to trace the number of pages used by vtext vnodes.
 1.15 29-Nov-2000  simonb Add a vm.uvmexp2 sysctl that uses a ABI-safe 'struct uvmexp_sysctl'.
 1.14 24-Nov-2000  chs use queue.h macros and other misc cleanup.
 1.13 27-Jun-2000  mrg remove include of <vm/vm.h>
 1.12 26-May-2000  thorpej Introduce a new process state distinct from SRUN called SONPROC
which indicates that the process is actually running on a
processor. Test against SONPROC as appropriate rather than
combinations of SRUN and curproc. Update all context switch code
to properly set SONPROC when the process becomes the current
process on the CPU.
 1.11 11-Feb-2000  thorpej Add some very simple code to auto-size the kmem_map. We take the
amount of physical memory, divide it by 4, and then allow machine
dependent code to place upper and lower bounds on the size. Export
the computed value to userspace via the new "vm.nkmempages" sysctl.

NKMEMCLUSTERS is now deprecated and will generate an error if you
attempt to use it. The new option, should you choose to use it,
is called NKMEMPAGES, and two new options NKMEMPAGES_MIN and
NKMEMPAGES_MAX allow the user to configure the bounds in the kernel
config file.
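
The auto-sizing rule in revision 1.11 above (a quarter of physical memory, then clamped by machine-dependent bounds) might look roughly like this. The function and parameter names are hypothetical, and the order in which the two bounds are applied is an assumption that only matters if the lower bound exceeds the upper one.

```c
#include <stddef.h>

/*
 * Compute a kmem_map size in pages: physical memory divided by 4,
 * limited to the range [min_pages, max_pages].
 */
static size_t
demo_nkmempages(size_t physmem_pages, size_t min_pages, size_t max_pages)
{
	size_t npages = physmem_pages / 4;

	if (npages > max_pages)
		npages = max_pages;
	if (npages < min_pages)
		npages = min_pages;
	return npages;
}
```

With 1000 pages of physical memory and bounds of 100 and 200 pages, the quarter (250) is cut down to the upper bound of 200; with bounds of 100 and 500 the quarter is used unchanged.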
 1.10 25-Jul-1999  thorpej branches: 1.10.2;
Turn the proclist lock into a read/write spinlock. Update proclist locking
calls to reflect this. Also, block statclock rather than softclock in
the proclist locking functions, to address a problem reported on
current-users by Sean Doran.
 1.9 22-Jul-1999  thorpej Garbage collect thread_sleep()/thread_wakeup() left over from the old
Mach VM code. Also nuke iprintf(), which was no longer used anywhere.

Add proclist locking where appropriate.
 1.8 25-Mar-1999  mrg branches: 1.8.4;
remove now >1 year old pre-release message.
 1.7 09-Aug-1998  perry bzero->memset, bcopy->memcpy, bcmp->memcmp
 1.6 09-Mar-1998  mrg KNF.
 1.5 08-Feb-1998  mrg fill out vmtotals: t_free, t_vm, t_avm, t_rm and t_arm. leaves shared of same, and t_pw.
 1.4 07-Feb-1998  mrg KNF.
 1.3 07-Feb-1998  mrg restore rcsids
 1.2 06-Feb-1998  thorpej RCS ID police.
 1.1 05-Feb-1998  mrg branches: 1.1.1;
Initial revision
 1.1.1.1 05-Feb-1998  mrg initial import of the new virtual memory system, UVM, into -current.

UVM was written by chuck cranor <chuck@maria.wustl.edu>, with some
minor portions derived from the old Mach code. i provided some help
getting swap and paging working, and other bug fixes/ideas. chuck
silvers <chuq@chuq.com> also provided some other fixes.

this is the UVM kernel code portion.


this will be KNF'd shortly. :-)
 1.8.4.1 02-Aug-1999  thorpej Update from trunk.
 1.10.2.3 12-Mar-2001  bouyer Sync with HEAD.
 1.10.2.2 08-Dec-2000  bouyer Sync with HEAD.
 1.10.2.1 20-Nov-2000  bouyer Update thorpej_scsipi to -current as of a month ago
 1.16.2.7 12-Jul-2002  nathanw No longer need to pull in lwp.h; proc.h pulls it in for us.
 1.16.2.6 08-Jan-2002  nathanw Catch up to -current.
 1.16.2.5 14-Nov-2001  nathanw Catch up to -current.
 1.16.2.4 24-Aug-2001  nathanw Catch up with -current.
 1.16.2.3 21-Jun-2001  nathanw Catch up to -current.
 1.16.2.2 09-Apr-2001  nathanw Catch up with -current.
 1.16.2.1 05-Mar-2001  nathanw Initial commit of scheduler activations and lightweight process support.
 1.20.2.2 10-Jan-2002  thorpej Sync kqueue branch with -current.
 1.20.2.1 03-Aug-2001  lukem update to -current
 1.21.4.1 12-Nov-2001  thorpej Sync the thorpej-mips-cache branch with -current.
 1.26.2.6 10-Nov-2005  skrll Sync with HEAD. Here we go again...
 1.26.2.5 19-Oct-2004  skrll Sync with HEAD
 1.26.2.4 21-Sep-2004  skrll Fix the sync with head I botched.
 1.26.2.3 18-Sep-2004  skrll Sync with HEAD.
 1.26.2.2 03-Aug-2004  skrll Sync with HEAD
 1.26.2.1 02-Jul-2003  darrenr Apply the aborted ktrace-lwp changes to a specific branch. This is just for
others to review, I'm concerned that patch fuzziness may have resulted in some
errant code being generated but I'll look at that later by comparing the diff
from the base to the branch with the file I attempt to apply to it. This will,
at the very least, put the changes in a better context for others to review
them and attempt to tinker with removing passing of 'struct lwp' through
the kernel.
 1.31.2.1 28-May-2004  tron Pull up revision 1.32 (requested by atatat in ticket #389):
Sysctl descriptions under vm subtree
 1.35.2.4 03-Sep-2007  yamt sync with head.
 1.35.2.3 26-Feb-2007  yamt sync with head.
 1.35.2.2 30-Dec-2006  yamt sync with head.
 1.35.2.1 21-Jun-2006  yamt sync with head.
 1.39.14.1 19-Jun-2006  chap Sync with head.
 1.39.8.2 26-Jun-2006  yamt sync with head.
 1.39.8.1 05-Mar-2006  yamt separate page replacement policy from the rest of kernel.
 1.39.6.1 04-Feb-2006  simonb Adapt for timecounters: mostly use get*time() and use "time_second"
instead of "time.tv_sec".
 1.39.4.1 09-Sep-2006  rpaulo sync with head
 1.40.6.3 29-Dec-2006  ad Checkpoint work in progress.
 1.40.6.2 18-Nov-2006  ad Sync with head.
 1.40.6.1 11-Sep-2006  ad - Allocate and free turnstiles where needed.
- Split proclist_mutex and alllwp_mutex out of the proclist_lock,
and use in interrupt context.
- Fix an MP race in enterpgrp()/setsid().
- Acquire proclist_lock and p_crmutex in some obvious places.
 1.41.2.2 10-Dec-2006  yamt sync with head.
 1.41.2.1 22-Oct-2006  yamt sync with head
 1.43.4.1 29-Oct-2007  wrstuden Catch up with 4.0 RC3
 1.43.2.2 19-Nov-2011  bouyer Pull up following revision(s) (requested by christos in ticket #1436):
sys/uvm/uvm_meter.c: revision 1.57 via patch
if you are going to dereference a variable, check the variable itself, not
its cousin.
 1.43.2.1 12-Oct-2007  riz branches: 1.43.2.1.4;
Pull up following revision(s) (requested by ad in ticket #929):
sys/uvm/uvm_meter.c: revision 1.45 (via patch)
Fix load average calculation:
- Don't consider kernel threads when calculating the load average. Their
priorities are no longer adjusted by the scheduler, and their level of
activity is dependent upon running user processes.
- Change the (l->l_priority > PZERO) check in uvm_meter() to (l->l_flag &
L_SINTR). I think this check was originally intended to weed out
processes sleeping interruptibly.
 1.43.2.1.4.1 19-Nov-2011  bouyer Pull up following revision(s) (requested by christos in ticket #1436):
sys/uvm/uvm_meter.c: revision 1.57 via patch
if you are going to dereference a variable, check the variable itself, not
its cousin.
 1.45.2.1 27-Feb-2007  yamt - sync with head.
- move sched_changepri back to kern_synch.c as it doesn't know PPQ anymore.
 1.47.40.2 17-Jun-2008  yamt sync with head.
 1.47.40.1 18-May-2008  yamt sync with head.
 1.47.38.2 05-Jun-2008  mjf Sync with HEAD.

Also fix build.
 1.47.38.1 02-Jun-2008  mjf Sync with HEAD.
 1.48.4.1 23-Jun-2008  wrstuden Sync w/ -current. 34 merge conflicts to follow.
 1.48.2.3 11-Aug-2010  yamt sync with head.
 1.48.2.2 11-Mar-2010  yamt sync with head
 1.48.2.1 04-May-2009  yamt sync with head.
 1.49.18.1 18-Nov-2011  sborrill Pull up the following revision(s) (requested by christos in ticket #1691):
sys/uvm/uvm_meter.c: revision 1.57

Dereference correct variable and thus stop a sysctl crash.
 1.49.16.4 12-Apr-2012  matt Separate object-less anon pages out of the active list if there is no swap
device. Make uvm_reclaimable and uvm.*estimatable understand colors and
kmem allocations.
 1.49.16.3 09-Feb-2012  matt Major changes to uvm.
Support multiple collections (groups) of free pages and run the page
reclamation algorithm on each group independently.
 1.49.16.2 03-Jun-2011  matt Restore $NetBSD$
 1.49.16.1 03-Jun-2011  matt Rework page free lists to be sorted by color first rather than free_list.
Kept per color PGFL_* counter in each page free list.
Minor cleanups.
 1.49.14.1 18-Nov-2011  sborrill Pull up the following revision(s) (requested by christos in ticket #1691):
sys/uvm/uvm_meter.c: revision 1.57

Dereference correct variable and thus stop a sysctl crash.
 1.49.8.1 18-Nov-2011  sborrill Pull up the following revision(s) (requested by christos in ticket #1691):
sys/uvm/uvm_meter.c: revision 1.57

Dereference correct variable and thus stop a sysctl crash.
 1.50.4.2 05-Mar-2011  rmind sync with head
 1.50.4.1 30-May-2010  rmind sync with head
 1.50.2.3 09-Nov-2010  uebayasi Sync with HEAD.
 1.50.2.2 30-Apr-2010  uebayasi Sync with HEAD.
 1.50.2.1 28-Apr-2010  uebayasi Adjustment for uvm/uvm_page.h. More to follow later.
 1.55.4.1 08-Feb-2011  bouyer Sync with HEAD
 1.55.2.1 06-Jun-2011  jruoho Sync with HEAD.
 1.56.4.11 22-May-2014  yamt sync with head.

for a reference, the tree before this commit was tagged
as yamt-pagecache-tag8.

this commit was split into small chunks to avoid
a limitation of cvs. ("Protocol error: too many arguments")
 1.56.4.10 30-Oct-2012  yamt sync with head
 1.56.4.9 17-Apr-2012  yamt sync with head
 1.56.4.8 05-Feb-2012  yamt turn vm.loanread sysctl to a threshold.
 1.56.4.7 11-Jan-2012  yamt create a sysctl knob to turn on/off loaned read.
 1.56.4.6 26-Dec-2011  yamt - use O->A loan to serve read(2). based on a patch from Chuck Silvers
- associated O->A loan fixes.
 1.56.4.5 20-Nov-2011  yamt - fix page loaning XXX make O->A loaning further
- add some statistics
 1.56.4.4 14-Nov-2011  yamt might dirty -> possibly dirty
suggested by wiz@
 1.56.4.3 13-Nov-2011  yamt a patch supposed to unbreak abi from christos@
PR/45598
 1.56.4.2 12-Nov-2011  yamt redo the page clean/dirty/unknown accounting separately for file and
anonymous pages
 1.56.4.1 11-Nov-2011  yamt - track the number of clean/dirty/unknown pages in the system.
- g/c PG_MARKER
 1.57.4.1 18-Feb-2012  mrg merge to -current.
 1.60.4.1 18-May-2014  rmind sync with head
 1.60.2.2 03-Dec-2017  jdolecek update from HEAD
 1.60.2.1 20-Aug-2014  tls Rebase to HEAD as of a few days ago.
 1.63.6.2 28-Aug-2017  skrll Sync with HEAD
 1.63.6.1 06-Apr-2015  skrll Sync with HEAD
 1.63.4.1 17-Dec-2014  martin Pull up following revision(s) (requested by msaitoh in ticket #329):
sys/uvm/uvm_meter.c: revision 1.64
sys/uvm/uvm_meter.c: revision 1.65
Fix a bug that "vmstat -s" prints uvmexp.ncolors incorrectly.
Sort in uvmexp_sysctl's order for readability. No functional change.
 1.65.10.1 31-Aug-2017  bouyer Pull up following revision(s) (requested by joerg in ticket #234):
sys/arch/amd64/include/vmparam.h: revision 1.43
sys/kern/exec_subr.c: revision 1.79
lib/libpthread/pthread_int.h: revision 1.94
sys/arch/mips/include/vmparam.h: revision 1.58
sys/arch/mips/include/vmparam.h: revision 1.59
lib/libpthread/TODO: revision 1.19
sys/arch/powerpc/include/vmparam.h: revision 1.20
sys/arch/riscv/include/vmparam.h: revision 1.2
sys/arch/riscv/include/vmparam.h: revision 1.3
sys/arch/i386/include/vmparam.h: revision 1.85
tests/lib/libpthread/t_join.c: revision 1.9
sys/uvm/uvm_meter.c: revision 1.66
sys/uvm/uvm_param.h: revision 1.36
sys/kern/exec_subr.c: revision 1.80
sys/uvm/uvm_param.h: revision 1.37
sys/kern/exec_subr.c: revision 1.81
sys/kern/exec_subr.c: revision 1.82
lib/libpthread/pthread_attr_getguardsize.3: revision 1.4
lib/libpthread/pthread.c: revision 1.148
lib/libpthread/pthread_attr.c: revision 1.17
sys/arch/amd64/include/vmparam.h: revision 1.42
Always include a 1MB guard area beyond the end of stack. While ASLR will
normally create a guard area as well, this provides a deterministic area
for all binaries.
Mitigates the rest of CVE-2017-1000374 and CVE-2017-1000375 from
Qualys.
Revert for the moment, creates problems on i386.
Recommit exec_subr.c revision 1.79:
Always include a 1MB guard area beyond the end of stack. While ASLR will
normally create a guard area as well, this provides a deterministic area
for all binaries.
Mitigates the rest of CVE-2017-1000374 and CVE-2017-1000375 from
Qualys.
Additionally, change VM_DEFAULT_ADDRESS_TOPDOWN to include
user_stack_guard_size in the size reservation.
Update VM_DEFAULT_ADDRESS32_TOPDOWN to include guard area.
Export the guard size of the main thread via vm.guard_size. Add a
complementary writable sysctl for the initial guard size of threads
created via pthread_create. Let the existing attribute accessors do the
right thing. Raise the default guard size for threads to 64KB.
 1.67.4.2 08-Apr-2020  martin Merge changes from current as of 20200406
 1.67.4.1 10-Jun-2019  christos Sync with HEAD
 1.67.2.2 18-Jan-2019  pgoyette Synch with HEAD
 1.67.2.1 06-Sep-2018  pgoyette Sync with HEAD

Resolve a couple of conflicts (result of the uimin/uimax changes)
 1.73.2.1 17-Jan-2020  ad Sync with head.
 1.186 24-Feb-2025  andvar s/architecure/architecture/ and few other typos in comments.
 1.185 21-Nov-2023  riastradh branches: 1.185.2;
pax(9): Rework header file more coherently to nix some needless #ifs.

Cleans up some of the fallout from PR kern/57711 fixes.

Could do a little more to nix PAX_SEGVGUARD conditionals but maybe
not worth it.
 1.184 07-Jul-2022  rin Convert CTASSERT(9) for PAGE_{SIZE,MASK} into KASSERT(9).

They are not compile-time constants for sparc.
 1.183 06-Jul-2022  riastradh uvm(9): fo_mmap caller guarantees positive size.

No functional change intended, just sprinkling assertions to make it
clearer.
 1.182 06-Jul-2022  riastradh mmap(2): Assert size != 0 in non-anonymous case.

This is guaranteed by a test earlier; adding the assertion just makes
it clearer that it applies to the branch where we call fo_mmap -- no
functional change intended.
 1.181 06-Jul-2022  riastradh mmap(2): Avoid overflow in rounding and checking size.
 1.180 04-Jun-2022  riastradh mmap(2): If we fail with a hint, try again without it.

`Hint' here means nonzero addr, but no MAP_FIXED or MAP_TRYFIXED.

This is suboptimal -- we could teach uvm_mmap to do a fancier search
using the address as a hint. But this should do for now.

Candidate fix for PR kern/55533.
 1.179 19-Apr-2022  riastradh Revert "mmap(2): If we fail with a hint, try again without it."

This doesn't work, because uvm_mmap releases the uobj when it fails.
Should factor this more coherently, but let's just revert for now.

Reported-by: syzbot+d347c8951821b236117a@syzkaller.appspotmail.com
Reported-by: syzbot+7643d1b769fdfa18c3b2@syzkaller.appspotmail.com
Reported-by: syzbot+44f4b39671dd580cba5c@syzkaller.appspotmail.com
Reported-by: syzbot+b5a422299ca4ffe8570c@syzkaller.appspotmail.com
Reported-by: syzbot+22681822db67b6e90cfb@syzkaller.appspotmail.com
Reported-by: syzbot+e59f493ceef72b925a17@syzkaller.appspotmail.com
Reported-by: syzbot+666f3fe8364f47e8641b@syzkaller.appspotmail.com
Reported-by: syzbot+511d4572f52f1fd9b5cc@syzkaller.appspotmail.com
 1.178 19-Apr-2022  riastradh mmap(2): If we fail with a hint, try again without it.

`Hint' here means nonzero addr, but no MAP_FIXED or MAP_TRYFIXED.

This is suboptimal -- we could teach uvm_mmap to do a fancier search
using the address as a hint. But this should do for now.

Candidate fix for PR kern/55533.

ok chs@
 1.177 27-Mar-2022  hannken Make mmap() with "len == 0" an error if not MAP_ANON. We should return
an error for MAP_ANON too but unfortunately our /libexec/ld.elf_so
sometimes creates an empty anon mapping for the bss of a shared library.

At least FreeBSD and Solaris return this error too and according to POSIX
"If len is zero, mmap() shall fail and no mapping shall be established".

Fixes PR pkg/56338 Installing qt5-qtdeclarative leaves a dangling reference

The dangling reference here originates from vn_mmap() taking a vnode
reference for this empty mapping that will never be released.
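The length rule in 1.177 can be sketched as a small predicate. This is only an illustration, not the kernel code: SK_MAP_ANON is a hypothetical stand-in for the real MAP_ANON flag, mmap_len_check() is an invented name, and EINVAL is assumed because that is the error POSIX specifies for a zero length.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>

#define SK_MAP_ANON	0x1000	/* hypothetical stand-in for MAP_ANON */

/*
 * Reject a zero-length mapping unless it is anonymous.  The anonymous
 * case stays allowed because, per the commit message, ld.elf_so can
 * create an empty anon mapping for a shared library's bss.
 */
static inline int
mmap_len_check(size_t len, int flags)
{
	if (len == 0 && (flags & SK_MAP_ANON) == 0)
		return EINVAL;	/* assumed errno, following POSIX */
	return 0;
}
```

With this shape a file-backed request fails before any vnode reference is taken, which is what avoids the dangling reference described above.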
 1.176 21-Jul-2021  skrll need <sys/param.h> for COHERENCY_UNIT

Minor KNF along the way.
 1.175 23-Feb-2020  ad branches: 1.175.10;
UVM locking changes, proposed on tech-kern:

- Change the lock on uvm_object, vm_amap and vm_anon to be a RW lock.
- Break v_interlock and vmobjlock apart. v_interlock remains a mutex.
- Do partial PV list locking in the x86 pmap. Others to follow later.
 1.174 04-Oct-2019  kamil branches: 1.174.2;
Avoid left shift changing the signedness flag

Reviewed by <mrg>

Reported-by: syzbot+25ac03024cedf27f3368@syzkaller.appspotmail.com
 1.173 06-Aug-2019  maxv Change 'npgs' from int to size_t. Otherwise the 64bit->32bit conversion
could lead to npgs=0, which is not expected. It later triggers a panic
in uvm_vsunlock().

Found by TriforceAFL (Akul Pillai).
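The truncation that 1.173 guards against can be reproduced in isolation. The sketch below assumes 4 KiB pages and uses fixed-width types so that it behaves the same on any host; the function names are hypothetical and uint32_t stands in for the old int-sized counter.

```c
#include <assert.h>
#include <stdint.h>

#define SK_PAGE_SHIFT	12	/* assumed 4 KiB pages */

/* Page count kept in a 64-bit type: nothing is lost. */
static inline uint64_t
npages_wide(uint64_t len)
{
	return len >> SK_PAGE_SHIFT;
}

/* Page count squeezed into 32 bits, as an int-sized variable would. */
static inline uint32_t
npages_narrow(uint64_t len)
{
	return (uint32_t)(len >> SK_PAGE_SHIFT);
}
```

A length of 2^44 bytes covers 2^32 pages, which does not fit in 32 bits, so the narrow variant yields 0 while the wide one keeps the real count.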
 1.172 06-Apr-2019  thorpej branches: 1.172.4;
Overhaul the API used to fetch and store individual memory cells in
userspace. The old fetch(9) and store(9) APIs (fubyte(), fuword(),
subyte(), suword(), etc.) are retired and replaced with new ufetch(9)
and ustore(9) APIs that can return proper error codes, etc. and are
implemented consistently across all platforms. The interrupt-safe
variants are no longer supported (and several of the existing attempts
at fuswintr(), etc. were buggy and not actually interrupt-safe).

Also augment the ucas(9) API, making it consistently available on
all platforms, supporting uniprocessor and multiprocessor systems, even
those that do not have CAS or LL/SC primitives.

Welcome to NetBSD 8.99.37.
 1.171 14-Mar-2019  christos unify rounding and range checking.
 1.170 14-Mar-2019  kre Avoid a panic from the sequence

mlock(buf, 0);
munlock(buf, 0);
mlock(buf, page);
munlock(buf, page);

where buf is page aligned, and page is actually anything > 0
(but not too big) which will get rounded up to the next multiple
of the page size.

In that sequence, it is possible that the 1st munlock() is optional.

Add a KASSERT() (or two) to detect the first effects of the problem
(without that, or in !DIAGNOSTIC kernels) the problem eventually
causes some kind of problem or other (most often still a panic.)

After this, mlock(anything, 0) (or munlock) validates "anything"
but is otherwise a no-op (regardless of the alignment of anything).

Also, don't treat mlock(buf, verybig) as equivalent to mlock(buf, 0)
which is (more or less) what we had been doing.

XXX pullup -8 (maybe -7 as well, need to check).
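The distinction 1.170 draws between a zero length and a very large one can be sketched as follows. This is a simplified model, not the uvm_mmap.c code: the names are hypothetical, the address validation that the commit keeps for the zero-length case is omitted, and 64-bit arithmetic is assumed throughout.

```c
#include <assert.h>
#include <stdint.h>

enum range_kind { RANGE_NOOP, RANGE_OK, RANGE_TOOBIG };

/* Naive rounding: the addition wraps, so a huge length becomes 0. */
static inline uint64_t
round_naive(uint64_t len, uint64_t pgsz)
{
	return (len + (pgsz - 1)) & ~(pgsz - 1);
}

/*
 * Classify a request before rounding it.  A zero length is a no-op,
 * a length that cannot be rounded without wrapping is rejected, and
 * anything else is widened to whole pages.
 */
static inline enum range_kind
classify_range(uint64_t addr, uint64_t len, uint64_t pgsz,
    uint64_t *startp, uint64_t *sizep)
{
	uint64_t pageoff;

	assert(pgsz != 0 && (pgsz & (pgsz - 1)) == 0);	/* power of two */

	if (len == 0)
		return RANGE_NOOP;

	pageoff = addr & (pgsz - 1);
	if (len > UINT64_MAX - pageoff - (pgsz - 1))
		return RANGE_TOOBIG;

	*startp = addr - pageoff;
	*sizep = (len + pageoff + (pgsz - 1)) & ~(pgsz - 1);
	return RANGE_OK;
}
```

Checking for overflow before rounding is what keeps mlock(buf, verybig) from collapsing into the same path as mlock(buf, 0).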
 1.169 19-Dec-2017  kamil branches: 1.169.4;
Drop SYS_sbrk

sbrk - change data segment size

This syscall has been a dummy since the inception of the project.

Sponsored by <The NetBSD Foundation>
 1.168 19-Dec-2017  kamil Drop the sstk(2) syscall stub

sstk - change stack section size

This functionality has never been implemented and is a remnant from 16-bit
UNIX. This stub appeared with the first NetBSD commit.

Sponsored by <The NetBSD Foundation>
 1.167 27-Oct-2017  utkarsh009 [syzkaller] Fix for PR #52658 as suggested by riastradh@

The bug was found by Dmitry Vyukov (dvyukov@google.com)
using syzkaller and was tested by me on a VM running
8.99.5
 1.166 20-May-2017  chs branches: 1.166.2;
MAP_FIXED means something different for mremap() than it does for mmap(),
so we cannot use UVM_FLAG_FIXED to specify both behaviors.
keep UVM_FLAG_FIXED with its earlier meaning (prior to my previous change)
of whether to use uvm_map_findspace() to locate space for the new mapping or
to use the hint address that the caller passed in, and add a new flag
UVM_FLAG_UNMAP to indicate that any existing entries in the range should be
unmapped as part of creating the new mapping. the new UVM_FLAG_UNMAP flag
may only be used if UVM_FLAG_FIXED is also specified.
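The constraint stated in 1.166 amounts to a one-line flag check. The bit values below are hypothetical placeholders rather than the real UVM_FLAG_FIXED and UVM_FLAG_UNMAP definitions, and flags_valid() is an invented helper.

```c
#include <assert.h>
#include <stdbool.h>

#define SK_FLAG_FIXED	0x1u	/* hypothetical bit for UVM_FLAG_FIXED */
#define SK_FLAG_UNMAP	0x2u	/* hypothetical bit for UVM_FLAG_UNMAP */

/* UNMAP is only meaningful together with FIXED. */
static inline bool
flags_valid(unsigned int flags)
{
	return (flags & SK_FLAG_UNMAP) == 0 || (flags & SK_FLAG_FIXED) != 0;
}
```

Under this reading, mmap() with MAP_FIXED would pass both flags, while a caller that wants the hint address used without replacing existing entries would pass FIXED alone.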
 1.165 19-May-2017  chs make MAP_FIXED mapping operations atomic. fixes PR 52239.
previously, unmapping any entries being replaced was done separately
from entering the new mapping, which allowed another thread doing
a non-MAP_FIXED mapping to allocate the range out from under the
MAP_FIXED thread.
 1.164 06-May-2017  joerg Extend the mmap(2) interface to allow requesting protections for later
use with mprotect(2), but without enabling them immediately.

Extend the mremap(2) interface to allow duplicating mappings, i.e.
create a second range of virtual addresses that references the same physical
pages. Duplicated mappings can have different effective protections.

Adjust PAX mprotect logic to disallow effective protections of W&X, but
allow one mapping W and another X protections. This obsoletes using
temporary files for purposes like JIT.

Adjust PAX logic for mmap(2) and mprotect(2) to fail if W&X is requested
and not silently drop the X protection.

Improve test cases to ensure correct operation of the changed
interfaces.
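The change in PaX behaviour described in 1.164 (fail on W&X instead of silently dropping X) can be contrasted in a few lines. Both helpers are hypothetical, and EACCES is an assumed error code; the commit message does not say which errno is returned.

```c
#include <assert.h>
#include <errno.h>
#include <sys/mman.h>

/* Old behaviour as described: quietly strip execute permission. */
static inline int
wx_strip(int prot)
{
	if ((prot & PROT_WRITE) != 0)
		prot &= ~PROT_EXEC;
	return prot;
}

/* New behaviour as described: refuse a request for both W and X. */
static inline int
wx_check(int prot)
{
	const int wx = PROT_WRITE | PROT_EXEC;

	if ((prot & wx) == wx)
		return EACCES;	/* assumed errno */
	return 0;
}
```

Failing loudly lets a JIT notice the refusal and fall back to two mappings of the same pages, one writable and one executable, as the commit describes.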
 1.163 29-Apr-2017  christos MAP_COPY is handled in compat
 1.162 09-Aug-2016  kre branches: 1.162.6;

The only error that can occur from munlock() on NetBSD is ENOMEM.
Make it be that way.
 1.161 07-Aug-2016  maxv KNF a little.
 1.160 07-Aug-2016  maxv Explicitly return syscall-specific error codes, instead of the ones given
by range_test. This fixes msync, mlock and munlock, which all return EINVAL
instead of ENOMEM if the address is not in the va space.

It should also fix the recent ATF failures.
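One way to picture the split between a shared range test and syscall-specific error codes is sketched below. The function names and parameters are hypothetical and do not reproduce the real range_test() signature; the ENOMEM choice for mlock follows the commit message.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stdint.h>

/* Shared check: does [addr, addr + size) sit inside [vmin, vmax]? */
static inline bool
range_ok(uint64_t vmin, uint64_t vmax, uint64_t addr, uint64_t size)
{
	uint64_t end = addr + size;

	if (end < addr)		/* wrapped around the address space */
		return false;
	return addr >= vmin && end <= vmax;
}

/* Each syscall picks its own errno instead of inheriting one. */
static inline int
mlock_range_error(uint64_t vmin, uint64_t vmax, uint64_t addr,
    uint64_t size)
{
	return range_ok(vmin, vmax, addr, size) ? 0 : ENOMEM;
}
```

Keeping the predicate boolean leaves the choice of error code with the caller, so mmap can keep reporting its own error while msync, mlock and munlock report ENOMEM.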
 1.159 01-Jun-2016  pgoyette Variable rv is always used as a true/false boolean, so set its type
correctly.

From PR kern/46369
 1.158 24-May-2016  martin PR kern/50985: use the runtime limits of the vmspace in range_test()
instead of the compile time defaults for it.
 1.157 22-May-2016  christos reduce #ifdef mess caused by PaX
 1.156 07-Apr-2016  christos remove more ifdefs
 1.155 07-Apr-2016  christos Add PAX_MPROTECT_DEBUG
 1.154 26-Nov-2015  martin We never exec(2) with a kernel vmspace, so do not test for that, but instead
KASSERT() that we don't.
When calculating the load address for the interpreter (e.g. ld.elf_so),
we need to take into account whether the exec'd process will run with
topdown memory or bottom up. We can not use the current vmspace's flags
to test for that, as this happens too early. Luckily the execpack already
knows what the new state will be later, so instead of testing the current
vmspace, pass the info as additional argument to struct emul
e_vm_default_addr.
Fix all such functions and adapt all callers.
 1.153 04-Aug-2015  maxv Some changes, to reduce a bit my tech-kern@ patch:
- move the P_PAX_ flags out of #ifdef PAX_ASLR in pax.h
- add a generic pax_flags_active() function
- fix a comment in exec_elf.c; interp is not static
- KNF for return
- rename pax_aslr() to pax_aslr_mmap()
- rename pax_segvguard_cb() to pax_segvguard_cleanup_cb()
 1.152 01-Mar-2015  mlelstv Detect overflow when rounding length parameter and return ENOMEM.
Fixes PR kern/49692.
 1.151 10-Jan-2015  chs in uvm_mmap_dev(), use the passed-in offset instead of 0.
from Onno van der Linden in PR 49536.
 1.150 14-Dec-2014  chs add a new "fo_mmap" fileops method to allow use of arbitrary uvm_objects for
mappings of file objects. move vnode-specific details of mmap()ing a vnode
from uvm_mmap() to the new vnode-specific vn_mmap(). add new uvm_mmap_dev()
and uvm_mmap_anon() convenience functions for mapping character devices
and anonymous memory, and replace all other calls to uvm_mmap() with those.
use the new fileop in drm2 so that libdrm can use mmap() to map things
like on other platforms (instead of the ioctl that we have used so far).
 1.149 05-Sep-2014  matt branches: 1.149.2;
Use f_vnode instead of f_data
 1.148 25-Jan-2014  christos branches: 1.148.4;
make this compile.
 1.147 25-Jan-2014  christos deal with COMPAT_10 issue.
 1.146 25-Jan-2014  christos provide proper defaults for topdown and bottomup allocation.
XXX: Ports that provide their own VM_DEFAULT_ADDRESS() need to provide the
two new flavors, otherwise they get the default ones now.
 1.145 11-Sep-2013  martin Allow MD code to add additional checks for mmap(..., MAP_FIXED) address
ranges. This can be used, for example, to avoid not implemented VA-holes,
but we probably need to check in a few more places.
 1.144 27-Jan-2012  para branches: 1.144.6; 1.144.10;
extending vmem(9) to be able to allocate resources for its own needs.
simplifying uvm_map handling (no special kernel entries anymore no relocking)
make malloc(9) a thin wrapper around kmem(9)
(with private interface for interrupt safety reasons)

releng@ acknowledged
 1.143 05-Jan-2012  reinoud Revert MAP_NOSYSCALLS patch.
 1.142 22-Dec-2011  reinoud Redo uvm_map_setattr() to never fail and remove the possible panic. The
possibility of failure was a C&P error.
 1.141 20-Dec-2011  reinoud If we need to set the PK_CHKNOSYSCALL flag in struct proc be so nice to first
take the mutex. Tnx for pointing it out to me.
 1.140 20-Dec-2011  reinoud Add a MAP_NOSYSCALLS flag to mmap. This flag prohibits executing of system
calls from the mapped region. This can be used for emulation purposes or for
extra security in the case of generated code.

It's implemented by adding mapping-attributes to each uvm_map_entry. These can
then be queried when needed.

Currently the MAP_NOSYSCALLS is only implemented for x86 but other
architectures are easy to adapt; see the sys/arch/x86/x86/syscall.c patch.
Port maintainers are encouraged to add them for their processor ports too.
When this feature is not yet implemented for an architecture the
MAP_NOSYSCALLS is simply ignored with virtually no cpu cost.
 1.139 14-Oct-2011  hannken branches: 1.139.2; 1.139.6;
Change the vnode locking protocol of VOP_GETATTR() to request at least
a shared lock. Make all calls outside of file systems respect it.

The calls from file systems need review.

No objections from tech-kern.
 1.138 12-Oct-2011  yamt fix an integer promotion bug on 64 bit ports.
(signed + unsigned = unsigned)
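The conversion rule behind 1.138 can be shown with two small helpers. This is a generic illustration of C's usual arithmetic conversions, not the expression that was actually fixed; the function names are hypothetical, and uint32_t and uint64_t stand in for an unsigned size type on 32-bit and 64-bit ports respectively.

```c
#include <assert.h>
#include <stdint.h>

/*
 * With a 32-bit unsigned size, the size is converted to int64_t and
 * the sum stays signed, so it can go negative.
 */
static inline int64_t
sum_size32(int64_t off, uint32_t size)
{
	return off + size;
}

/*
 * With a 64-bit unsigned size, off is converted to uint64_t instead,
 * so a "negative" sum shows up as a huge positive value.
 */
static inline uint64_t
sum_size64(int64_t off, uint64_t size)
{
	return off + size;
}
```

A test such as "sum < 0" therefore works where the size type is narrower than the offset type and quietly stops working where both are 64 bits wide.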
 1.137 23-Jun-2011  matt Allow PAX_ASLR to be used by itself.
 1.136 12-Jun-2011  rmind Welcome to 5.99.53! Merge rmind-uvmplock branch:

- Reorganize locking in UVM and provide extra serialisation for pmap(9).
New lock order: [vmpage-owner-lock] -> pmap-lock.

- Simplify locking in some pmap(9) modules by removing P->V locking.

- Use lock object on vmobjlock (and thus vnode_t::v_interlock) to share
the locks amongst UVM objects where necessary (tmpfs, layerfs, unionfs).

- Rewrite and optimise x86 TLB shootdown code, make it simpler and cleaner.
Add TLBSTATS option for x86 to collect statistics about TLB shootdowns.

- Unify /dev/mem et al in MI code and provide required locking (removes
kernel-lock on some ports). Also, avoid cache-aliasing issues.

Thanks to Andrew Doran and Joerg Sonnenberger, as their initial patches
formed the core changes of this branch.
 1.135 23-Apr-2011  rmind branches: 1.135.2;
Replace "malloc" in comments, remove unnecessary header inclusions.
 1.134 02-Feb-2011  chuck update license clauses on my code to match the new-style BSD licenses.
verified with Mike Hibler it is ok to remove clause 3 on utah copyright,
as per UCB.
based on diff that rmind@ sent me.

no functional change with this commit.
 1.133 24-Jun-2010  hannken branches: 1.133.2; 1.133.4;
Clean up vnode lock operations pass 2:

VOP_UNLOCK(vp, flags) -> VOP_UNLOCK(vp): Remove the unneeded flags argument.

Welcome to 5.99.32.

Discussed on tech-kern.
 1.132 01-Nov-2009  uebayasi branches: 1.132.2; 1.132.4;
Consistently call amap / uobj layers as upper / lower, because UVM has only
those two layers by design. Approved by Chuck Cranor some time ago.
 1.131 18-Aug-2009  yamt uvm_mmap: remove a dead conditional.
 1.130 10-Jun-2009  yamt on MADV_WILLNEED, start prefetching backing object's pages.
 1.129 30-May-2009  yamt wrap long lines.
 1.128 29-Mar-2009  mrg - add new RLIMIT_AS (aka RLIMIT_VMEM) resource that limits the total
address space available to processes. this limit exists in most other
modern unix variants, and like most of them, our defaults are unlimited.
remove the old mmap / rlimit.datasize hack.

- adds the VMCMD_STACK flag to all the stack-creation vmcmd callers.
it is currently unused, but was added a few years ago.

- add a pair of new process size values to kinfo_proc2{}. one is the
total size of the process memory map, and the other is the total size
adjusted for unused stack space (since most processes have a lot of
this...)

- patch sh, and csh to notice RLIMIT_AS. (in some cases, the alias
RLIMIT_VMEM was already present and used if available.)

- patch ps, top and systat to notice the new k_vm_vsize member of
kinfo_proc2{}.

- update irix, svr4, svr4_32, linux and osf1 emulations to support
this information. (freebsd could be done, but that it's best left
as part of the full-update of compat/freebsd.)


this addresses PR 7897. it also gives correct memory usage values,
which have never been entirely correct (since mmap), and have been
very incorrect since jemalloc() was enabled.

tested on i386 and sparc64, build tested on several other platforms.

thanks to many folks for feedback and testing but most especially
chuq and yamt for critical suggestions that led to this patch not
having a special ugliness i wasn't happy with anyway :-)
 1.127 14-Mar-2009  dsl ANSIfy another 1261 function definitions.
The only ones left in sys are beyond my sed script!
(or in sys/dist or sys/external)
Mostly they have function pointer parameters.
 1.126 03-Jun-2008  ad branches: 1.126.6; 1.126.8; 1.126.12;
uvm_mmap: don't lock the map unless we need to.
 1.125 02-Jun-2008  ad One more.
 1.124 02-Jun-2008  ad Don't needlessly acquire v_interlock.
 1.123 02-Jun-2008  ad Don't needlessly acquire v_interlock.
 1.122 21-Mar-2008  ad branches: 1.122.2; 1.122.4; 1.122.6;
Catch up with descriptor handling changes. See kern_descrip.c revision
1.173 for details.
 1.121 02-Jan-2008  ad branches: 1.121.6;
Merge vmlocking2 to head.
 1.120 26-Dec-2007  christos Add PaX ASLR (Address Space Layout Randomization) [from elad and myself]

For regular (non PIE) executables randomization is enabled for:
1. The data segment
2. The stack

For PIE executables(*) randomization is enabled for:
1. The program itself
2. All shared libraries
3. The data segment
4. The stack

(*) To generate a PIE executable:
- compile everything with -fPIC
- link with -shared-libgcc -Wl,-pie

This feature is experimental, and might change. To use selectively add
options PAX_ASLR=0
in your kernel.

Currently we are using 12 bits for the stack, program, and data segment and
16 or 24 bits for mmap, depending on __LP64__.
 1.119 20-Dec-2007  dsl Convert all the system call entry points from:
int foo(struct lwp *l, void *v, register_t *retval)
to:
int foo(struct lwp *l, const struct foo_args *uap, register_t *retval)
Fixup compat code to not write into 'uap' and (in some cases) to actually
pass a correctly formatted 'uap' structure with the right name to the
next routine.
A few 'compat' routines that just call standard ones have been deleted.
All the 'compat' code compiles (along with the kernels required to test
build it).
98% done by automated scripts.
 1.118 26-Nov-2007  pooka branches: 1.118.2; 1.118.6;
Remove the "struct lwp *" argument from all VFS and VOP interfaces.
The general trend is to remove it from all kernel interfaces and
this is a start. In case the calling lwp is desired, curlwp should
be used.

quick consensus on tech-kern
 1.117 10-Oct-2007  ad branches: 1.117.4;
Merge from vmlocking:

- Split vnode::v_flag into three fields, depending on field locking.
- simple_lock -> kmutex in a few places.
- Fix some simple locking problems.
 1.116 08-Oct-2007  ad Merge file descriptor locking, cwdi locking and cross-call changes
from the vmlocking branch.
 1.115 23-Sep-2007  yamt branches: 1.115.2;
make RANGE_TEST a function.
 1.114 27-Jul-2007  pooka branches: 1.114.4; 1.114.6; 1.114.8;
Change unused fflags parameter in VOP_MMAP to prot and pass in
desired vm protection.
 1.113 22-Jul-2007  pooka Retire uvn_attach() - it abuses VXLOCK and its functionality,
setting vnode sizes, is handled elsewhere: file system vnode creation
or spec_open() for regular files or block special files, respectively.

Add a call to VOP_MMAP() to the pagedvn exec path, since the vnode
is being memory mapped.

reviewed by tech-kern & wrstuden
 1.112 15-May-2007  elad branches: 1.112.2;
Some Veriexec stuff that's been rotting in my tree for months.

Bug fixes:
- Fix crash reported by Scott Ellis on current-users@.

- Fix race conditions in enforcing the Veriexec rename and remove
policies. These are NOT security issues.

- Fix memory leak in rename handling when overwriting a monitored
file.

- Fix table deletion logic.

- Don't prevent query requests if not in learning mode.


KPI updates:
- fileassoc_table_run() now takes a cookie to pass to the callback.

- veriexec_table_add() was removed, it is now done internally. As a
result, there's no longer a need for VERIEXEC_TABLESIZE.

- veriexec_report() was removed, it is now internal.

- Perform sanity checks on the entry type, and enforce default type
in veriexec_file_add() rather than in veriexecctl.

- Add veriexec_flush(), used to delete all Veriexec tables, and
veriexec_dump(), used to fill an array with all Veriexec entries.


New features:
- Add a '-k' flag to veriexecctl, to keep the filenames in the kernel
database. This allows Veriexec to produce slightly more accurate
logs under certain circumstances. In the future, this can be either
replaced by vnode->pathname translation, or combined with it.

- Add a VERIEXEC_DUMP ioctl, to dump the entire Veriexec database.
This can be used to recover a database if the file was lost.
Example usage:

# veriexecctl dump > /etc/signatures

Note that only entries with the filename kept (that is, were loaded
with the '-k' flag) will be dumped.

Idea from Brett Lymn.

- Add a VERIEXEC_FLUSH ioctl, to delete all Veriexec entries. Sample
usage:

# veriexecctl flush

- Add a 'veriexec_flags' rc(8) variable, and make its default have
the '-k' flag. On systems using the default signatures file
(generated from running 'veriexecgen' with no arguments), this will
use additional 32kb of kernel memory on average.

- Add a '-e' flag to veriexecctl, to evaluate the fingerprint during
load. This is done automatically for files marked as 'untrusted'.


Misc. stuff:
- The code for veriexecctl was massively simplified as a result of
eliminating the need for VERIEXEC_TABLESIZE, and now uses a single
pass of the signatures file, making the loading somewhat faster.

- Lots of minor fixes found using the (still under development)
Veriexec regression testsuite.

- Some of the messages Veriexec prints were improved.

- Various documentation fixes.


All relevant man-pages were updated to reflect the above changes.

Binary compatibility with existing veriexecctl binaries is maintained.
 1.111 11-May-2007  christos Make us standards compliant again. Return EINVAL in all cases (except for
mmap) so we cannot tell what went wrong.
 1.110 11-May-2007  christos Improve on previous and write a RANGE_TEST macro and do it on all the
system calls instead of doing a half-assed job on some of them and none
on others.
 1.109 11-May-2007  christos fix bogus wrap tests; ssize_t != int...
 1.108 04-Mar-2007  christos branches: 1.108.2; 1.108.4;
Kill caddr_t; there will be some MI fallout, but it will be fixed shortly.
 1.107 22-Feb-2007  thorpej TRUE -> true, FALSE -> false
 1.106 21-Feb-2007  thorpej Replace the Mach-derived boolean_t type with the C99 bool type. A
future commit will replace use of TRUE and FALSE with true and false.
 1.105 09-Feb-2007  ad branches: 1.105.2;
Merge newlock2 to head.
 1.104 03-Feb-2007  elad If Veriexec prevents indirect execution of the binary, in addition to just
blocking the mmap() if exec bit is requested, also strip exec bit from
maxprot for further mprotect() calls.

Okay joerg@.
 1.103 11-Jan-2007  elad Cosmetic nit in the 'filename' passed to veriexec_verify().
 1.102 01-Nov-2006  yamt branches: 1.102.2;
remove some __unused from function parameters.
 1.101 12-Oct-2006  christos - sprinkle __unused on function decls.
- fix a couple of unused bugs
- no more -Wno-unused for i386
 1.100 05-Oct-2006  chs add support for O_DIRECT (I/O directly to application memory,
bypassing any kernel caching for file data).
 1.99 30-Sep-2006  elad If Veriexec enforces access type, don't allow mmap() to use PROT_EXEC on
files that don't have the "indirect" flag. Also change the "library" alias
in veriexecctl(8) to mean "file, indirect".

okay blymn@
 1.98 21-Jul-2006  ad branches: 1.98.4; 1.98.6;
- Use the LWP cached credentials where sane.
- Minor cosmetic changes.
 1.97 20-May-2006  elad Better implementation of PaX MPROTECT, after looking some more into the
code and not trying to use temporary solutions.

Lots of comments and help from YAMAMOTO Takashi, also thanks to the PaX
author for being quick to recognize that something fishy's going on. :)

Hook up in mmap/vmcmd rather than (ugh!) uvm_map_protect().

Next time I suggest to commit a temporary solution just revoke my
commit bit.
 1.96 14-May-2006  elad branches: 1.96.2;
integrate kauth.
 1.95 05-Apr-2006  christos Coverity CID 2721: Avoid bitching for impossible cases, by adding KASSERT.
 1.94 11-Dec-2005  christos branches: 1.94.4; 1.94.6; 1.94.8; 1.94.10; 1.94.12;
merge ktrace-lwp.
 1.93 10-Oct-2005  chs stop converting async msync() to sync.
this hasn't been needed for years (if it ever was).
 1.92 23-Jul-2005  yamt update file timestamps for nfsd loaned-read and mmap.
PR/25279. discussed on tech-kern@.
 1.91 11-May-2005  yamt branches: 1.91.2;
allocate anons on-demand, rather than reserving static amount of
them on boot/swapon.
 1.90 01-Apr-2005  yamt merge yamt-km branch.
- don't use managed mappings/backing objects for wired memory allocations.
save some resources like pv_entry. also fix (most of) PR/27030.
- simplify kernel memory management API.
- simplify pmap bootstrap of some ports.
- some related cleanups.
 1.89 26-Mar-2005  fvdl Fix some things regarding COMPAT_NETBSD32 and limits/VM addresses.

* For sparc64 and amd64, define *SIZ32 VM constants.
* Add a new function pointer to struct emul, pointing at a function
that will return the default VM map address. The default function
is uvm_map_defaultaddr, which just uses the VM_DEFAULT_ADDRESS
macro. This gives emulations control over the default map address,
and allows things to be mapped at the right address (in 32bit range)
for COMPAT_NETBSD32.
* Add code to adjust the data and stack limits when a COMPAT_NETBSD32
or COMPAT_SVR4_32 binary is executed.
* Don't use USRSTACK in kern_resource.c, use p_vmspace->vm_minsaddr
instead (emulations might have set it differently)
* Since this changes struct emul, bump kernel version to 3.99.2

Tested on amd64, compile-tested on sparc64.
 1.88 11-Feb-2005  chs branches: 1.88.4;
use vm_map_{min,max}() instead of dereferencing the vm_map pointer directly.
define and use vm_map_set{min,max}() for modifying these values.
remove the {min,max}_offset aliases for these vm_map fields to be more
namespace-friendly. PR 26475.
 1.87 23-Jan-2005  chs branches: 1.87.2;
pmap_wired_count() is now available on all platforms,
remove the code for the case where it's not defined.
 1.86 01-Jan-2005  yamt branches: 1.86.2;
for in-kernel maps,
- allocate kva for vm_map_entry from the map itself and
remove the static limit, MAX_KMAPENT.
- keep merged entries for later splitting to fix allocate-to-free problem.
PR/24039.
 1.85 02-Dec-2004  briggs mlock(2) and munlock(2) are defined by our man pages (which agree with
those on opengroup.org) to return ENOMEM if trying to lock a region that
is not accessible. So if uvm_map_pageable() returns EFAULT, make it ENOMEM.
 1.84 25-May-2004  hannken Add ffs internal snapshots. Written by Marshall Kirk McKusick for FreeBSD.

- Not enabled by default. Needs kernel option FFS_SNAPSHOT.
- Change parameters of ffs_blkfree.
- Let the copy-on-write functions return an error so spec_strategy
may fail if the copy-on-write fails.
- Change genfs_*lock*() to use vp->v_vnlock instead of &vp->v_lock.
- Add flag B_METAONLY to VOP_BALLOC to return indirect block buffer.
- Add a function ffs_checkfreefile needed for snapshot creation.
- Add special handling of snapshot files:
Snapshots may not be opened for writing and the attributes are read-only.
Use the mtime as the time this snapshot was taken.
Deny mtime updates for snapshot files.
- Add function transferlockers to transfer any waiting processes from
one lock to another.
- Add vfsop VFS_SNAPSHOT to take a snapshot and make it accessible through
a vnode.
- Add snapshot support to ls, fsck_ffs and dump.

Welcome to 2.0F.

Approved by: Jason R. Thorpe <thorpej@netbsd.org>
 1.83 19-May-2004  darrenr rather than just try to get a mapping from a device as only PROT_EXEC, work
down the list of protections until either we run out or we find one that the
device is willing to work with.
 1.82 24-Mar-2004  junyoung Drop trailing spaces.
 1.81 14-Feb-2004  dsl Fix prev. so it compiles
 1.80 14-Feb-2004  jdolecek add compat hook in check for zerodev; use this hook to recognize
the old ARM /dev/zero minor mapping #ifdef COMPAT_16
fixes second part of PR kern/23581 by Richard Earnshaw
 1.79 29-Nov-2003  yamt mincore: don't treat an aobj as a device mapping.
 1.78 07-Oct-2003  thorpej Add a MAP_WIRED flag to mmap(2), which causes the new mapping to be
wired as if by mlock(2).
 1.77 24-Aug-2003  chs fix some indentation.
 1.76 24-Aug-2003  chs mprotect()'s "len" is really a size_t, and we can't do any useful
bounds-checking on it.
 1.75 06-Jul-2003  christos PR/22062: Dheeraj S: Don't compare an integral type with NULL.
 1.74 29-Jun-2003  fvdl branches: 1.74.2;
Back out the lwp/ktrace changes. They contained a lot of collateral damage,
and need to be examined and discussed more.
 1.73 28-Jun-2003  darrenr Pass lwp pointers throughout the kernel, as required, so that the lwpid can
be inserted into ktrace records. The general change has been to replace
"struct proc *" with "struct lwp *" in various function prototypes, pass
the lwp through and use l_proc to get the process pointer when needed.

Bump the kernel rev up to 1.6V
 1.72 23-Jun-2003  christos PR/21948: Todd Vierling: Implement MAP_TRYFIXED for linux emulation.
 1.71 04-May-2003  gmcgarry Don't use overloaded term "comm". From Greg A. Woods in PR#17394.
 1.70 06-Mar-2003  matt Add support for mmap(2) to be able to return memory aligned on a 2^n
boundary.
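Aligning an address to a 2^n boundary, as 1.70 makes possible for mmap(2), comes down to a mask operation. The helper below is a hypothetical userland sketch of that arithmetic, not the kernel's placement logic, and it assumes the addition does not overflow.

```c
#include <assert.h>
#include <stdint.h>

/* Round addr up to the next multiple of 2^n (n < 64). */
static inline uint64_t
align_up_pow2(uint64_t addr, unsigned int n)
{
	uint64_t mask;

	assert(n < 64);
	mask = (UINT64_C(1) << n) - 1;
	return (addr + mask) & ~mask;	/* caller must rule out overflow */
}
```

For n = 16 this moves 0x12345 up to 0x20000 and leaves an address that is already 64 KiB aligned unchanged.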
 1.69 23-Feb-2003  pk Make updating a file's reference and use count MP-safe.
 1.68 20-Feb-2003  atatat Introduce "top down" memory management for mmap()ed allocations. This
means that the dynamic linker gets mapped in at the top of available
user virtual memory (typically just below the stack), shared libraries
get mapped downwards from that point, and calls to mmap() that don't
specify a preferred address will get mapped in below those.

This means that the heap and the mmap()ed allocations will grow
towards each other, allowing one or the other to grow larger than
before. Previously, the heap was limited to MAXDSIZ by the placement
of the dynamic linker (and the process's rlimits) and the space
available to mmap was hobbled by this reservation.

This is currently only enabled via an *option* for the i386 platform
(though other platforms are expected to follow). Add "options
USE_TOPDOWN_VM" to your kernel config file, rerun config, and rebuild
your kernel to take advantage of this.

Note that the pmap_prefer() interface has not yet been modified to
play nicely with this, so those platforms require a bit more work
(most notably the sparc) before they can use this new memory
arrangement.

This change also introduces a VM_DEFAULT_ADDRESS() macro that picks
the appropriate default address based on the size of the allocation or
the size of the process's text segment accordingly. Several drivers
and the SYSV SHM address assignment were changed to use this instead
of each one picking their own "default".
 1.67 18-Jan-2003  thorpej Merge the nathanw_sa branch.
 1.66 27-Sep-2002  mycroft #if 0 the call to uvm_map_checkprot() in sys_munmap() -- it's not documented,
and programs do not expect it. Also fixes memory leaks in dlopen()/dlclose().
 1.65 06-Sep-2002  gehenna Merge the gehenna-devsw branch into the trunk.

This merge changes the device switch tables from static array to
dynamically generated by config(8).

- All device switches are defined as constant structures in device drivers.

- The new grammar ``device-major'' is introduced to ``files''.

device-major <prefix> char <num> [block <num>] [<rules>]

- All device major numbers must be listed in the port-dependent majors.<arch>
by using this grammar.

- Added the new naming convention.
The name of the device switch must be <prefix>_[bc]devsw for auto-generation
of device switch tables.

- The backward compatibility of loading block/character device
switch by LKM framework is broken. This is necessary to convert
from block/character device major to device name in runtime and vice versa.

- The restriction to assign device major by LKM is completely removed.
We don't need to reserve LKM entries for dynamic loading of device switch.

- In compile time, device major numbers list is packed into the kernel and
the LKM framework will refer it to assign device major number dynamically.
 1.64 31-May-2002  atatat "offest" -> "offset" in a comment
 1.63 22-Mar-2002  darrenr branches: 1.63.2; 1.63.4;
Return EFBIG from mmap() if we try to map too much data and in the fixed
address allocation, return EOVERFLOW to match with the non-fixed error.
 1.62 14-Dec-2001  chs in sys_mincore(), check the return value of uvm_vslock() to determine
if the vec pointer is valid rather than using uvm_useracc().
uvm_useracc() just tells you if the permissions of a user mapping allow
the desired access, not whether faulting on that mapping will succeed.
 1.61 25-Nov-2001  chs disallow mapping negative offsets for both regular files and block devices.
 1.60 10-Nov-2001  lukem add RCSIDs, and in some cases, slightly cleanup #include order
 1.59 30-Oct-2001  thorpej uvm_map_protect(): Don't allow VM_PROT_EXECUTE to be set on entries
(either the current protection or the max protection) that reference
vnodes associated with a file system mounted with the NOEXEC option.

uvm_mmap(): Don't allow PROT_EXEC mappings to be established of vnodes
which are associated with a file system mounted with the NOEXEC option.
 1.58 30-Oct-2001  thorpej - Add a new vnode flag VEXECMAP, which indicates that a vnode has
executable mappings. Stop overloading VTEXT for this purpose (VTEXT
also has another meaning).
- Rename vn_marktext() to vn_markexec(), and use it when executable
mappings of a vnode are established.
- In places where we want to set VTEXT, set it in v_flag directly, rather
than making a function call to do this (it no longer makes sense to
use a function call, since we no longer overload VTEXT with VEXECMAP's
meaning).

VEXECMAP suggested by Chuq Silvers.
 1.57 29-Oct-2001  thorpej uvm_mmap(): If a vnode mapping is established with PROT_EXEC, mark the
vnode as VTEXT.

uvm_map_protect(): When VM_PROT_EXECUTE is added to a VA range, mark
all the vnodes mapped by the range as VTEXT.
 1.56 15-Sep-2001  chs branches: 1.56.2;
a whole bunch of changes to improve performance and robustness under load:

- remove special treatment of pager_map mappings in pmaps. this is
required now, since I've removed the globals that expose the address range.
pager_map now uses pmap_kenter_pa() instead of pmap_enter(), so there's
no longer any need to special-case it.
- eliminate struct uvm_vnode by moving its fields into struct vnode.
- rewrite the pageout path. the pager is now responsible for handling the
high-level requests instead of only getting control after a bunch of work
has already been done on its behalf. this will allow us to UBCify LFS,
which needs tighter control over its pages than other filesystems do.
writing a page to disk no longer requires making it read-only, which
allows us to write wired pages without causing all kinds of havoc.
- use a new PG_PAGEOUT flag to indicate that a page should be freed
on behalf of the pagedaemon when it's unlocked. this flag is very similar
to PG_RELEASED, but unlike PG_RELEASED, PG_PAGEOUT can be cleared if the
pageout fails due to eg. an indirect-block buffer being locked.
this allows us to remove the "version" field from struct vm_page,
and together with shrinking "loan_count" from 32 bits to 16,
struct vm_page is now 4 bytes smaller.
- no longer use PG_RELEASED for swap-backed pages. if the page is busy
because it's being paged out, we can't release the swap slot to be
reallocated until that write is complete, but unlike with vnodes we
don't keep a count of in-progress writes so there's no good way to
know when the write is done. instead, when we need to free a busy
swap-backed page, just sleep until we can get it busy ourselves.
- implement a fast-path for extending writes which allows us to avoid
zeroing new pages. this substantially reduces cpu usage.
- encapsulate the data used by the genfs code in a struct genfs_node,
which must be the first element of the filesystem-specific vnode data
for filesystems which use genfs_{get,put}pages().
- eliminate many of the UVM pagerops, since they aren't needed anymore
now that the pager "put" operation is a higher-level operation.
- enhance the genfs code to allow NFS to use the genfs_{get,put}pages
instead of a modified copy.
- clean up struct vnode by removing all the fields that used to be used by
the vfs_cluster.c code (which we don't use anymore with UBC).
- remove kmem_object and mb_object since they were useless.
instead of allocating pages to these objects, we now just allocate
pages with no object. such pages are mapped in the kernel until they
are freed, so we can use the mapping to find the page to free it.
this allows us to remove splvm() protection in several places.

The sum of all these changes improves write throughput on my
decstation 5000/200 to within 1% of the rate of NetBSD 1.5
and reduces the elapsed time for "make release" of a NetBSD 1.5
source tree on my 128MB pc to 10% less than a 1.5 kernel took.
 1.55 17-Aug-2001  chs branches: 1.55.2;
call VOP_MMAP() before allowing mappings of vnodes to allow
filesystems which do not support memory mapped access to cause
mmap() of their vnodes to fail.
 1.54 14-Jun-2001  thorpej branches: 1.54.2;
Fix a partial construction problem that can cause race conditions
between creation of a file descriptor and close(2) when using kernel
assisted threads. What we do is stick descriptors in the table, but
mark them as "larval". This causes essentially everything to treat
it as a non-existent descriptor, except for fdalloc(), which sees a
filled slot so that it won't (incorrectly) allocate it again. When
a descriptor is fully constructed, the code that has constructed it
marks it as "mature" (which actually clears the "larval" flag), and
things continue to work as normal.

While here, gather all the code that gets a descriptor from the table
into a fd_getfile() function, and call it, rather than having the
same (sometimes incorrect) code copied all over the place.
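
The "larval" idea described in 1.54 can be sketched in a few lines of C. This is an illustrative model only: the table, the flag name SLOT_LARVAL, and the functions slot_alloc(), slot_get() and slot_mature() are hypothetical and are not the kernel's fdalloc()/fd_getfile() code.

```c
#include <assert.h>
#include <stddef.h>

#define NSLOTS		8
#define SLOT_LARVAL	0x1	/* hypothetical flag: reserved but not yet usable */

struct slot {
	int used;	/* nonzero once the slot has been handed out */
	int flags;
};

static struct slot table[NSLOTS];

/* Reserve the lowest free slot and mark it larval; returns -1 if full. */
int
slot_alloc(void)
{
	for (int i = 0; i < NSLOTS; i++) {
		if (!table[i].used) {
			table[i].used = 1;
			table[i].flags = SLOT_LARVAL;
			return i;
		}
	}
	return -1;
}

/* Lookup: a larval slot is treated as if it did not exist. */
struct slot *
slot_get(int fd)
{
	if (fd < 0 || fd >= NSLOTS || !table[fd].used ||
	    (table[fd].flags & SLOT_LARVAL))
		return NULL;
	return &table[fd];
}

/* Construction finished: clearing the larval flag makes the slot "mature". */
void
slot_mature(int fd)
{
	assert(fd >= 0 && fd < NSLOTS && table[fd].used);
	table[fd].flags &= ~SLOT_LARVAL;
}
```

In this model the allocator sees the filled slot and never hands it out twice, while every lookup fails until slot_mature() has been called.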
 1.53 02-Jun-2001  chs replace vm_map{,_entry}_t with struct vm_map{,_entry} *.
 1.52 26-May-2001  chs replace vm_page_t with struct vm_page *.
 1.51 25-May-2001  chs remove trailing whitespace.
 1.50 15-Mar-2001  chs eliminate the KERN_* error codes in favor of the traditional E* codes.
the mapping is:

KERN_SUCCESS 0
KERN_INVALID_ADDRESS EFAULT
KERN_PROTECTION_FAILURE EACCES
KERN_NO_SPACE ENOMEM
KERN_INVALID_ARGUMENT EINVAL
KERN_FAILURE various, mostly turn into KASSERTs
KERN_RESOURCE_SHORTAGE ENOMEM
KERN_NOT_RECEIVER <unused>
KERN_NO_ACCESS <unused>
KERN_PAGES_LOCKED <unused>
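
The table in 1.50 can be read as a translation function. The sketch below is illustrative only: the enum values and the name kern_to_errno() are hypothetical, and the actual change removed the KERN_* codes from the source rather than translating them at run time.

```c
#include <assert.h>
#include <errno.h>

/* Hypothetical stand-ins for the old codes; the numeric values are arbitrary. */
enum kern_code {
	KERN_SUCCESS,
	KERN_INVALID_ADDRESS,
	KERN_PROTECTION_FAILURE,
	KERN_NO_SPACE,
	KERN_INVALID_ARGUMENT,
	KERN_RESOURCE_SHORTAGE
};

/* Return the errno value the table above lists for each old code. */
int
kern_to_errno(enum kern_code code)
{
	switch (code) {
	case KERN_SUCCESS:		return 0;
	case KERN_INVALID_ADDRESS:	return EFAULT;
	case KERN_PROTECTION_FAILURE:	return EACCES;
	case KERN_NO_SPACE:		return ENOMEM;
	case KERN_INVALID_ARGUMENT:	return EINVAL;
	case KERN_RESOURCE_SHORTAGE:	return ENOMEM;
	default:
		/* Codes without a mapping are treated as programming errors here. */
		assert(!"unexpected code");
		return EINVAL;
	}
}
```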
 1.49 18-Feb-2001  chs branches: 1.49.2;
clean up DIAGNOSTIC checks, use KASSERT().
 1.48 08-Jan-2001  thorpej Nevermind that it's silly to include PROT_EXEC even if a vnode
doesn't have the exec bit set, we need to have PROT_EXEC set
in order for some expected mmap/mprotect behavior to work, so
do the last bit slightly differently: if udv_attach() fails, and
the protection (NOT maxprot) doesn't include PROT_EXEC, then clear
PROT_EXEC from maxprot and try udv_attach() again.

Sigh, mmap really needs to be rototilled.
 1.47 07-Jan-2001  thorpej Only include PROT_EXEC in maxprot if the user specified PROT_EXEC
in the mmap() call. maxprot is used to create device mappings,
and always including PROT_EXEC causes the mapping to fail on the Alpha
when mapping a non-RAM offset of /dev/mem (which may be sparse, so
instruction fetch from there is disallowed).
 1.46 27-Nov-2000  chs Initial integration of the Unified Buffer Cache project.
 1.45 24-Nov-2000  soren Typo in comment.
 1.44 13-Sep-2000  thorpej Add an align argument to uvm_map() and some callers of that
routine. Works similarly to pmap_prefer(), but allows callers
to specify a minimum power-of-two alignment of the region.
How we ever got along without this for so long is beyond me.
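
A power-of-two alignment constraint of the kind 1.44 describes is usually applied by rounding an address up with a mask. The helper below is a generic sketch of that arithmetic; the name align_up() is hypothetical and this is not the code inside uvm_map().

```c
#include <assert.h>
#include <stdint.h>

/* Round addr up to the next multiple of align; align must be a power of two. */
uintptr_t
align_up(uintptr_t addr, uintptr_t align)
{
	/* A power of two has exactly one bit set. */
	assert(align != 0 && (align & (align - 1)) == 0);
	return (addr + align - 1) & ~(align - 1);
}
```

An address that is already aligned is returned unchanged, and an alignment of 1 imposes no constraint.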
 1.43 27-Jun-2000  mrg remove include of <vm/vm.h>
 1.42 26-Jun-2000  mrg remove/move more mach vm header files:

<vm/pglist.h> -> <uvm/uvm_pglist.h>
<vm/vm_inherit.h> -> <uvm/uvm_inherit.h>
<vm/vm_kern.h> -> into <uvm/uvm_extern.h>
<vm/vm_object.h> -> nothing
<vm/vm_pager.h> -> into <uvm/uvm_pager.h>

also includes a bunch of <vm/vm_page.h> include removals (due to redundancy
with <vm/vm.h>), and a scattering of other similar headers.
 1.41 23-May-2000  enami branches: 1.41.4;
- Move the comment, which describes that calling the function
uvm_map_pageable(map, ...) implies unlocking passed map, just before the
function call.
- If we bail out before calling the uvm_map_pageable, unlock the map
by ourselves to prevent a panic ``locking against myself''. The panic is,
for example, caused when cdrecord is invoked with too large fifo size.
 1.40 30-Mar-2000  augustss Remove more register declarations.
 1.39 28-Mar-2000  kleink In mmap(), bail out with EOVERFLOW when mapping a regular file and the file
offset plus mapping length cannot be represented in an off_t.
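
The condition in 1.39 amounts to an overflow test on "offset plus length". A minimal sketch, assuming int64_t stands in for off_t; the helper name offset_range_overflows() is hypothetical and this is not the check as written in sys_mmap().

```c
#include <assert.h>
#include <stdint.h>

/*
 * Return nonzero if pos + len cannot be represented in int64_t.
 * The comparison is arranged so that the sum itself is never computed.
 */
int
offset_range_overflows(int64_t pos, uint64_t len)
{
	assert(pos >= 0);	/* negative offsets are assumed to be rejected earlier */
	return len > (uint64_t)(INT64_MAX - pos);
}
```

A caller in this model would fail the request with EOVERFLOW whenever the function returns nonzero.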
 1.38 26-Mar-2000  kleink Merge parts of chs-ubc2 into the trunk:
Add a new type voff_t (defined as a synonym for off_t) to describe offsets
into uvm objects, and update the appropriate interfaces to use it, the
most visible effect being the ability to mmap() file offsets beyond
the range of a vaddr_t.

Originally by Chuck Silvers; blame me for problems caused by merging this
into non-UBC.
 1.37 11-Dec-1999  thorpej Remove a piece of code introduced in rev 1.36 that I didn't intend to
commit.
 1.36 13-Nov-1999  thorpej Change the pmap_enter() API slightly; pmap_enter() now returns an error
value (KERN_SUCCESS or KERN_RESOURCE_SHORTAGE) indicating if it succeeded
or failed. Change the `wired' and `access_type' arguments to a single
`flags' argument, which includes the access type, and flags:

PMAP_WIRED the old `wired' boolean
PMAP_CANFAIL pmap_enter() is allowed to fail

If PMAP_CANFAIL is not specified, the pmap should behave as it always
has in the face of a drastic resource shortage: fall over dead.

Change the fault handler to deal with failure (which indicates resource
shortage) by unlocking everything, waiting for the pagedaemon to free
more memory, then retrying the fault.
 1.35 17-Jul-1999  thorpej branches: 1.35.2; 1.35.4; 1.35.8;
Add a set of "lockflags", which can control the locking behavior
of some functions. Use these flags in uvm_map_pageable() to determine
if the map is locked on entry (replaces an already present boolean_t
argument `islocked'), and if the function should return with the map
still locked.
 1.34 14-Jul-1999  thorpej Fix an operator precedence error which caused msync(2) to fail to pass
the PGO_CLEANIT flag to the object pagers. Fixes PR #7978, from
Matthias Pfaller.
 1.33 12-Jul-1999  kleink XSH5: change function signature to `void *sbrk(intptr_t)'.
 1.32 10-Jul-1999  thorpej Make a comment reflect reality.
 1.31 10-Jul-1999  thorpej Slightly better test for "object with no real pages". Test for NULL
pgo_releasepg rather than if the pager is the device pager.
 1.30 08-Jul-1999  thorpej Correct a comment.
 1.29 07-Jul-1999  thorpej Add some more meat to madvise(2):
* Implement MADV_DONTNEED: deactivate pages in the specified range,
semantics similar to Solaris's MADV_DONTNEED.
* Add MADV_FREE: free pages and swap resources associated with the
specified range, causing the range to be reloaded from backing
store (vnodes) or zero-fill (anonymous), semantics like FreeBSD's
MADV_FREE and like Digital UNIX's MADV_DONTNEED (isn't it SO GREAT
that madvise(2) isn't standardized!?)

As part of this, move the non-map-modifying advice handling out of
uvm_map_advise(), and into sys_madvise().

As another part, implement general amap cleaning in uvm_map_clean(), and
change uvm_map_clean() to only push dirty pages to disk if PGO_CLEANIT
is set in its flags (and update sys___msync13() accordingly). XXX Add
a patchable global "amap_clean_works", defaulting to 1, which can disable
the amap cleaning code, just in case problems are unearthed; this gives
a developer/user a quick way to recover and send a bug report (e.g. boot
into DDB and change the value).

XXX Still need to implement a real uao_flush().

XXX Need to update the manual page.

With these changes, rebuilding libc will automatically cause the new
malloc(3) to use MADV_FREE to actually release pages and swap resources
when it decides that can be done.
 1.28 06-Jul-1999  cgd from the comment added to the code:
> XXX (in)sanity check. We don't do proper datasize checking
> XXX for anonymous (or private writable) mmap(). However,
> XXX know that if we're trying to allocate more than the amount
> XXX remaining under our current data size limit, _that_ should
> XXX be disallowed.
This is one link on the chain of lossage known as PR#7897. It's
definitely not the right fix, but it's better than nothing.
 1.27 01-Jul-1999  thorpej Fix tyop. From Bill Studenmund.
 1.26 19-Jun-1999  thorpej Fix a typo.
 1.25 18-Jun-1999  thorpej Add the guts of mlockall(MCL_FUTURE). This requires that a process's
"memlock" resource limit be passed to uvm_mmap(). Update all calls accordingly.
 1.24 17-Jun-1999  thorpej In sys_mmap():
- rather than treating MAP_COPY like MAP_PRIVATE by sheer virtue of it not
being MAP_SHARED, actually convert the MAP_COPY flag into MAP_PRIVATE.
- return EINVAL if MAP_SHARED and MAP_PRIVATE are both included in flags.
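
The two rules in 1.24 can be sketched as a small flag-normalization step. The flag values below are placeholders rather than the real <sys/mman.h> constants, the name normalize_share_flags() is hypothetical, and the order of the two steps is an assumption.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>

/* Placeholder flag values for illustration only. */
#define X_MAP_SHARED	0x1
#define X_MAP_PRIVATE	0x2
#define X_MAP_COPY	0x4

/* Store the normalized flags in *out and return 0, or return EINVAL. */
int
normalize_share_flags(int flags, int *out)
{
	assert(out != NULL);

	/* Convert MAP_COPY into MAP_PRIVATE instead of treating it implicitly. */
	if (flags & X_MAP_COPY)
		flags = (flags & ~X_MAP_COPY) | X_MAP_PRIVATE;

	/* Shared and private are mutually exclusive. */
	if ((flags & (X_MAP_SHARED | X_MAP_PRIVATE)) ==
	    (X_MAP_SHARED | X_MAP_PRIVATE))
		return EINVAL;

	*out = flags;
	return 0;
}
```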
 1.23 16-Jun-1999  minoura Remove extra ].
 1.22 15-Jun-1999  thorpej Several changes, developed and tested concurrently:
* Provide POSIX 1003.1b mlockall(2) and munlockall(2) system calls.
MCL_CURRENT is presently implemented. MCL_FUTURE is not fully
implemented. Also, the same one-unlock-for-every-lock caveat
currently applies here as it does to mlock(2). This will be
addressed in a future commit.
* Provide the mincore(2) system call, with the same semantics as
Solaris.
* Clean up the error recovery in uvm_map_pageable().
* Fix a bug where a process would hang if attempting to mlock a
zero-fill region where none of the pages in that region are resident.
[ This fix has been submitted for inclusion in 1.4.1 ]
 1.21 23-May-1999  mrg implement madvise() for MADV_{NORMAL,RANDOM,SEQUENTIAL}, others are not yet done.
 1.20 03-May-1999  mrg fix some formatting foo.
 1.19 25-Mar-1999  mrg branches: 1.19.2; 1.19.4; 1.19.6;
remove now >1 year old pre-release message.
 1.18 24-Mar-1999  cgd modify udv_attach() and its caller (uvm_mmap()) so that it's passed the
offset and size of the requested region to be mapped, so that the
udv_attach() can use the device d_mmap() entry to check mappability
of the requested region.
 1.17 09-Mar-1999  kleink Have unimplemented/unsupported system calls (madvise(), mincore(), sbrk(),
sstk()) fail with ENOSYS.
 1.16 04-Mar-1999  chs fix printf format types.
 1.15 11-Oct-1998  chuck branches: 1.15.2;
remove unused share map code from UVM:
- update calls to uvm_unmap_remove/uvm_unmap (mainonly boolean arg
has been removed)
- replace UVM_ET_ISMAP checks with UVM_ET_ISSUBMAP checks
 1.14 30-Sep-1998  mrg back out previous.
 1.13 30-Sep-1998  tv Declare silent success on madvise(). As an advisory call, it is harmless
to pretend success even though it's not supported, and some emulations
rely on its success.
 1.12 13-Aug-1998  eeh Merge paddr_t changes into the main branch.
 1.11 07-Jul-1998  thorpej branches: 1.11.2;
Add support for mmap'ing disk block devices.
 1.10 30-May-1998  kleink Per XSH98, const'ify the `addr' arguments to mlock() and munlock().
 1.9 10-May-1998  mrg reject attempts to map an immutable or append-only file, shared with
write protection. this stops data corruption where it was possible
to change the in-memory copy of an append-only file (but not the on-disk
copy). this is documented in NetBSD security advisory 1998-003. thanks
to darrenr, lukem, cgd, mycroft and mrg for this.
 1.8 01-Apr-1998  tv mmap() default MAP_SHARED/MAP_PRIVATE is ``DEBUG'', not ``DIAGNOSTIC''
 1.7 28-Mar-1998  kleink Per XPG, if the file descriptor argument to mmap() refers to a file whose
type is not supported (neither VREG nor VCHR, or not a vnode at all), fail
with ENODEV instead of EINVAL.
 1.6 09-Mar-1998  mrg KNF.
 1.5 03-Mar-1998  mycroft Convert MAP_PRIVATE device mappings to MAP_SHARED on *all* platforms, not just
the SPARC.
Remove the #ifdef COMPAT_13 for automatically adding a sharing type, since the
interface is *supposed* to support this.
Also modify the DIAGNOSTIC messages here a bit.
 1.4 10-Feb-1998  mrg - add defopt's for UVM, UVMHIST and PMAP_NEW.
- remove unnecessary UVMHIST_DECL's.
 1.3 07-Feb-1998  mrg restore rcsids
 1.2 06-Feb-1998  thorpej RCS ID police.
 1.1 05-Feb-1998  mrg branches: 1.1.1;
Initial revision
 1.1.1.1 05-Feb-1998  mrg initial import of the new virtual memory system, UVM, into -current.

UVM was written by chuck cranor <chuck@maria.wustl.edu>, with some
minor portions derived from the old Mach code. i provided some help
getting swap and paging working, and other bug fixes/ideas. chuck
silvers <chuq@chuq.com> also provided some other fixes.

this is the UVM kernel code portion.


this will be KNF'd shortly. :-)
 1.11.2.1 30-Jul-1998  eeh Split vm_offset_t and vm_size_t into paddr_t, psize_t, vaddr_t, and vsize_t.
 1.15.2.1 09-Nov-1998  chs initial snapshot. lots left to do.
 1.19.6.1 30-Nov-1999  itojun bring in latest KAME (as of 19991130, KAME/NetBSD141) into kame branch
just for reference purposes.
This commit includes 1.4 -> 1.4.1 sync for kame branch.

The branch does not compile at all (due to the lack of ALTQ and some other
source code). Please do not try to modify the branch, this is just for
reference purposes.

synchronization to latest KAME will take place on HEAD branch soon.
 1.19.4.7 11-Aug-1999  chs add casts for trunc_page() and round_page() args.
 1.19.4.6 09-Aug-1999  chs create a new type "voff_t" for uvm_object offsets
and define it to be "off_t". also, remove pgo_asyncget().
 1.19.4.5 02-Aug-1999  thorpej Update from trunk.
 1.19.4.4 11-Jul-1999  chs remove uvm_vnp_uncache(), it's no longer needed.
 1.19.4.3 01-Jul-1999  thorpej Sync w/ -current.
 1.19.4.2 21-Jun-1999  thorpej Sync w/ -current.
 1.19.4.1 07-Jun-1999  chs merge everything from chs-ubc branch.
 1.19.2.1 07-Jul-1999  perry pullup 1.27->1.28 (cgd)
 1.35.8.1 27-Dec-1999  wrstuden Pull up to last week's -current.
 1.35.4.1 15-Nov-1999  fvdl Sync with -current
 1.35.2.5 27-Mar-2001  bouyer Sync with HEAD.
 1.35.2.4 12-Mar-2001  bouyer Sync with HEAD.
 1.35.2.3 18-Jan-2001  bouyer Sync with head (for UBC+NFS fixes, mostly).
 1.35.2.2 08-Dec-2000  bouyer Sync with HEAD.
 1.35.2.1 20-Nov-2000  bouyer Update thorpej_scsipi to -current as of a month ago
 1.41.4.1 25-Jan-2001  jhawk Pull up revisions 1.47-1.48 via patch (requested by thorpej):
Change PROT_EXEC handling. Clear it from the maxprot if the protection
lacks it, after a failed udv_attach() and retry the udv_attach().
 1.49.2.16 18-Oct-2002  nathanw Catch up to -current.
 1.49.2.15 17-Sep-2002  nathanw Catch up to -current.
 1.49.2.14 16-Jul-2002  nathanw Whitespace.
 1.49.2.13 12-Jul-2002  nathanw No longer need to pull in lwp.h; proc.h pulls it in for us.
 1.49.2.12 24-Jun-2002  nathanw Curproc->curlwp renaming.

Change uses of "curproc->l_proc" back to "curproc", which is more like the
original use. Bare uses of "curproc" are now "curlwp".

"curproc" is now #defined in proc.h as ((curlwp) ? (curlwp)->l_proc : NULL)
so that it is always safe to reference curproc (*de*referencing curproc
is another story, but that's always been true).
 1.49.2.11 20-Jun-2002  nathanw Catch up to -current.
 1.49.2.10 29-May-2002  nathanw #include <sys/sa.h> before <sys/syscallargs.h>, to provide sa_upcall_t
now that <sys/param.h> doesn't include <sys/sa.h>.

(Behold the Power of Ed)
 1.49.2.9 01-Apr-2002  nathanw Catch up to -current.
(CVS: It's not just a program. It's an adventure!)
 1.49.2.8 08-Jan-2002  nathanw Catch up to -current.
 1.49.2.7 14-Nov-2001  nathanw Catch up to -current.
 1.49.2.6 21-Sep-2001  nathanw Catch up to -current.
 1.49.2.5 24-Aug-2001  nathanw A few files and lwp/proc conversions I missed in the last big update.
GENERIC runs again.
 1.49.2.4 24-Aug-2001  nathanw Catch up with -current.
 1.49.2.3 21-Jun-2001  nathanw Catch up to -current.
 1.49.2.2 09-Apr-2001  nathanw Catch up with -current.
 1.49.2.1 05-Mar-2001  nathanw Initial commit of scheduler activations and lightweight process support.
 1.54.2.4 10-Oct-2002  jdolecek sync kqueue with -current; this includes merge of gehenna-devsw branch,
merge of i386 MP branch, and part of autoconf rototill work
 1.54.2.3 23-Jun-2002  jdolecek catch up with -current on kqueue branch
 1.54.2.2 10-Jan-2002  thorpej Sync kqueue branch with -current.
 1.54.2.1 25-Aug-2001  thorpej Merge Aug 24 -current into the kqueue branch.
 1.55.2.2 01-Oct-2001  fvdl Catch up with -current.
 1.55.2.1 07-Sep-2001  thorpej Commit my "devvp" changes to the thorpej-devvp branch. This
replaces the use of dev_t in most places with a struct vnode *.

This will form the basic infrastructure for real cloning device
support (besides being architecturally cleaner -- it'll be good
to get away from using numbers to represent objects).
 1.56.2.1 12-Nov-2001  thorpej Sync the thorpej-mips-cache branch with -current.
 1.63.4.2 15-Mar-2004  jmc Pullup rev 1.66 (requested by skrll in ticket #1607)

#if 0 the call to uvm_map_checkprot() in sys_munmap() -- it's not documented,
and programs do not expect it. Also fixes memory leaks in dlopen()/dlclose().
 1.63.4.1 17-Aug-2003  tron Pull up revision 1.72 (requested by tv in ticket #1420):
PR/21948: Todd Vierling: Implement MAP_TRYFIXED for linux emulation.
 1.63.2.2 20-Jun-2002  gehenna catch up with -current.
 1.63.2.1 16-May-2002  gehenna Get rid of iszerodev. Use the 'zerodev' (dev_t for /dev/zero).
 1.74.2.10 10-Nov-2005  skrll Sync with HEAD. Here we go again...
 1.74.2.9 01-Apr-2005  skrll Sync with HEAD.
 1.74.2.8 15-Feb-2005  skrll Sync with HEAD.
 1.74.2.7 24-Jan-2005  skrll Sync with HEAD.
 1.74.2.6 17-Jan-2005  skrll Sync with HEAD.
 1.74.2.5 18-Dec-2004  skrll Sync with HEAD.
 1.74.2.4 21-Sep-2004  skrll Fix the sync with head I botched.
 1.74.2.3 18-Sep-2004  skrll Sync with HEAD.
 1.74.2.2 03-Aug-2004  skrll Sync with HEAD
 1.74.2.1 02-Jul-2003  darrenr Apply the aborted ktrace-lwp changes to a specific branch. This is just for
others to review, I'm concerned that patch fuzziness may have resulted in some
errant code being generated but I'll look at that later by comparing the diff
from the base to the branch with the file I attempt to apply to it. This will,
at the very least, put the changes in a better context for others to review
them and attempt to tinker with removing passing of 'struct lwp' through
the kernel.
 1.86.2.1 29-Apr-2005  kent sync with -current
 1.87.2.3 26-Mar-2005  yamt sync with head.
 1.87.2.2 12-Feb-2005  yamt sync with head.
 1.87.2.1 25-Jan-2005  yamt - don't use uvm_object or managed mappings for wired allocations.
(eg. malloc(9))
- simplify uvm_km_* apis.
 1.88.4.3 15-Oct-2005  riz Pull up following revision(s) (requested by chs in ticket #877):
sys/uvm/uvm_mmap.c: revision 1.93
stop converting async msync() to sync.
this hasn't been needed for years (if it ever was).
 1.88.4.2 18-Sep-2005  tron Pull up following revision(s) (requested by fvdl in ticket #798):
sys/compat/sunos/sunos_exec.c: revision 1.47
sys/compat/pecoff/pecoff_emul.c: revision 1.11
sys/arch/sparc64/sparc64/netbsd32_machdep.c: revision 1.45
sys/arch/amd64/amd64/netbsd32_machdep.c: revision 1.12
sys/sys/proc.h: revision 1.198
sys/compat/mach/mach_exec.c: revision 1.56
sys/compat/freebsd/freebsd_exec.c: revision 1.27
sys/arch/sparc64/include/vmparam.h: revision 1.27
sys/kern/kern_resource.c: revision 1.91
sys/compat/netbsd32/netbsd32_netbsd.c: revision 1.88
sys/compat/osf1/osf1_exec.c: revision 1.39
sys/compat/svr4_32/svr4_32_resource.c: revision 1.5
sys/compat/ultrix/ultrix_misc.c: revision 1.99
sys/compat/svr4_32/svr4_32_exec.h: revision 1.9
sys/kern/exec_elf32.c: revision 1.103
sys/compat/aoutm68k/aoutm68k_exec.c: revision 1.19
sys/compat/sunos32/sunos32_exec.c: revision 1.20
sys/compat/hpux/hpux_exec.c: revision 1.46
sys/compat/darwin/darwin_exec.c: revision 1.40
sys/kern/sysv_shm.c: revision 1.83
sys/uvm/uvm_extern.h: revision 1.99
sys/uvm/uvm_mmap.c: revision 1.89
sys/kern/kern_exec.c: revision 1.195
sys/compat/netbsd32/netbsd32.h: revision 1.31
sys/arch/sparc64/sparc64/svr4_32_machdep.c: revision 1.20
sys/compat/svr4/svr4_exec.c: revision 1.56
sys/compat/irix/irix_exec.c: revision 1.41
sys/compat/ibcs2/ibcs2_exec.c: revision 1.63
sys/compat/svr4_32/svr4_32_exec.c: revision 1.16
sys/arch/amd64/include/vmparam.h: revision 1.8
sys/compat/linux/common/linux_exec.c: revision 1.73
Fix some things regarding COMPAT_NETBSD32 and limits/VM addresses.
* For sparc64 and amd64, define *SIZ32 VM constants.
* Add a new function pointer to struct emul, pointing at a function
that will return the default VM map address. The default function
is uvm_map_defaultaddr, which just uses the VM_DEFAULT_ADDRESS
macro. This gives emulations control over the default map address,
and allows things to be mapped at the right address (in 32bit range)
for COMPAT_NETBSD32.
* Add code to adjust the data and stack limits when a COMPAT_NETBSD32
or COMPAT_SVR4_32 binary is executed.
* Don't use USRSTACK in kern_resource.c, use p_vmspace->vm_minsaddr
instead (emulations might have set it differently)
* Since this changes struct emul, bump kernel version to 3.99.2
Tested on amd64, compile-tested on sparc64.
 1.88.4.1 24-Aug-2005  riz Pull up following revision(s) (requested by yamt in ticket #688):
sys/miscfs/genfs/genfs_vnops.c: revision 1.98 via patch
sys/ufs/ffs/ffs_vfsops.c: revision 1.165
sys/ufs/lfs/lfs_extern.h: revision 1.69
sys/fs/filecorefs/filecore_vfsops.c: revision 1.20
sys/nfs/nfs_node.c: revision 1.80
sys/fs/smbfs/smbfs_node.c: revision 1.24
sys/fs/cd9660/cd9660_vfsops.c: revision 1.24
sys/fs/msdosfs/msdosfs_denode.c: revision 1.8
sys/miscfs/genfs/genfs_node.h: revision 1.6
sys/ufs/lfs/lfs_vfsops.c: revision 1.183
sys/ufs/ext2fs/ext2fs_vfsops.c: revision 1.86
sys/fs/adosfs/advfsops.c: revision 1.23
sys/fs/ntfs/ntfs_vfsops.c: revision 1.31
- constify genfs_ops.
- use member designators.

sys/miscfs/genfs/genfs_vnops.c: revision 1.99 via patch
genfs_getpages: don't forget to put the vnode onto the syncer's work queue
even in the case of PGO_LOCKED.

sys/uvm/uvm_bio.c: revision 1.40
sys/uvm/uvm_pager.h: revision 1.29
sys/miscfs/genfs/genfs_vnops.c: revision 1.100 via patch
sys/ufs/ufs/ufs_inode.c: revision 1.50
- introduce PGO_NOBLOCKALLOC and use it for ubc mapping
to prevent unnecessary block allocations in the case that
page size > block size.
- ufs_balloc_range: use VM_PROT_WRITE+PGO_NOBLOCKALLOC rather than
VM_PROT_READ.

sys/uvm/uvm_fault.c: revision 1.96
sys/miscfs/genfs/genfs_vnops.c: revision 1.101 via patch
sys/uvm/uvm_object.h: revision 1.19
sys/miscfs/genfs/genfs_node.h: revision 1.7
ensure that vnodes with dirty pages are always on syncer's queue.
- genfs_putpages: wait for i/o completion of PG_RELEASED/PG_PAGEOUT pages by
setting "wasclean" false when encountering them.
suggested by Stephan Uphoff in PR/24596 (1).
- genfs_putpages: write protect pages when cleaning out, if
we're going to take the vnode off the syncer's queue.
uvm_fault: don't write-map pages unless its vnode is already on
the syncer's queue.
fix PR/24596 (3) but in a different way from the suggested fix.
(to keep our current behaviour, ie. not to require explicit msync.
discussed on tech-kern@.)
- genfs_putpages: don't mistakenly take a vnode off the queue
by introducing a generation number in genfs_node.
genfs_getpages: increment the generation number.
suggested by Stephan Uphoff in PR/24596 (2).
- add some assertions.

sys/miscfs/genfs/genfs_vnops.c: revision 1.102 via patch
genfs_putpages: don't bother to clean the vnode unless VONWORKLST.

sys/ufs/ffs/ffs_vnops.c: revision 1.71
ffs_full_fsync: because VBLK/VCHR can be mmap'ed,
do VOP_PUTPAGES for them as well.

sys/uvm/uvm_fault.c: revision 1.97
uvm_fault: check a correct object in the case of layered filesystems.
fix PR/30811 from Jukka Salmi.

sys/uvm/uvm_object.h: revision 1.20
sys/ufs/ffs/ffs_vfsops.c: revision 1.167
sys/uvm/uvm_bio.c: revision 1.41
sys/ufs/ufs/ufs_vnops.c: revision 1.129
sys/uvm/uvm_mmap.c: revision 1.92
sys/uvm/uvm_fault.c: revision 1.98
sys/kern/vfs_subr.c: revision 1.252
sys/fs/msdosfs/denode.h: revision 1.5
sys/miscfs/genfs/genfs_vnops.c: revision 1.103 via patch
sys/fs/msdosfs/msdosfs_denode.c: revision 1.9
sys/sys/vnode.h: revision 1.141
sys/ufs/ufs/ufs_inode.c: revision 1.51
sys/ufs/ufs/ufs_extern.h: revision 1.45 via patch
sys/miscfs/genfs/genfs_node.h: revision 1.8
sys/ufs/lfs/lfs_vfsops.c: revision 1.184
sys/uvm/uvm_pager.h: revision 1.30
sys/ufs/ext2fs/ext2fs_vfsops.c: revision 1.87
update file timestamps for nfsd loaned-read and mmap.
PR/25279. discussed on tech-kern@.

sys/miscfs/genfs/genfs_vnops.c: revision 1.104 via patch
don't write-protect wired pages. pointed out by Chuck Silvers.
for now, leave a vnode on the syncer's queue, as suggested by him.

sys/ufs/ffs/ffs_vnops.c: revision 1.72
revert VCHR part of ffs_vnops.c 1.71.
as VCHR uses the device pager, no point to call VOP_PUTPAGES here.
pointed out by Chuck Silvers.
 1.91.2.8 24-Mar-2008  yamt sync with head.
 1.91.2.7 21-Jan-2008  yamt sync with head
 1.91.2.6 07-Dec-2007  yamt sync with head
 1.91.2.5 27-Oct-2007  yamt sync with head.
 1.91.2.4 03-Sep-2007  yamt sync with head.
 1.91.2.3 26-Feb-2007  yamt sync with head.
 1.91.2.2 30-Dec-2006  yamt sync with head.
 1.91.2.1 21-Jun-2006  yamt sync with head.
 1.94.12.1 24-May-2006  tron Merge 2006-05-24 NetBSD-current into the "peter-altq" branch.
 1.94.10.2 19-Apr-2006  elad sync with head.
 1.94.10.1 08-Mar-2006  elad Adapt to kernel authorization changes.
 1.94.8.3 11-Aug-2006  yamt sync with head
 1.94.8.2 24-May-2006  yamt sync with head.
 1.94.8.1 11-Apr-2006  yamt sync with head
 1.94.6.2 01-Jun-2006  kardel Sync with head.
 1.94.6.1 22-Apr-2006  simonb Sync with head.
 1.94.4.1 09-Sep-2006  rpaulo sync with head
 1.96.2.1 19-Jun-2006  chap Sync with head.
 1.98.6.2 10-Dec-2006  yamt sync with head.
 1.98.6.1 22-Oct-2006  yamt sync with head
 1.98.4.4 09-Feb-2007  ad Sync with HEAD.
 1.98.4.3 30-Jan-2007  ad Remove support for SA. Ok core@.
 1.98.4.2 12-Jan-2007  ad Sync with head.
 1.98.4.1 18-Nov-2006  ad Sync with head.
 1.102.2.1 10-Mar-2007  bouyer Pull up following revision(s) (requested by elad in ticket #407):
sys/kern/kern_verifiedexec.c: patch
sys/uvm/uvm_mmap.c: revision 1.104 via patch
If Veriexec prevents indirect execution of the binary, in addition to just
blocking the mmap() if exec bit is requested, also strip exec bit from
maxprot for further mprotect() calls. Okay joerg@.
 1.105.2.3 17-May-2007  yamt sync with head.
 1.105.2.2 12-Mar-2007  rmind Sync with HEAD.
 1.105.2.1 27-Feb-2007  yamt - sync with head.
- move sched_changepri back to kern_synch.c as it doesn't know PPQ anymore.
 1.108.4.1 11-Jul-2007  mjf Sync with head.
 1.108.2.8 09-Oct-2007  ad Sync with head.
 1.108.2.7 09-Oct-2007  ad Sync with head.
 1.108.2.6 20-Aug-2007  ad Sync with HEAD.
 1.108.2.5 17-Jun-2007  ad - Increase the number of thread priorities from 128 to 256. How the space
is set up is to be revisited.
- Implement soft interrupts as kernel threads. A generic implementation
is provided, with hooks for fast-path MD code that can run the interrupt
threads over the top of other threads executing in the kernel.
- Split vnode::v_flag into three fields, depending on how the flag is
locked (by the interlock, by the vnode lock, by the file system).
- Miscellaneous locking fixes and improvements.
 1.108.2.4 08-Jun-2007  ad Sync with head.
 1.108.2.3 13-Apr-2007  ad - Fix a (new) bug where vget tries to acquire freed vnodes' interlocks.
- Minor locking fixes.
 1.108.2.2 21-Mar-2007  ad - Replace more simple_locks, and fix up in a few places.
- Use condition variables.
- LOCK_ASSERT -> KASSERT.
 1.108.2.1 13-Mar-2007  ad Pull in the initial set of changes for the vmlocking branch.
 1.112.2.1 15-Aug-2007  skrll Sync with HEAD.
 1.114.8.2 27-Jul-2007  pooka Change unused fflags parameter in VOP_MMAP to prot and pass in
desired vm protection.
 1.114.8.1 27-Jul-2007  pooka file uvm_mmap.c was added on branch matt-mips64 on 2007-07-27 08:26:39 +0000
 1.114.6.2 09-Jan-2008  matt sync with HEAD
 1.114.6.1 06-Nov-2007  matt sync with HEAD
 1.114.4.3 27-Nov-2007  joerg Sync with HEAD. amd64 Xen support needs testing.
 1.114.4.2 26-Oct-2007  joerg Sync with HEAD.

Follow the merge of pmap.c on i386 and amd64 and move
pmap_init_tmp_pgtbl into arch/x86/x86/pmap.c. Modify the ACPI wakeup
code to restore CR4 before jumping back into kernel space as the large
page option might cover that.
 1.114.4.1 02-Oct-2007  joerg Sync with HEAD.
 1.115.2.1 14-Oct-2007  yamt sync with head.
 1.117.4.3 18-Feb-2008  mjf Sync with HEAD.
 1.117.4.2 27-Dec-2007  mjf Sync with HEAD.
 1.117.4.1 08-Dec-2007  mjf Sync with HEAD.
 1.118.6.1 02-Jan-2008  bouyer Sync with HEAD
 1.118.2.2 26-Dec-2007  ad Sync with head.
 1.118.2.1 04-Dec-2007  ad Pull the vmlocking changes into a new branch.
 1.121.6.2 05-Jun-2008  mjf Sync with HEAD.

Also fix build.
 1.121.6.1 03-Apr-2008  mjf Sync with HEAD.
 1.122.6.3 23-Jun-2008  wrstuden Sync w/ -current. 34 merge conflicts to follow.
 1.122.6.2 14-May-2008  wrstuden Per discussion with ad, remove most of the #include <sys/sa.h> lines
as they were including sa.h just for the type(s) needed for syscallargs.h.

Instead, create a new file, sys/satypes.h, which contains just the
types needed for syscallargs.h. Yes, there's only one now, but that
may change and it's probably more likely to change if it'd be difficult
to handle. :-)

Per discussion with matt at n dot o, add an include of satypes.h to
sigtypes.h. Upcall handlers are kinda signal handlers, and signalling
is the header file that's already included for syscallargs.h that
closest matches SA.

This shaves about 3000 lines off of the diff of the branch relative
to the base. That also represents about 18% of the total before this
checkin.

I think this reduction is a very good thing.
 1.122.6.1 10-May-2008  wrstuden Initial checkin of re-adding SA. Everything except kern_sa.c
compiles in GENERIC for i386. This is still a work-in-progress, but
this checkin covers most of the mechanical work (changing signalling
to be able to accommodate SA's process-wide signalling and re-adding
includes of sys/sa.h and savar.h). Subsequent changes will be much
more interesting.

Also, kern_sa.c has received partial cleanup. There's still more
to do, though.
 1.122.4.7 11-Aug-2010  yamt sync with head.
 1.122.4.6 11-Mar-2010  yamt sync with head
 1.122.4.5 19-Aug-2009  yamt sync with head.
 1.122.4.4 20-Jun-2009  yamt sync with head
 1.122.4.3 30-May-2009  yamt revert the previous, which has been committed to the wrong branch.
 1.122.4.2 30-May-2009  yamt wrap long lines.
 1.122.4.1 04-May-2009  yamt sync with head.
 1.122.2.2 17-Jun-2008  yamt sync with head.
 1.122.2.1 04-Jun-2008  yamt sync with head
 1.126.12.2 23-Jul-2009  jym Sync with HEAD.
 1.126.12.1 13-May-2009  jym Sync with HEAD.

Commit is split, to avoid a "too many arguments" protocol error.
 1.126.8.1 01-Apr-2009  snj Pull up following revision(s) (requested by mrg in ticket #622):
bin/csh/csh.1: revision 1.46
bin/csh/func.c: revision 1.37
bin/ps/print.c: revision 1.111
bin/ps/ps.c: revision 1.74
bin/sh/miscbltin.c: revision 1.38
bin/sh/sh.1: revision 1.92 via patch
external/bsd/top/dist/machine/m_netbsd.c: revision 1.7
lib/libkvm/kvm_proc.c: revision 1.82
sys/arch/mips/mips/cpu_exec.c: revision 1.55
sys/compat/darwin/darwin_exec.c: revision 1.57
sys/compat/ibcs2/ibcs2_exec.c: revision 1.73
sys/compat/irix/irix_resource.c: revision 1.15
sys/compat/linux/arch/amd64/linux_exec_machdep.c: revision 1.16
sys/compat/linux/arch/i386/linux_exec_machdep.c: revision 1.12
sys/compat/linux/common/linux_limit.h: revision 1.5
sys/compat/osf1/osf1_resource.c: revision 1.14
sys/compat/svr4/svr4_resource.c: revision 1.18
sys/compat/svr4_32/svr4_32_resource.c: revision 1.17
sys/kern/exec_subr.c: revision 1.62
sys/kern/init_sysctl.c: revision 1.160
sys/kern/kern_exec.c: revision 1.288
sys/kern/kern_resource.c: revision 1.151
sys/sys/param.h: patch
sys/sys/resource.h: revision 1.31
sys/sys/sysctl.h: revision 1.184
sys/uvm/uvm_extern.h: revision 1.153
sys/uvm/uvm_glue.c: revision 1.136
sys/uvm/uvm_mmap.c: revision 1.128
usr.bin/systat/ps.c: revision 1.32
- add new RLIMIT_AS (aka RLIMIT_VMEM) resource that limits the total
address space available to processes. this limit exists in most other
modern unix variants, and like most of them, our defaults are unlimited.
remove the old mmap / rlimit.datasize hack.
- add the VMCMD_STACK flag to all the stack-creation vmcmd callers.
it is currently unused, but was added a few years ago.
- add a pair of new process size values to kinfo_proc2{}. one is the
total size of the process memory map, and the other is the total size
adjusted for unused stack space (since most processes have a lot of
this...)
- patch sh and csh to notice RLIMIT_AS. (in some cases, the alias
RLIMIT_VMEM was already present and used if available.)
- patch ps, top and systat to notice the new k_vm_vsize member of
kinfo_proc2{}.
- update irix, svr4, svr4_32, linux and osf1 emulations to support
this information. (freebsd could be done, but it's best left
as part of the full-update of compat/freebsd.)
this addresses PR 7897. it also gives correct memory usage values,
which have never been entirely correct (since mmap), and have been
very incorrect since jemalloc() was enabled.
tested on i386 and sparc64, build tested on several other platforms.
thanks to many folks for feedback and testing but most especially
chuq and yamt for critical suggestions that led to this patch not
having a special ugliness i wasn't happy with anyway :-)
 1.126.6.1 28-Apr-2009  skrll Sync with HEAD.
 1.132.4.4 31-May-2011  rmind sync with head
 1.132.4.3 05-Mar-2011  rmind sync with head
 1.132.4.2 03-Jul-2010  rmind sync with head
 1.132.4.1 16-Mar-2010  rmind Change struct uvm_object::vmobjlock to be dynamically allocated with
mutex_obj_alloc(). It allows us to share the locks among UVM objects.
 1.132.2.1 17-Aug-2010  uebayasi Sync with HEAD.
 1.133.4.1 08-Feb-2011  bouyer Sync with HEAD
 1.133.2.1 06-Jun-2011  jruoho Sync with HEAD.
 1.135.2.1 23-Jun-2011  cherry Catchup with rmind-uvmplock merge.
 1.139.6.1 18-Feb-2012  mrg merge to -current.
 1.139.2.2 22-May-2014  yamt sync with head.

for a reference, the tree before this commit was tagged
as yamt-pagecache-tag8.

this commit was split into small chunks to avoid
a limitation of cvs. ("Protocol error: too many arguments")
 1.139.2.1 17-Apr-2012  yamt sync with head
 1.144.10.1 18-May-2014  rmind sync with head
 1.144.6.2 03-Dec-2017  jdolecek update from HEAD
 1.144.6.1 20-Aug-2014  tls Rebase to HEAD as of a few days ago.
 1.148.4.2 11-Jan-2015  snj Pull up following revision(s) (requested by chs in ticket #403):
sys/uvm/uvm_mmap.c: revision 1.151
in uvm_mmap_dev(), use the passed-in offset instead of 0.
from Onno van der Linden in PR 49536.
 1.148.4.1 31-Dec-2014  snj Pull up following revision(s) (requested by chs in ticket #363):
common/lib/libprop/prop_kern.c: revision 1.18
sys/arch/mac68k/dev/grf_compat.c: revision 1.27
sys/arch/x68k/dev/grf.c: revision 1.45
sys/external/bsd/drm/dist/bsd-core/drm_bufs.c: revision 1.12
sys/external/bsd/drm2/drm/drm_drv.c: revision 1.12
sys/external/bsd/drm2/drm/drm_vm.c: revision 1.6
sys/external/bsd/drm2/include/linux/mm.h: revision 1.4
sys/kern/vfs_vnops.c: revision 1.192 via patch
sys/rump/librump/rumpkern/vm.c: revision 1.160
sys/sys/file.h: revision 1.78 via patch
sys/uvm/uvm_device.c: revision 1.64
sys/uvm/uvm_device.h: revision 1.13
sys/uvm/uvm_extern.h: revision 1.192
sys/uvm/uvm_mmap.c: revision 1.150 via patch
add a new "fo_mmap" fileops method to allow use of arbitrary uvm_objects for
mappings of file objects. move vnode-specific details of mmap()ing a vnode
from uvm_mmap() to the new vnode-specific vn_mmap(). add new uvm_mmap_dev()
and uvm_mmap_anon() convenience functions for mapping character devices
and anonymous memory, and replace all other calls to uvm_mmap() with those.
use the new fileop in drm2 so that libdrm can use mmap() to map things
like on other platforms (instead of the ioctl that we have used so far).
 1.149.2.8 28-Aug-2017  skrll Sync with HEAD
 1.149.2.7 05-Oct-2016  skrll Sync with HEAD
 1.149.2.6 09-Jul-2016  skrll Sync with HEAD
 1.149.2.5 29-May-2016  skrll Sync with HEAD
 1.149.2.4 22-Apr-2016  skrll Sync with HEAD
 1.149.2.3 27-Dec-2015  skrll Sync with HEAD (as of 26th Dec)
 1.149.2.2 22-Sep-2015  skrll Sync with HEAD
 1.149.2.1 06-Apr-2015  skrll Sync with HEAD
 1.162.6.2 11-May-2017  pgoyette Sync with HEAD
 1.162.6.1 02-May-2017  pgoyette Sync with HEAD - tag prg-localcount2-base1
 1.166.2.3 01-Apr-2023  martin Pull up following revision(s) (requested by riastradh in ticket #1815):

sys/uvm/uvm_mmap.c: revision 1.180

mmap(2): If we fail with a hint, try again without it.
`Hint' here means nonzero addr, but no MAP_FIXED or MAP_TRYFIXED.

This is suboptimal -- we could teach uvm_mmap to do a fancier search
using the address as a hint. But this should do for now.

Candidate fix for PR kern/55533.
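
The retry described in this entry can be sketched in user-space C. This is only an illustration of the control flow, not the code in uvm_mmap.c: every `ex_`/`EX_` name, the flag values, and the fake mapper are hypothetical stand-ins introduced for the example.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical flag values; they only stand in for MAP_FIXED/MAP_TRYFIXED. */
#define EX_MAP_FIXED    0x1
#define EX_MAP_TRYFIXED 0x2

/* Hypothetical mapper: returns 0 on success, nonzero on failure. */
typedef int (*ex_map_fn)(uintptr_t *addr, size_t len, int flags);

/* A "hint" is a nonzero address with neither fixed-placement flag set. */
static bool
ex_is_hint(uintptr_t addr, int flags)
{
	return addr != 0 && (flags & (EX_MAP_FIXED | EX_MAP_TRYFIXED)) == 0;
}

/* Try with the caller's address; if that fails and it was only a hint,
 * drop the hint and try once more. */
static int
ex_map_with_retry(ex_map_fn map, uintptr_t *addr, size_t len, int flags)
{
	bool hint = ex_is_hint(*addr, flags);
	int error = map(addr, len, flags);

	if (error != 0 && hint) {
		*addr = 0;
		error = map(addr, len, flags);
	}
	return error;
}

/* Fake mapper for demonstration: refuses any nonzero address and
 * picks 0x1000 when the caller leaves the choice open. */
static int
ex_fake_map(uintptr_t *addr, size_t len, int flags)
{
	(void)len;
	(void)flags;
	if (*addr != 0)
		return 12;	/* arbitrary nonzero error code */
	*addr = 0x1000;
	return 0;
}
```

With the fake mapper, a hinted request ends up at the mapper's own choice of address, while a request carrying the fixed flag fails without a second attempt.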
 1.166.2.2 11-Aug-2019  martin Pull up following revision(s) (requested by maxv in ticket #1332):

sys/uvm/uvm_mmap.c: revision 1.173

Change 'npgs' from int to size_t. Otherwise the 64bit->32bit conversion
could lead to npgs=0, which is not expected. It later triggers a panic
in uvm_vsunlock().

Found by TriforceAFL (Akul Pillai).
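
A small sketch of the truncation this entry describes, assuming 4 KiB pages. The `ex_` names and the page shift are hypothetical, and the narrow variant uses `uint32_t` rather than `int` so that the conversion is well defined in portable C; the kernel code itself is not reproduced here.

```c
#include <assert.h>
#include <stdint.h>

#define EX_PAGE_SHIFT 12	/* hypothetical: 4 KiB pages */

/* Page count kept at full width (stands in for size_t on an LP64 system). */
static uint64_t
ex_npgs_wide(uint64_t len)
{
	return len >> EX_PAGE_SHIFT;
}

/* Same page count squeezed into 32 bits: the high bits are discarded. */
static uint32_t
ex_npgs_narrow(uint64_t len)
{
	return (uint32_t)(len >> EX_PAGE_SHIFT);
}
```

For a length of `1ULL << 44` the wide count is 2^32 pages, while the narrow count wraps to 0, which is the unexpected value the entry mentions.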
 1.166.2.1 02-Nov-2017  snj Pull up following revision(s) (requested by christos in ticket #336):
sys/uvm/uvm_mmap.c: revision 1.167
[syzkaller] Fix for PR #52658 as suggested by riastradh@
The bug was found by Dmitry Vyukov (dvyukov%google.com@localhost)
using syzkaller and was tested by me on a VM running
8.99.5
 1.169.4.3 13-Apr-2020  martin Mostly merge changes from HEAD upto 20200411
 1.169.4.2 08-Apr-2020  martin Merge changes from current as of 20200406
 1.169.4.1 10-Jun-2019  christos Sync with HEAD
 1.172.4.2 01-Apr-2023  martin Pull up following revision(s) (requested by riastradh in ticket #1621):

sys/uvm/uvm_mmap.c: revision 1.180

mmap(2): If we fail with a hint, try again without it.
`Hint' here means nonzero addr, but no MAP_FIXED or MAP_TRYFIXED.

This is suboptimal -- we could teach uvm_mmap to do a fancier search
using the address as a hint. But this should do for now.

Candidate fix for PR kern/55533.
 1.172.4.1 21-Oct-2019  martin Pull up following revision(s) (requested by maxv in ticket #355):

sys/uvm/uvm_mmap.c: revision 1.173

Change 'npgs' from int to size_t. Otherwise the 64bit->32bit conversion
could lead to npgs=0, which is not expected. It later triggers a panic
in uvm_vsunlock().
Found by TriforceAFL (Akul Pillai).
 1.174.2.1 29-Feb-2020  ad Sync with head.
 1.175.10.1 01-Aug-2021  thorpej Sync with HEAD.
 1.185.2.1 02-Aug-2025  perseant Sync with HEAD
 1.22 24-Feb-2025  andvar s/architecure/architecture/ and few other typos in comments.
 1.21 27-Nov-2020  yhardy branches: 1.21.24;
uvm_mremap: reference the appropriate backing object.

The previous approach was appropriate for anonymous
memory and device objects, which continue to work in
the same way.

OK: chs@
Fixes: PR 55237
 1.20 23-Feb-2020  ad branches: 1.20.6;
UVM locking changes, proposed on tech-kern:

- Change the lock on uvm_object, vm_amap and vm_anon to be a RW lock.
- Break v_interlock and vmobjlock apart. v_interlock remains a mutex.
- Do partial PV list locking in the x86 pmap. Others to follow later.
 1.19 06-May-2017  joerg branches: 1.19.10; 1.19.16;
Extend the mmap(2) interface to allow requesting protections for later
use with mprotect(2), but without enabling them immediately.

Extend the mremap(2) interface to allow duplicating mappings, i.e.
create a second range of virtual addresses that references the same physical
pages. Duplicated mappings can have different effective protections.

Adjust PAX mprotect logic to disallow effective protections of W&X, but
allow one mapping W and another X protections. This obsoletes using
temporary files for purposes like JIT.

Adjust PAX logic for mmap(2) and mprotect(2) to fail if W&X is requested
and not silently drop the X protection.

Improve test cases to ensure correct operation of the changed
interfaces.
 1.18 26-Nov-2015  martin branches: 1.18.8;
We never exec(2) with a kernel vmspace, so do not test for that, but instead
KASSERT() that we don't.
When calculating the load address for the interpreter (e.g. ld.elf_so),
we need to take into account whether the exec'd process will run with
topdown memory or bottom up. We can not use the current vmspace's flags
to test for that, as this happens too early. Luckily the execpack already
knows what the new state will be later, so instead of testing the current
vmspace, pass the info as additional argument to struct emul
e_vm_default_addr.
Fix all such functions and adapt all callers.
 1.17 12-Jun-2011  rmind branches: 1.17.12; 1.17.30;
Welcome to 5.99.53! Merge rmind-uvmplock branch:

- Reorganize locking in UVM and provide extra serialisation for pmap(9).
New lock order: [vmpage-owner-lock] -> pmap-lock.

- Simplify locking in some pmap(9) modules by removing P->V locking.

- Use lock object on vmobjlock (and thus vnode_t::v_interlock) to share
the locks amongst UVM objects where necessary (tmpfs, layerfs, unionfs).

- Rewrite and optimise x86 TLB shootdown code, make it simpler and cleaner.
Add TLBSTATS option for x86 to collect statistics about TLB shootdowns.

- Unify /dev/mem et al in MI code and provide required locking (removes
kernel-lock on some ports). Also, avoid cache-aliasing issues.

Thanks to Andrew Doran and Joerg Sonnenberger, as their initial patches
formed the core changes of this branch.
 1.16 16-Aug-2010  yamt branches: 1.16.6;
sys_mremap: unwrap a short line
 1.15 02-Aug-2009  yamt branches: 1.15.2; 1.15.4;
- don't reuse a variable for different purposes.
- KNF a bit.
 1.14 02-Aug-2009  yamt - fix extension of a nonexistent mapping. the problem reported by
Nicolas Joly on current-users@.
- check our reserved entry a little more strictly.
- comments.
 1.13 23-Mar-2009  yamt sys_mremap: whitespace
 1.12 17-Jun-2008  tsutsui branches: 1.12.4; 1.12.10;
Include <sys/sched.h> before <sys/syscallargs.h> for cpuset_t.
 1.11 02-Jun-2008  ad branches: 1.11.2;
Use atomics to maintain v_usecount.
 1.10 02-Jan-2008  ad branches: 1.10.6; 1.10.8; 1.10.10; 1.10.12;
Merge vmlocking2 to head.
 1.9 20-Dec-2007  dsl Convert all the system call entry points from:
int foo(struct lwp *l, void *v, register_t *retval)
to:
int foo(struct lwp *l, const struct foo_args *uap, register_t *retval)
Fixup compat code to not write into 'uap' and (in some cases) to actually
pass a correctly formatted 'uap' structure with the right name to the
next routine.
A few 'compat' routines that just call standard ones have been deleted.
All the 'compat' code compiles (along with the kernels required to test
build it).
98% done by automated scripts.
 1.8 15-Oct-2007  yamt branches: 1.8.4; 1.8.6; 1.8.10;
uvm_mremap: fix alignment check for the easy cases.
 1.7 08-Aug-2007  drochner branches: 1.7.2; 1.7.4;
Round up size arguments as mmap() does.
This is for consistency, and to have semantics similar to Linux --
a Python selftest succeeds now.
 1.6 21-Jul-2007  ad branches: 1.6.4; 1.6.6;
Merge unobtrusive locking changes from the vmlocking branch.
 1.5 17-Jul-2007  joerg branches: 1.5.2;
Add native mremap system call based on the UVM implementation for
Linux compat. Add code to enforce alignment of the new location.
Special thanks to wizd for helping with the man page.
 1.4 21-Feb-2007  thorpej branches: 1.4.4;
Replace the Mach-derived boolean_t type with the C99 bool type. A
future commit will replace use of TRUE and FALSE with true and false.
 1.3 31-Jan-2006  yamt branches: 1.3.4; 1.3.18; 1.3.28;
uvm_mremap: whitespace.
 1.2 23-Jan-2006  yamt uvm_mremap: fix "easy cases".
 1.1 21-Jan-2006  yamt implement compat_linux mremap.
 1.3.28.1 27-Feb-2007  yamt - sync with head.
- move sched_changepri back to kern_synch.c as it doesn't know PPQ anymore.
 1.3.18.6 21-Jan-2008  yamt sync with head
 1.3.18.5 27-Oct-2007  yamt sync with head.
 1.3.18.4 03-Sep-2007  yamt sync with head.
 1.3.18.3 26-Feb-2007  yamt sync with head.
 1.3.18.2 21-Jun-2006  yamt sync with head.
 1.3.18.1 31-Jan-2006  yamt file uvm_mremap.c was added on branch yamt-lazymbuf on 2006-06-21 15:12:40 +0000
 1.3.4.2 01-Feb-2006  yamt sync with head.
 1.3.4.1 31-Jan-2006  yamt file uvm_mremap.c was added on branch yamt-uio_vmspace on 2006-02-01 14:52:48 +0000
 1.4.4.4 23-Oct-2007  ad Sync with head.
 1.4.4.3 20-Aug-2007  ad Sync with HEAD.
 1.4.4.2 05-Apr-2007  ad - Put a per-LWP lock around swapin / swapout.
- Replace use of lockmgr().
- Minor locking fixes and assertions.
- uvm_map.h no longer pulls in proc.h, etc.
- Use kpause where appropriate.
 1.4.4.1 13-Mar-2007  ad Pull in the initial set of changes for the vmlocking branch.
 1.5.2.1 15-Aug-2007  skrll Sync with HEAD.
 1.6.6.2 21-Jul-2007  ad Merge unobtrusive locking changes from the vmlocking branch.
 1.6.6.1 21-Jul-2007  ad file uvm_mremap.c was added on branch matt-mips64 on 2007-07-21 19:21:56 +0000
 1.6.4.2 26-Oct-2007  joerg Sync with HEAD.

Follow the merge of pmap.c on i386 and amd64 and move
pmap_init_tmp_pgtbl into arch/x86/x86/pmap.c. Modify the ACPI wakeup
code to restore CR4 before jumping back into kernel space as the large
page option might cover that.
 1.6.4.1 09-Aug-2007  jmcneill Sync with HEAD.
 1.7.4.1 18-Oct-2007  yamt sync with head.
 1.7.2.2 09-Jan-2008  matt sync with HEAD
 1.7.2.1 06-Nov-2007  matt sync with HEAD
 1.8.10.1 02-Jan-2008  bouyer Sync with HEAD
 1.8.6.2 26-Dec-2007  ad Sync with head.
 1.8.6.1 04-Dec-2007  ad Pull the vmlocking changes into a new branch.
 1.8.4.1 18-Feb-2008  mjf Sync with HEAD.
 1.10.12.1 23-Jun-2008  wrstuden Sync w/ -current. 34 merge conflicts to follow.
 1.10.10.3 09-Oct-2010  yamt sync with head
 1.10.10.2 19-Aug-2009  yamt sync with head.
 1.10.10.1 04-May-2009  yamt sync with head.
 1.10.8.1 04-Jun-2008  yamt sync with head
 1.10.6.2 29-Jun-2008  mjf Sync with HEAD.
 1.10.6.1 05-Jun-2008  mjf Sync with HEAD.

Also fix build.
 1.11.2.1 18-Jun-2008  simonb Sync with head.
 1.12.10.1 13-May-2009  jym Sync with HEAD.

Commit is split, to avoid a "too many arguments" protocol error.
 1.12.4.1 28-Apr-2009  skrll Sync with HEAD.
 1.15.4.2 05-Mar-2011  rmind sync with head
 1.15.4.1 16-Mar-2010  rmind Change struct uvm_object::vmobjlock to be dynamically allocated with
mutex_obj_alloc(). It allows us to share the locks among UVM objects.
 1.15.2.1 17-Aug-2010  uebayasi Sync with HEAD.
 1.16.6.1 23-Jun-2011  cherry Catchup with rmind-uvmplock merge.
 1.17.30.2 28-Aug-2017  skrll Sync with HEAD
 1.17.30.1 27-Dec-2015  skrll Sync with HEAD (as of 26th Dec)
 1.17.12.1 03-Dec-2017  jdolecek update from HEAD
 1.18.8.1 11-May-2017  pgoyette Sync with HEAD
 1.19.16.1 29-Feb-2020  ad Sync with head.
 1.19.10.1 08-Apr-2020  martin Merge changes from current as of 20200406
 1.20.6.1 14-Dec-2020  thorpej Sync w/ HEAD.
 1.21.24.1 02-Aug-2025  perseant Sync with HEAD
 1.25 15-Aug-2020  chs use uint64_t rather than int for storing the index of a page within an object.
 1.24 14-Aug-2020  chs centralize calls from UVM to radixtree into a few functions.
in those functions, assert that the object lock is held in
the correct mode.
 1.23 25-May-2020  ad - Alter the convention for uvm_page_array slightly, so the basic search
parameters can't change part way through a search: move the "uobj" and
"flags" arguments over to uvm_page_array_init() and store those with the
array.

- With that, detect when it's not possible to find any more pages in the
tree with the given search parameters, and avoid repeated tree lookups if
the caller loops over uvm_page_array_fill_and_peek().
 1.22 19-May-2020  ad PR kern/32166: pgo_get protocol is ambiguous
Also problems with tmpfs+nfs noted by hannken@.

Don't pass PGO_ALLPAGES to pgo_get, and ignore PGO_DONTCARE in the
!PGO_LOCKED case. In uao_get() have uvm_pagealloc() take care of page
zeroing and release busy pages on error.
 1.21 23-Feb-2020  ad UVM locking changes, proposed on tech-kern:

- Change the lock on uvm_object, vm_amap and vm_anon to be a RW lock.
- Break v_interlock and vmobjlock apart. v_interlock remains a mutex.
- Do partial PV list locking in the x86 pmap. Others to follow later.
 1.20 15-Jan-2020  ad Merge from yamt-pagecache (after much testing):

- Reduce unnecessary page scan in putpages esp. when an object has a ton of
pages cached but only a few of them are dirty.

- Reduce the number of pmap operations by tracking page dirtiness more
precisely in uvm layer.
 1.19 31-Dec-2019  ad branches: 1.19.2;
- Add and use wrapper functions that take and acquire page interlocks, and pairs
of page interlocks. Require that the page interlock be held over calls to
uvm_pageactivate(), uvm_pagewire() and similar.

- Solve the concurrency problem with page replacement state. Rather than
updating the global state synchronously, set an intended state on
individual pages (active, inactive, enqueued, dequeued) while holding the
page interlock. After the interlock is released put the pages on a 128
entry per-CPU queue for their state changes to be made real in batch.
This results in a ~400 fold decrease in contention on my test system.
Proposed on tech-kern but modified to use the page interlock rather than
atomics to synchronise as it's much easier to maintain that way, and
cheaper.
 1.18 15-Dec-2019  ad Merge from yamt-pagecache:

- do gang lookup of pages using radixtree.
- remove now unused uvm_object::uo_memq and vm_page::listq.queue.
 1.17 14-Dec-2019  ad Merge from yamt-pagecache: use radixtree for page lookup.

rbtree page lookup was introduced during the NetBSD 5.0 development cycle to
bypass lock contention problems with the (then) global page hash, and was a
temporary solution to allow us to make progress. radixtree is the intended
replacement.

Ok yamt@.
 1.16 13-Dec-2019  ad Break the global uvm_pageqlock into a per-page identity lock and a private
lock for use of the pagedaemon policy code. Discussed on tech-kern.

PR kern/54209: NetBSD 8 large memory performance extremely low
PR kern/54210: NetBSD-8 processes presumably not exiting
PR kern/54727: writing a large file causes unreasonable system behaviour
 1.15 26-Oct-2015  mrg branches: 1.15.18;
disable the previous for now; it fails for me on a different system.
 1.14 26-Oct-2015  mrg in uvm_obj_init(), KASSERT(ops), to ensure we have an actual pager ops
set for this object. suggested by chuq.
 1.13 24-Aug-2015  pooka to garnish, dust with _KERNEL_OPT
 1.12 11-Mar-2014  pooka branches: 1.12.6;
deduplicate uvm_object_printit() implementation
 1.11 27-Aug-2011  christos branches: 1.11.2; 1.11.12; 1.11.16;
Add an optional pglist argument to uvm_obj_wirepages, to be
filled with the list of pages that were wired.
 1.10 18-Jun-2011  rmind - Move pre-check from uvm_obj_destroy() to ubc_purge(), keep it abstracted.
- Add comments noting the race between ubc_alloc() and ubc_purge().
 1.9 12-Jun-2011  mrg include uvm_object.c in the rump kernel for the new uvm_obj* functions.
don't build the uvm_object.c uvm_object_printit() for _RUMPKERNEL. (XXX)
add empty panic() stubs for uvm_loanbreak() and ubc_purge().

fixes some more 5.99.53 rump build issues.
 1.8 12-Jun-2011  rmind Welcome to 5.99.53! Merge rmind-uvmplock branch:

- Reorganize locking in UVM and provide extra serialisation for pmap(9).
New lock order: [vmpage-owner-lock] -> pmap-lock.

- Simplify locking in some pmap(9) modules by removing P->V locking.

- Use lock object on vmobjlock (and thus vnode_t::v_interlock) to share
the locks amongst UVM objects where necessary (tmpfs, layerfs, unionfs).

- Rewrite and optimise x86 TLB shootdown code, make it simpler and cleaner.
Add TLBSTATS option for x86 to collect statistics about TLB shootdowns.

- Unify /dev/mem et al in MI code and provide required locking (removes
kernel-lock on some ports). Also, avoid cache-aliasing issues.

Thanks to Andrew Doran and Joerg Sonnenberger, as their initial patches
formed the core changes of this branch.
 1.7 18-Aug-2009  thorpej branches: 1.7.2; 1.7.4; 1.7.10;
Move uvm_object-related DDB hooks into uvm_object.c. Put all of the
uvm_map-related DDB stuff in one spot in the file.
 1.6 28-Apr-2008  martin Remove clause 3 and 4 from TNF licenses
 1.5 04-Jan-2008  ad branches: 1.5.6; 1.5.8; 1.5.10;
Start detangling lock.h from intr.h. This is likely to cause short term
breakage, but the mess of dependencies has been regularly breaking the
build recently anyhow.
 1.4 02-Jan-2008  ad Merge vmlocking2 to head.
 1.3 17-Feb-2007  rmind branches: 1.3.4; 1.3.18; 1.3.24; 1.3.26; 1.3.30;
Mention rmind@ as an author in the license. No functional change.
 1.2 12-Oct-2006  yamt branches: 1.2.2; 1.2.4; 1.2.8; 1.2.10;
whitespace.
 1.1 12-Oct-2006  yamt uobj_wirepages and uobj_unwirepages from Mindaugas. PR/34771.
(commented out in files.uvm for now because there is no user in tree.)

http://mail-index.netbsd.org/tech-kern/2006/09/24/0000.html
http://mail-index.netbsd.org/tech-kern/2006/10/10/0000.html
 1.2.10.1 27-Feb-2007  yamt - sync with head.
- move sched_changepri back to kern_synch.c as it doesn't know PPQ anymore.
 1.2.8.4 21-Jan-2008  yamt sync with head
 1.2.8.3 26-Feb-2007  yamt sync with head.
 1.2.8.2 30-Dec-2006  yamt sync with head.
 1.2.8.1 12-Oct-2006  yamt file uvm_object.c was added on branch yamt-lazymbuf on 2006-12-30 20:51:05 +0000
 1.2.4.2 18-Nov-2006  ad Sync with head.
 1.2.4.1 12-Oct-2006  ad file uvm_object.c was added on branch newlock2 on 2006-11-18 21:39:50 +0000
 1.2.2.2 22-Oct-2006  yamt sync with head
 1.2.2.1 12-Oct-2006  yamt file uvm_object.c was added on branch yamt-splraiseipl on 2006-10-22 06:07:53 +0000
 1.3.30.2 08-Jan-2008  bouyer Sync with HEAD
 1.3.30.1 02-Jan-2008  bouyer Sync with HEAD
 1.3.26.1 04-Dec-2007  ad Pull the vmlocking changes into a new branch.
 1.3.24.1 18-Feb-2008  mjf Sync with HEAD.
 1.3.18.1 09-Jan-2008  matt sync with HEAD
 1.3.4.1 13-Mar-2007  ad Pull in the initial set of changes for the vmlocking branch.
 1.5.10.2 19-Aug-2009  yamt sync with head.
 1.5.10.1 16-May-2008  yamt sync with head.
 1.5.8.1 18-May-2008  yamt sync with head.
 1.5.6.1 02-Jun-2008  mjf Sync with HEAD.
 1.7.10.1 23-Jun-2011  cherry Catchup with rmind-uvmplock merge.
 1.7.4.6 19-May-2011  rmind Implement sharing of vnode_t::v_interlock amongst vnodes:
- Lock is shared amongst UVM objects using uvm_obj_setlock() or getnewvnode().
- Adjust vnode cache to handle unsharing, add VI_LOCKSHARE flag for that.
- Use sharing in tmpfs and layerfs for underlying object.
- Simplify locking in ubc_fault().
- Sprinkle some asserts.

Discussed with ad@.
 1.7.4.5 05-Mar-2011  rmind sync with head
 1.7.4.4 26-Apr-2010  rmind Add ubc_purge() and purge/deassociate any related UBC entries during
object (usually, vnode) destruction. Since locking (and thus object)
is required to enter/remove mappings - object is not allowed anymore
to disappear with any UBC entries left.

From original patch by ad@ with some modifications.
 1.7.4.3 24-Apr-2010  rmind Amend previous.
 1.7.4.2 23-Apr-2010  rmind Use consistent naming - uvm_obj_*().
 1.7.4.1 16-Mar-2010  rmind Change struct uvm_object::vmobjlock to be dynamically allocated with
mutex_obj_alloc(). It allows us to share the locks among UVM objects.
 1.7.2.3 12-Jul-2010  uebayasi Reduce more diff by backing out XIP page specific code. Allow XIP pages
to be loaned.
 1.7.2.2 31-May-2010  uebayasi Re-define the definition of "device page"; device pages are pages of
device memory. Pages which don't have vm_page (== can't be used for
generic use), but whose PV are tracked, are called "direct pages" from
now.
 1.7.2.1 12-Feb-2010  uebayasi Teach device page handling.
 1.11.16.1 18-May-2014  rmind sync with head
 1.11.12.2 03-Dec-2017  jdolecek update from HEAD
 1.11.12.1 20-Aug-2014  tls Rebase to HEAD as of a few days ago.
 1.11.2.4 22-May-2014  yamt sync with head.

for a reference, the tree before this commit was tagged
as yamt-pagecache-tag8.

this commit was split into small chunks to avoid
a limitation of cvs. ("Protocol error: too many arguments")
 1.11.2.3 26-Nov-2011  yamt - uvm_page_array_fill: add some more parameters
- uvn_findpages: use gang-lookup
- genfs_putpages: re-enable backward clustering
- mechanical changes after the recent radixtree.h api changes
 1.11.2.2 06-Nov-2011  yamt remove pg->listq and uobj->memq
 1.11.2.1 02-Nov-2011  yamt page cache related changes

- maintain object pages in radix tree rather than rb tree.
- reduce unnecessary page scan in putpages. esp. when an object has a ton of
pages cached but only a few of them are dirty.
- reduce the number of pmap operations by tracking page dirtiness more
precisely in uvm layer.
- fix nfs commit range tracking.
- fix nfs write clustering. XXX hack
 1.12.6.2 27-Dec-2015  skrll Sync with HEAD (as of 26th Dec)
 1.12.6.1 22-Sep-2015  skrll Sync with HEAD
 1.15.18.1 08-Apr-2020  martin Merge changes from current as of 20200406
 1.19.2.2 29-Feb-2020  ad Sync with head.
 1.19.2.1 17-Jan-2020  ad Sync with head.
 1.40 05-Feb-2024  andvar fix various typos in comments.
 1.39 14-Aug-2020  chs centralize calls from UVM to radixtree into a few functions.
in those functions, assert that the object lock is held in
the correct mode.
 1.38 14-Mar-2020  ad Make uvm_pagemarkdirty() responsible for putting vnodes onto the syncer
work list. Proposed on tech-kern@.
 1.37 23-Feb-2020  ad UVM locking changes, proposed on tech-kern:

- Change the lock on uvm_object, vm_amap and vm_anon to be a RW lock.
- Break v_interlock and vmobjlock apart. v_interlock remains a mutex.
- Do partial PV list locking in the x86 pmap. Others to follow later.
 1.36 15-Jan-2020  ad Merge from yamt-pagecache (after much testing):

- Reduce unnecessary page scan in putpages esp. when an object has a ton of
pages cached but only a few of them are dirty.

- Reduce the number of pmap operations by tracking page dirtiness more
precisely in uvm layer.
 1.35 15-Dec-2019  ad branches: 1.35.2;
Merge from yamt-pagecache:

- do gang lookup of pages using radixtree.
- remove now unused uvm_object::uo_memq and vm_page::listq.queue.
 1.34 14-Dec-2019  ad Merge from yamt-pagecache: use radixtree for page lookup.

rbtree page lookup was introduced during the NetBSD 5.0 development cycle to
bypass lock contention problems with the (then) global page hash, and was a
temporary solution to allow us to make progress. radixtree is the intended
replacement.

Ok yamt@.
 1.33 14-Sep-2012  rmind branches: 1.33.38;
- Manage anonymous UVM object reference count with atomic ops.
- Fix an old bug of possible lock against oneself (uao_detach_locked() is
called from uao_swap_off() with uao_list_lock acquired). Also removes
the try-lock dance in uao_swap_off(), since the lock order changes.
 1.32 28-Jan-2012  rmind branches: 1.32.2; 1.32.6;
Describe UVM object and explain lock sharing a little.
 1.31 12-Jun-2011  rmind branches: 1.31.2; 1.31.6;
Welcome to 5.99.53! Merge rmind-uvmplock branch:

- Reorganize locking in UVM and provide extra serialisation for pmap(9).
New lock order: [vmpage-owner-lock] -> pmap-lock.

- Simplify locking in some pmap(9) modules by removing P->V locking.

- Use lock object on vmobjlock (and thus vnode_t::v_interlock) to share
the locks amongst UVM objects where necessary (tmpfs, layerfs, unionfs).

- Rewrite and optimise x86 TLB shootdown code, make it simpler and cleaner.
Add TLBSTATS option for x86 to collect statistics about TLB shootdowns.

- Unify /dev/mem et al in MI code and provide required locking (removes
kernel-lock on some ports). Also, avoid cache-aliasing issues.

Thanks to Andrew Doran and Joerg Sonnenberger, as their initial patches
formed the core changes of this branch.
 1.30 02-Feb-2011  chuck branches: 1.30.2;
update license clauses on my code to match the new-style BSD licenses.
based on diff that rmind@ sent me.

no functional change with this commit.
 1.29 06-Nov-2010  uebayasi branches: 1.29.2; 1.29.4;
Include uvm/uvm_pglist.h for struct pglist.
 1.28 25-Sep-2010  matt Rename rb.h to rbtree.h, as it is more appropriate (c.f. ptree.h). Also
helps find code that hasn't been updated to use the new rbtree API.
 1.27 24-Sep-2010  rmind Fixes/improvements to RB-tree implementation:
1. Fix inverted node order, so that negative value from comparison operator
would represent lower (left) node, and positive - higher (right) node.
2. Add an argument (i.e. "context"), passed to comparison operators.
3. Change rb_tree_insert_node() to return a node - either inserted one or
already existing one.
4. Amend the interface to manipulate the actual object, instead of the
rb_node (in a similar way as Patricia-tree interface does).
5. Update all RB-tree users accordingly.

XXX: Perhaps rename rb.h to rbtree.h, since cleaning-up..

1-3 address the PR/43488 by Jeremy Huddleston.

Passes RB-tree regression tests.
Reviewed by: matt@, christos@
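
Items 1 and 2 above can be illustrated with a minimal userland sketch. It is
not the rbtree code itself: struct item, struct cmp_ctx and item_compare()
are hypothetical names, and only the sign convention and the extra context
argument are taken from the message.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical object stored in the tree; not a NetBSD structure. */
struct item {
	int key;
};

/* Hypothetical context handed to every comparison (item 2). */
struct cmp_ctx {
	unsigned long ncompares;	/* number of comparisons made so far */
};

/*
 * Comparison operator following the corrected convention (item 1):
 * a negative result means a sorts lower (left) than b, a positive
 * result means a sorts higher (right), and zero means equal keys.
 */
int
item_compare(void *context, const void *a, const void *b)
{
	struct cmp_ctx *ctx = context;
	const struct item *ia = a;
	const struct item *ib = b;

	if (ctx != NULL)
		ctx->ncompares++;
	/* Avoid ia->key - ib->key, which can overflow. */
	return (ia->key > ib->key) - (ia->key < ib->key);
}
```

The context pointer lets one comparison function serve several trees that
need different per-tree state, without resorting to globals.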
 1.26 04-Jun-2008  ad branches: 1.26.18; 1.26.20;
Replace the global vm_page hash with a per vm_object rbtree.
Proposed on tech-kern@.
 1.25 02-Jun-2008  ad Use atomics to maintain v_usecount.
 1.24 02-Jan-2008  ad branches: 1.24.6; 1.24.8; 1.24.10; 1.24.12;
Merge vmlocking2 to head.
 1.23 01-Dec-2007  yamt branches: 1.23.2; 1.23.6;
constify pagerops.
 1.22 12-Oct-2006  yamt branches: 1.22.8; 1.22.22; 1.22.24; 1.22.30;
move some knowledge about vnode into uvm_vnode.c.
 1.21 11-Dec-2005  christos branches: 1.21.20; 1.21.22;
merge ktrace-lwp.
 1.20 23-Jul-2005  yamt update file timestamps for nfsd loaned-read and mmap.
PR/25279. discussed on tech-kern@.
 1.19 17-Jul-2005  yamt ensure that vnodes with dirty pages are always on syncer's queue.

- genfs_putpages: wait for i/o completion of PG_RELEASED/PG_PAGEOUT pages by
setting "wasclean" false when encountering them.
suggested by Stephan Uphoff in PR/24596 (1).

- genfs_putpages: write protect pages when cleaning out, if
we're going to take the vnode off the syncer's queue.
uvm_fault: don't write-map pages unless its vnode is already on
the syncer's queue.

fix PR/24596 (3) but in a different way from the suggested fix.
(to keep our current behaviour, ie. not to require explicit msync.
discussed on tech-kern@.)

- genfs_putpages: don't mistakenly take a vnode off the queue
by introducing a generation number in genfs_node.
genfs_getpages: increment the generation number.
suggested by Stephan Uphoff in PR/24596 (2).

- add some assertions.
 1.18 06-Jun-2005  yamt branches: 1.18.2;
introduce a macro to initialize uvm_object and use it.
 1.17 29-Nov-2003  yamt branches: 1.17.14;
mincore: don't treat an aobj as a device mapping.
 1.16 20-Jun-2002  chs branches: 1.16.6;
count aobj pages (most notably kernel stack pages) as anon pages
for memory usage-balancing purposes.
 1.15 15-May-2002  matt branches: 1.15.2; 1.15.4;
When core dumping a process, don't dump maps backed up by the device pager.
(move the pagerops externs to uvm_object.h and out of the C files).
 1.14 30-Oct-2001  thorpej - Add a new vnode flag VEXECMAP, which indicates that a vnode has
executable mappings. Stop overloading VTEXT for this purpose (VTEXT
also has another meaning).
- Rename vn_marktext() to vn_markexec(), and use it when executable
mappings of a vnode are established.
- In places where we want to set VTEXT, set it in v_flag directly, rather
than making a function call to do this (it no longer makes sense to
use a function call, since we no longer overload VTEXT with VEXECMAP's
meaning).

VEXECMAP suggested by Chuq Silvers.
 1.13 15-Sep-2001  chs branches: 1.13.2;
a whole bunch of changes to improve performance and robustness under load:

- remove special treatment of pager_map mappings in pmaps. this is
required now, since I've removed the globals that expose the address range.
pager_map now uses pmap_kenter_pa() instead of pmap_enter(), so there's
no longer any need to special-case it.
- eliminate struct uvm_vnode by moving its fields into struct vnode.
- rewrite the pageout path. the pager is now responsible for handling the
high-level requests instead of only getting control after a bunch of work
has already been done on its behalf. this will allow us to UBCify LFS,
which needs tighter control over its pages than other filesystems do.
writing a page to disk no longer requires making it read-only, which
allows us to write wired pages without causing all kinds of havoc.
- use a new PG_PAGEOUT flag to indicate that a page should be freed
on behalf of the pagedaemon when it's unlocked. this flag is very similar
to PG_RELEASED, but unlike PG_RELEASED, PG_PAGEOUT can be cleared if the
pageout fails due to eg. an indirect-block buffer being locked.
this allows us to remove the "version" field from struct vm_page,
and together with shrinking "loan_count" from 32 bits to 16,
struct vm_page is now 4 bytes smaller.
- no longer use PG_RELEASED for swap-backed pages. if the page is busy
because it's being paged out, we can't release the swap slot to be
reallocated until that write is complete, but unlike with vnodes we
don't keep a count of in-progress writes so there's no good way to
know when the write is done. instead, when we need to free a busy
swap-backed page, just sleep until we can get it busy ourselves.
- implement a fast-path for extending writes which allows us to avoid
zeroing new pages. this substantially reduces cpu usage.
- encapsulate the data used by the genfs code in a struct genfs_node,
which must be the first element of the filesystem-specific vnode data
for filesystems which use genfs_{get,put}pages().
- eliminate many of the UVM pagerops, since they aren't needed anymore
now that the pager "put" operation is a higher-level operation.
- enhance the genfs code to allow NFS to use the genfs_{get,put}pages
instead of a modified copy.
- clean up struct vnode by removing all the fields that used to be used by
the vfs_cluster.c code (which we don't use anymore with UBC).
- remove kmem_object and mb_object since they were useless.
instead of allocating pages to these objects, we now just allocate
pages with no object. such pages are mapped in the kernel until they
are freed, so we can use the mapping to find the page to free it.
this allows us to remove splvm() protection in several places.

The sum of all these changes improves write throughput on my
decstation 5000/200 to within 1% of the rate of NetBSD 1.5
and reduces the elapsed time for "make release" of a NetBSD 1.5
source tree on my 128MB pc to 10% less than a 1.5 kernel took.
 1.12 26-May-2001  chs branches: 1.12.2; 1.12.4;
replace {simple_,}lock{_data,}_t with struct {simple,}lock {,*}.
 1.11 09-Mar-2001  chs add UBC memory-usage balancing. we track the number of pages in use for
each of the basic types (anonymous data, executable image, cached files)
and prevent the pagedaemon from reusing a given page if that would reduce
the count of that type of page below a sysctl-settable minimum threshold.
the thresholds are controlled via three new sysctl tunables:
vm.anonmin, vm.vnodemin, and vm.vtextmin. these tunables are the
percentages of pageable memory reserved for each usage, and we do not allow
the sum of the minimums to be more than 95% so that there's always some
memory that can be reused.
 1.10 28-Jan-2001  thorpej branches: 1.10.2;
Put the extern decl of uvm_vnodeops in uvm_object.h
 1.9 28-Jan-2001  thorpej Define a UVM_OBJ_IS_VNODE() macro to test if an object is a vnode.
 1.8 25-May-1999  thorpej branches: 1.8.2;
Define a new kernel object type, "intrsafe", which are used for objects
which can be used in an interrupt context. Use pmap_kenter*() and
pmap_kremove() only for mappings owned by these objects.

Fixes some locking protocol issues related to MP support, and eliminates
all of the pmap_enter vs. pmap_kremove inconsistencies.
 1.7 25-May-1999  thorpej Macro'ize the test for "object is a kernel object".
 1.6 25-Mar-1999  mrg branches: 1.6.4;
remove now >1 year old pre-release message.
 1.5 09-Mar-1998  mrg KNF.
 1.4 10-Feb-1998  perry add/cleanup multiple inclusion protection.
 1.3 07-Feb-1998  mrg restore rcsids
 1.2 06-Feb-1998  thorpej RCS ID police.
 1.1 05-Feb-1998  mrg branches: 1.1.1;
Initial revision
 1.1.1.1 05-Feb-1998  mrg initial import of the new virtual memory system, UVM, into -current.

UVM was written by chuck cranor <chuck@maria.wustl.edu>, with some
minor portions derived from the old Mach code. i provided some help
getting swap and paging working, and other bug fixes/ideas. chuck
silvers <chuq@chuq.com> also provided some other fixes.

this is the UVM kernel code portion.


this will be KNF'd shortly. :-)
 1.6.4.1 21-Jun-1999  thorpej Sync w/ -current.
 1.8.2.2 12-Mar-2001  bouyer Sync with HEAD.
 1.8.2.1 11-Feb-2001  bouyer Sync with HEAD.
 1.10.2.6 01-Aug-2002  nathanw Catch up to -current.
 1.10.2.5 20-Jun-2002  nathanw Catch up to -current.
 1.10.2.4 14-Nov-2001  nathanw Catch up to -current.
 1.10.2.3 21-Sep-2001  nathanw Catch up to -current.
 1.10.2.2 21-Jun-2001  nathanw Catch up to -current.
 1.10.2.1 09-Apr-2001  nathanw Catch up with -current.
 1.12.4.1 01-Oct-2001  fvdl Catch up with -current.
 1.12.2.3 06-Sep-2002  jdolecek sync kqueue branch with HEAD
 1.12.2.2 23-Jun-2002  jdolecek catch up with -current on kqueue branch
 1.12.2.1 10-Jan-2002  thorpej Sync kqueue branch with -current.
 1.13.2.1 12-Nov-2001  thorpej Sync the thorpej-mips-cache branch with -current.
 1.15.4.1 21-Jun-2002  lukem Pull up revision 1.16 (requested by chs in ticket #329):
count aobj pages (most notably kernel stack pages) as anon pages
for memory usage-balancing purposes.
 1.15.2.1 15-Jul-2002  gehenna catch up with -current.
 1.16.6.4 10-Nov-2005  skrll Sync with HEAD. Here we go again...
 1.16.6.3 21-Sep-2004  skrll Fix the sync with head I botched.
 1.16.6.2 18-Sep-2004  skrll Sync with HEAD.
 1.16.6.1 03-Aug-2004  skrll Sync with HEAD
 1.17.14.1 24-Aug-2005  riz Pull up following revision(s) (requested by yamt in ticket #688):
sys/miscfs/genfs/genfs_vnops.c: revision 1.98 via patch
sys/ufs/ffs/ffs_vfsops.c: revision 1.165
sys/ufs/lfs/lfs_extern.h: revision 1.69
sys/fs/filecorefs/filecore_vfsops.c: revision 1.20
sys/nfs/nfs_node.c: revision 1.80
sys/fs/smbfs/smbfs_node.c: revision 1.24
sys/fs/cd9660/cd9660_vfsops.c: revision 1.24
sys/fs/msdosfs/msdosfs_denode.c: revision 1.8
sys/miscfs/genfs/genfs_node.h: revision 1.6
sys/ufs/lfs/lfs_vfsops.c: revision 1.183
sys/ufs/ext2fs/ext2fs_vfsops.c: revision 1.86
sys/fs/adosfs/advfsops.c: revision 1.23
sys/fs/ntfs/ntfs_vfsops.c: revision 1.31
- constify genfs_ops.
- use member designators.

sys/miscfs/genfs/genfs_vnops.c: revision 1.99 via patch
genfs_getpages: don't forget to put the vnode onto the syncer's work queue
even in the case of PGO_LOCKED.

sys/uvm/uvm_bio.c: revision 1.40
sys/uvm/uvm_pager.h: revision 1.29
sys/miscfs/genfs/genfs_vnops.c: revision 1.100 via patch
sys/ufs/ufs/ufs_inode.c: revision 1.50
- introduce PGO_NOBLOCKALLOC and use it for ubc mapping
to prevent unnecessary block allocations in the case that
page size > block size.
- ufs_balloc_range: use VM_PROT_WRITE+PGO_NOBLOCKALLOC rather than
VM_PROT_READ.

sys/uvm/uvm_fault.c: revision 1.96
sys/miscfs/genfs/genfs_vnops.c: revision 1.101 via patch
sys/uvm/uvm_object.h: revision 1.19
sys/miscfs/genfs/genfs_node.h: revision 1.7
ensure that vnodes with dirty pages are always on syncer's queue.
- genfs_putpages: wait for i/o completion of PG_RELEASED/PG_PAGEOUT pages by
setting "wasclean" false when encountering them.
suggested by Stephan Uphoff in PR/24596 (1).
- genfs_putpages: write protect pages when cleaning out, if
we're going to take the vnode off the syncer's queue.
uvm_fault: don't write-map pages unless its vnode is already on
the syncer's queue.
fix PR/24596 (3) but in a different way from the suggested fix.
(to keep our current behaviour, ie. not to require explicit msync.
discussed on tech-kern@.)
- genfs_putpages: don't mistakenly take a vnode off the queue
by introducing a generation number in genfs_node.
genfs_getpages: increment the generation number.
suggested by Stephan Uphoff in PR/24596 (2).
- add some assertions.

sys/miscfs/genfs/genfs_vnops.c: revision 1.102 via patch
genfs_putpages: don't bother to clean the vnode unless VONWORKLST.

sys/ufs/ffs/ffs_vnops.c: revision 1.71
ffs_full_fsync: because VBLK/VCHR can be mmap'ed,
do VOP_PUTPAGES for them as well.

sys/uvm/uvm_fault.c: revision 1.97
uvm_fault: check a correct object in the case of layered filesystems.
fix PR/30811 from Jukka Salmi.

sys/uvm/uvm_object.h: revision 1.20
sys/ufs/ffs/ffs_vfsops.c: revision 1.167
sys/uvm/uvm_bio.c: revision 1.41
sys/ufs/ufs/ufs_vnops.c: revision 1.129
sys/uvm/uvm_mmap.c: revision 1.92
sys/uvm/uvm_fault.c: revision 1.98
sys/kern/vfs_subr.c: revision 1.252
sys/fs/msdosfs/denode.h: revision 1.5
sys/miscfs/genfs/genfs_vnops.c: revision 1.103 via patch
sys/fs/msdosfs/msdosfs_denode.c: revision 1.9
sys/sys/vnode.h: revision 1.141
sys/ufs/ufs/ufs_inode.c: revision 1.51
sys/ufs/ufs/ufs_extern.h: revision 1.45 via patch
sys/miscfs/genfs/genfs_node.h: revision 1.8
sys/ufs/lfs/lfs_vfsops.c: revision 1.184
sys/uvm/uvm_pager.h: revision 1.30
sys/ufs/ext2fs/ext2fs_vfsops.c: revision 1.87
update file timestamps for nfsd loaned-read and mmap.
PR/25279. discussed on tech-kern@.

sys/miscfs/genfs/genfs_vnops.c: revision 1.104 via patch
don't write-protect wired pages. pointed out by Chuck Silvers.
for now, leave a vnode on the syncer's queue, as suggested by him.

sys/ufs/ffs/ffs_vnops.c: revision 1.72
revert VCHR part of ffs_vnops.c 1.71.
as VCHR uses the device pager, no point to call VOP_PUTPAGES here.
pointed out by Chuck Silvers.
 1.18.2.4 21-Jan-2008  yamt sync with head
 1.18.2.3 07-Dec-2007  yamt sync with head
 1.18.2.2 30-Dec-2006  yamt sync with head.
 1.18.2.1 21-Jun-2006  yamt sync with head.
 1.21.22.1 22-Oct-2006  yamt sync with head
 1.21.20.1 18-Nov-2006  ad Sync with head.
 1.22.30.2 18-Feb-2008  mjf Sync with HEAD.
 1.22.30.1 08-Dec-2007  mjf Sync with HEAD.
 1.22.24.1 09-Jan-2008  matt sync with HEAD
 1.22.22.1 03-Dec-2007  joerg Sync with HEAD.
 1.22.8.2 21-Aug-2007  yamt destroy vmobjlock.
 1.22.8.1 13-Mar-2007  ad Pull in the initial set of changes for the vmlocking branch.
 1.23.6.1 02-Jan-2008  bouyer Sync with HEAD
 1.23.2.1 04-Dec-2007  ad Pull the vmlocking changes into a new branch.
 1.24.12.1 23-Jun-2008  wrstuden Sync w/ -current. 34 merge conflicts to follow.
 1.24.10.2 09-Oct-2010  yamt sync with head
 1.24.10.1 04-May-2009  yamt sync with head.
 1.24.8.2 17-Jun-2008  yamt sync with head.
 1.24.8.1 04-Jun-2008  yamt sync with head
 1.24.6.1 05-Jun-2008  mjf Sync with HEAD.

Also fix build.
 1.26.20.3 05-Mar-2011  rmind sync with head
 1.26.20.2 26-Apr-2010  rmind Add ubc_purge() and purge/deassociate any related UBC entries during
object (usually, vnode) destruction. Since locking (and thus object)
is required to enter/remove mappings - object is not allowed anymore
to disappear with any UBC entries left.

From original patch by ad@ with some modifications.
 1.26.20.1 16-Mar-2010  rmind Change struct uvm_object::vmobjlock to be dynamically allocated with
mutex_obj_alloc(). It allows us to share the locks among UVM objects.
 1.26.18.2 22-Oct-2010  uebayasi Sync with HEAD (-D20101022).
 1.26.18.1 28-Apr-2010  uebayasi Don't expose uvm_page.h internal for usual uvm(9) users.
 1.29.4.1 08-Feb-2011  bouyer Sync with HEAD
 1.29.2.1 06-Jun-2011  jruoho Sync with HEAD.
 1.30.2.1 23-Jun-2011  cherry Catchup with rmind-uvmplock merge.
 1.31.6.1 18-Feb-2012  mrg merge to -current.
 1.31.2.5 30-Oct-2012  yamt sync with head
 1.31.2.4 01-Aug-2012  yamt - fix integrity sync.
putpages for integrity sync (fsync, msync with MS_SYNC, etc) should not
skip pages being written back by other threads.

- adapt to radix tree tag api changes.
 1.31.2.3 17-Apr-2012  yamt sync with head
 1.31.2.2 06-Nov-2011  yamt remove pg->listq and uobj->memq
 1.31.2.1 02-Nov-2011  yamt page cache related changes

- maintain object pages in radix tree rather than rb tree.
- reduce unnecessary page scan in putpages. esp. when an object has a ton of
pages cached but only a few of them are dirty.
- reduce the number of pmap operations by tracking page dirtiness more
precisely in uvm layer.
- fix nfs commit range tracking.
- fix nfs write clustering. XXX hack
 1.32.6.1 20-Nov-2012  tls Resync to 2012-11-19 00:00:00 UTC
 1.32.2.1 22-Nov-2012  riz Pull up following revision(s) (requested by rmind in ticket #694):
sys/uvm/uvm_aobj.h: revision 1.22
sys/uvm/uvm_aobj.c: revision 1.117
sys/uvm/uvm_aobj.c: revision 1.118
sys/uvm/uvm_aobj.c: revision 1.119
sys/uvm/uvm_object.h: revision 1.33
- Describe uvm_aobj and the lock order.
- Remove unnecessary uao_dropswap_range1() wrapper.
- KNF. Sprinkle some __cacheline_aligned.
- Manage anonymous UVM object reference count with atomic ops.
- Fix an old bug of possible lock against oneself (uao_detach_locked() is
called from uao_swap_off() with uao_list_lock acquired). Also removes
the try-lock dance in uao_swap_off(), since the lock order changes.
 1.33.38.1 08-Apr-2020  martin Merge changes from current as of 20200406
 1.35.2.2 29-Feb-2020  ad Sync with head.
 1.35.2.1 17-Jan-2020  ad Sync with head.
 1.256 05-Mar-2024  thorpej Rename the local "boot_cpu" variable to "uvm_boot_cpu".
 1.255 10-Feb-2024  andvar s/musn't/mustn't/ in comments.
 1.254 23-Sep-2023  ad uvm_phys_to_vm_page() turns out to be a fairly central routine due to the
way that some of the pmaps work, so try to optimise it a little.
 1.253 17-Jul-2023  riastradh uvm(9): One rndsource for faults -- not one per CPU.

All relevant state is per-CPU anyway; the only substantive difference
this makes is how many entries appear in `rndctl -l' output and what
they are called -- formerly the somewhat confusing `cpuN', meaning
`page faults on cpuN', and now just `uvmfault'. I don't think
there's any real value in being able to enable or disable measurement
or counting of page faults on one CPU vs others, so although this
could be a minor compatibility change, it's hard to imagine it
matters much.

XXX kernel ABI change in struct cpu_info
 1.252 09-Apr-2023  riastradh uvm(9): KASSERT(A && B) -> KASSERT(A); KASSERT(B)
 1.251 26-Oct-2022  riastradh ddb/db_active.h: New home for extern db_active.

This can be included unconditionally, and db_active can then be
queried unconditionally; if DDB is not in the kernel, then db_active
is a constant zero. Reduces need for #include opt_ddb.h, #ifdef DDB.
 1.250 20-Dec-2020  skrll Some KNF. NFC.
 1.249 18-Oct-2020  chs branches: 1.249.2;
In the current code, CPU_COUNT_FREEPAGES counts pages in the global
freelists AND the per-CPU pgflcache free pages caches, and that is the
number of pages that the pagedaemon considers to be available.
However, most pages in the pgflcache per-CPU free page caches are NOT
actually available for any particular allocation, and thus allocating
a page can fail even though the pagedaemon thinks enough pages are
available. This change makes CPU_COUNT_FREEPAGES only count pages in
the global freelists and not pages in the pgflcache per-CPU free page
caches, thus better aligning the pagedaemon's view of how many pages
are available with the number of pages that can actually be allocated
by any particular request. This fixes a hang that Christos was hitting.
 1.248 18-Oct-2020  chs Move the handling of PG_PAGEOUT from uvm_aio_aiodone_pages() to
uvm_page_unbusy() so that all callers of uvm_page_unbusy() don't need to
handle this flag separately. Split out the pages part of uvm_aio_aiodone()
into uvm_aio_aiodone_pages() in rump just like in the real kernel.
In ZFS functions that can fail to copy data between the ARC and VM pages,
use uvm_aio_aiodone_pages() rather than uvm_page_unbusy() so that we can
handle these "I/O" errors. Fixes PR 55702.
 1.247 20-Sep-2020  skrll G/C uvm_pagezerocheck
 1.246 15-Aug-2020  tnn add a __diagused to fix non-DIAGNOSTIC kernel
 1.245 14-Aug-2020  chs centralize calls from UVM to radixtree into a few functions.
in those functions, assert that the object lock is held in
the correct mode.
 1.244 09-Jul-2020  skrll Consistently use UVMHIST(__func__)

Convert UVMHIST_{CALLED,LOG} into UVMHIST_CALLARGS
 1.243 17-Jun-2020  thorpej <sys/extent.h> not needed here.
 1.242 14-Jun-2020  ad Remove PG_ZERO. It worked brilliantly on x86 machines from the mid-90s but
having spent an age experimenting with it over the last 6 months on various
machines and with different use cases it's always either break-even or a
slight net loss for me.
 1.241 13-Jun-2020  ad uvm_pagerealloc(): resurrect the insertion case.
 1.240 11-Jun-2020  ad Counter tweaks:

- Don't need to count anonpages+filepages any more; clean+unknown+dirty for
each kind of page can be summed to get the totals.

- Track the number of free pages with a counter so that it's one less thing
for the allocator to do, which opens up further options there.

- Remove cpu_count_sync_one(). It has no users and doesn't save a whole lot.
For the cheap option, give cpu_count_sync() a boolean parameter indicating
that a cached value is okay, and rate limit the updates for cached values
to hz.
 1.239 11-Jun-2020  ad uvm_availmem(): give it a boolean argument to specify whether a recent
cached value will do, or if the very latest total must be fetched. It can
be called thousands of times a second and fetching the totals impacts not
only the calling LWP but other CPUs doing unrelated activity in the VM
system.
 1.238 24-May-2020  ad Add uvm_pagewanted_p(): return true if someone is waiting on the page and
assert caller has correct lock to observe that.
 1.237 19-May-2020  ad UVM_PAGE_TRKOWN: print the LID too
 1.236 17-May-2020  ad Don't set PG_AOBJ on a page unless UVM_OBJ_IS_AOBJ(), otherwise it can
catch pages from e.g. uvm_loanzero_object.
 1.235 17-May-2020  ad - If the hardware provided NUMA info, then use it to decide how to set up
the allocator's buckets, instead of doing round robin distribution. There
are open questions here but this is better than doing nothing.

- Kernel reserve pages are for the kernel not realtime threads.
 1.234 17-Mar-2020  ad Tweak the March 14th change to make page waits interlocked by pg->interlock.
Remove unneeded changes and only deal with the PQ_WANTED flag, to exclude
possible bugs.
 1.233 15-Mar-2020  rin Fix build with UVMHIST.
 1.232 14-Mar-2020  ad Don't require a write lock for page enqueue/activate/deactivate.
 1.231 14-Mar-2020  ad Make page waits (WANTED vs BUSY) interlocked by pg->interlock. Gets RW
locks out of the equation for sleep/wakeup, and allows observing+waiting
for busy pages when holding only a read lock. Proposed on tech-kern.
 1.230 03-Mar-2020  skrll Trailing whitespace
 1.229 03-Mar-2020  skrll Typo in comment
 1.228 27-Feb-2020  ad Tighten up the locking around vp->v_iflag a little more after the recent
split of vmobjlock & v_interlock.
 1.227 23-Feb-2020  ad Fix a comment.
 1.226 23-Feb-2020  ad UVM locking changes, proposed on tech-kern:

- Change the lock on uvm_object, vm_amap and vm_anon to be a RW lock.
- Break v_interlock and vmobjlock apart. v_interlock remains a mutex.
- Do partial PV list locking in the x86 pmap. Others to follow later.
 1.225 21-Jan-2020  ad uvmpdpol_pageactive(): the change to not re-activate recently activated
pages worked great with uvm_pageqlock, but it doesn't buy anything any more,
because now the busy pages are likely in a per-CPU queue somewhere waiting
to be processed, and changing the intent on those queued pages costs next
to nothing. Remove this and get back all the bits in pg->pqflags.
 1.224 15-Jan-2020  ad Merge from yamt-pagecache (after much testing):

- Reduce unnecessary page scan in putpages esp. when an object has a ton of
pages cached but only a few of them are dirty.

- Reduce the number of pmap operations by tracking page dirtiness more
precisely in uvm layer.
 1.223 11-Jan-2020  ad - uvm_pagezerocheck(): put a global lock around it to protect the single
page mapping (DEBUG only).

- uvm_pagefree(): increment zeropages as needed.
 1.222 09-Jan-2020  ad - Many small tweaks to the SMT awareness in the scheduler. It does a much
better job now at keeping all physical CPUs busy, while using the extra
threads to help out. In particular, during preempt() if we're using SMT,
try to find a better CPU to run on and teleport curlwp there.

- Change the CPU topology stuff so it can work on asymmetric systems. This
mainly entails rearranging one of the CPU lists so it makes sense in all
configurations.

- Add a parameter to cpu_topology_set() to note that a CPU is "slow", for
where there are fast CPUs and slow CPUs, like with the Rockchip RK3399.
Extend the SMT awareness to try and handle that situation too (keep fast
CPUs busy, use slow CPUs as helpers).
 1.221 05-Jan-2020  ad branches: 1.221.2;
Page allocator:

The method for assigning pages to buckets in the non-NUMA case sucks. It
can defeat memory interleaving in the hardware, and not distribute pages
fairly by colour. To fix this and make things more deterministic, take the
physical PFN and colour into account.

Then when freeing pages, in the non-NUMA case don't change the page's bucket
either. Keeping the bucket number stable will also permit partitioning page
replacement state by CPU package / NUMA node.
 1.220 31-Dec-2019  ad - Add and use wrapper functions that take and acquire page interlocks, and pairs
of page interlocks. Require that the page interlock be held over calls to
uvm_pageactivate(), uvm_pagewire() and similar.

- Solve the concurrency problem with page replacement state. Rather than
updating the global state synchronously, set an intended state on
individual pages (active, inactive, enqueued, dequeued) while holding the
page interlock. After the interlock is released put the pages on a 128
entry per-CPU queue for their state changes to be made real in batch.
This results in a ~400-fold decrease in contention on my test system.
Proposed on tech-kern but modified to use the page interlock rather than
atomics to synchronise as it's much easier to maintain that way, and
cheaper.
 1.219 31-Dec-2019  ad Rename uvm_free() -> uvm_availmem().
 1.218 31-Dec-2019  ad Rename uvm_page_locked_p() -> uvm_page_owner_locked_p()
 1.217 30-Dec-2019  ad uvm_pagealloc_pgb(): don't fill cache if we're into the reserves.

uvm_pagereplace(): use radix_tree_replace_node() to avoid alloc/free.
 1.216 28-Dec-2019  ad Add missing call to uvm_pgflcache_resume().
 1.215 28-Dec-2019  martin Use PRIxPADDR to print a physical address (instead of casting to void*
and printing a pointer - which does not work well if sizeof(paddr_t) !=
sizeof(void*)).
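
A userland sketch of the same idea, for illustration only: my_paddr_t and
MY_PRIxPADDR are hypothetical stand-ins for the kernel's paddr_t and
PRIxPADDR, and the address is assumed to be 64 bits wide regardless of the
pointer size.

```c
#include <assert.h>
#include <inttypes.h>
#include <stdio.h>

/*
 * Hypothetical stand-ins: the kernel defines paddr_t and PRIxPADDR per
 * port. Here the address is assumed to be 64 bits wide even if
 * pointers are only 32 bits.
 */
typedef uint64_t my_paddr_t;
#define MY_PRIxPADDR PRIx64

/*
 * Format a physical address into buf using a format macro that matches
 * the type, instead of casting to void * and printing with %p.
 * Returns what snprintf() returns.
 */
int
format_paddr(char *buf, size_t len, my_paddr_t pa)
{
	return snprintf(buf, len, "0x%" MY_PRIxPADDR, pa);
}
```

With a 32-bit void *, a cast would drop the upper half of such an address;
the format macro keeps the width of the argument and the conversion in step.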
 1.214 27-Dec-2019  ad Nothing uses uvm.cpus any more, and we can do the same with cpu_lookup(),
so get rid of it.
 1.213 27-Dec-2019  ad Redo the page allocator to perform better, especially on multi-core and
multi-socket systems. Proposed on tech-kern. While here:

- add rudimentary NUMA support - needs more work.
- remove now unused "listq" from vm_page.
 1.212 22-Dec-2019  ad uvm_pagealloc_strat(): Tweak the locking to allow for lazy dequeue of pages
in the pdpolicy code. This means taking pg->interlock if assigning to
an object. The remaining barrier to lazy dequeue is having a dedicated
TAILQ_ENTRY in the page (it's currently shared with the page allocator).
 1.211 21-Dec-2019  ad uvm_page_to_phys: mask off the lower bits.
 1.210 21-Dec-2019  ad Detangle the pagedaemon from uvm_fpageqlock:

- Have a single lock (uvmpd_lock) to protect pagedaemon state that was
previously covered by uvmpd_pool_drain_lock plus uvm_fpageqlock.
- Don't require any locks be held when calling uvm_kick_pdaemon().
- Use uvm_free().
 1.209 21-Dec-2019  ad - Rename VM_PGCOLOR_BUCKET() to VM_PGCOLOR(). I want to reuse "bucket" for
something else soon and TBH it matches what this macro does better.

- Add inlines to set/get locator values in the unused lower bits of
pg->phys_addr. Begin by using it to cache the freelist index, because
computing it is expensive and that shows up during profiling. Discussed
on tech-kern.
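
The locator idea can be sketched in userland as follows. This is an
illustration under stated assumptions, not the committed inlines: 4 KiB
pages, a 4-bit field for the freelist index, and every sk_* name are
hypothetical.

```c
#include <assert.h>
#include <stdint.h>

/* Assumed for illustration: 4 KiB pages, so the low 12 bits are free. */
#define SK_PAGE_SHIFT		12
#define SK_PAGE_MASK		((UINT64_C(1) << SK_PAGE_SHIFT) - 1)
/* Hypothetical field layout: 4 bits for a freelist index. */
#define SK_FREELIST_BITS	4
#define SK_FREELIST_MASK	((UINT64_C(1) << SK_FREELIST_BITS) - 1)

/* Hypothetical page descriptor; only the field used here is shown. */
struct sk_page {
	uint64_t phys_addr;	/* page-aligned address plus locator bits */
};

/* Store a freelist index in the unused low bits. */
void
sk_page_set_freelist(struct sk_page *pg, unsigned int fl)
{
	assert(fl <= SK_FREELIST_MASK);
	pg->phys_addr = (pg->phys_addr & ~SK_FREELIST_MASK) | fl;
}

/* Read the cached freelist index back. */
unsigned int
sk_page_get_freelist(const struct sk_page *pg)
{
	return (unsigned int)(pg->phys_addr & SK_FREELIST_MASK);
}

/* Recover the real physical address by masking off the low bits. */
uint64_t
sk_page_to_phys(const struct sk_page *pg)
{
	return pg->phys_addr & ~SK_PAGE_MASK;
}
```

Once the low bits carry data, every reader of the address has to mask them
off, which is the point of the uvm_page_to_phys change in 1.211.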
 1.208 21-Dec-2019  ad Counter tweaks:

"zeroaborts" + "free" don't need to be per-CPU counters, and "bucketmiss"
wasn't used. Remove those and cluster by usage.
 1.207 21-Dec-2019  ad Add uvm_free(): returns number of free pages in system.
 1.206 18-Dec-2019  ad PR kern/54783: t_mmap crashes the kernel

- Fix various locking & sequencing errors with breaking loans.

- Don't call uvm_pageremove_tree() while holding pg->interlock as radixtree
can take further locks when freeing nodes.
 1.205 16-Dec-2019  ad - Extend the per-CPU counters matt@ did to include all of the hot counters
in UVM, excluding uvmexp.free, which needs special treatment and will be
done with a separate commit. Cuts system time for a build by 20-25% on
a 48 CPU machine w/DIAGNOSTIC.

- Avoid 64-bit integer divide on every fault (for rnd_add_uint32).
 1.204 16-Dec-2019  ad Merge from yamt-pagecache:

uvm_pagerealloc(): Don't bother with insert to new. Nobody uses it and it
can return an error now due to radixtree.
 1.203 15-Dec-2019  ad Merge from yamt-pagecache:

- do gang lookup of pages using radixtree.
- remove now unused uvm_object::uo_memq and vm_page::listq.queue.
 1.202 14-Dec-2019  ad Merge from yamt-pagecache: use radixtree for page lookup.

rbtree page lookup was introduced during the NetBSD 5.0 development cycle to
bypass lock contention problems with the (then) global page hash, and was a
temporary solution to allow us to make progress. radixtree is the intended
replacement.

Ok yamt@.
 1.201 13-Dec-2019  ad Break the global uvm_pageqlock into a per-page identity lock and a private
lock for use of the pagedaemon policy code. Discussed on tech-kern.

PR kern/54209: NetBSD 8 large memory performance extremely low
PR kern/54210: NetBSD-8 processes presumably not exiting
PR kern/54727: writing a large file causes unreasonable system behaviour
 1.200 20-Sep-2019  maxv Fix programming mistake: 'paddrp' is a pointer given as argument, setting
it to NULL in the called function does not set it to NULL in the caller.

Actually, the callers of these functions do not do anything with the
special error handling, so drop the unused checks and the NULL assignments
altogether.

Found by the lgtm bot.
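
The mistake described above is easy to reproduce in a small userland sketch.
The function names are hypothetical and this is not the code that was
changed; it only shows why assigning to a pointer parameter is invisible to
the caller.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/*
 * Buggy pattern: paddrp is a local copy of the caller's pointer, so
 * assigning NULL to it changes nothing the caller can see.
 */
int
lookup_buggy(uint64_t *paddrp)
{
	paddrp = NULL;		/* no effect outside this function */
	(void)paddrp;
	return -1;
}

/*
 * To report something through the argument, the callee has to write
 * through the pointer; the return value already signals the error.
 */
int
lookup_fixed(uint64_t *paddrp)
{
	if (paddrp != NULL)
		*paddrp = 0;	/* visible to the caller */
	return -1;
}
```

Since the callers ignored the special handling anyway, dropping it (as the
commit did) is simpler than converting it to the second form.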
 1.199 14-Mar-2019  kre branches: 1.199.4;
Avoid a panic from the sequence

mlock(buf, 0);
munlock(buf, 0);
mlock(buf, page);
munlock(buf, page);

where buf is page aligned, and page is actually anything > 0
(but not too big) which will get rounded up to the next multiple
of the page size.

In that sequence, it is possible that the 1st munlock() is optional.

Add a KASSERT() (or two) to detect the first effects of the problem.
Without that (or in !DIAGNOSTIC kernels) the problem eventually
causes some kind of failure or other (most often still a panic).

After this, mlock(anything, 0) (or munlock) validates "anything"
but is otherwise a no-op (regardless of the alignment of anything).

Also, don't treat mlock(buf, verybig) as equivalent to mlock(buf, 0)
which is (more or less) what we had been doing.

XXX pullup -8 (maybe -7 as well, need to check).
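
The "verybig" case is a rounding wraparound: rounding a huge length up to a
page multiple can overflow to 0. A minimal sketch of a checked round-up,
assuming a 4096-byte page and hypothetical sk_* names (this is not the
kernel's code):

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Assumed page size for the sketch; real kernels take it from the port. */
#define SK_PAGE_SIZE	((size_t)4096)

/*
 * Round len up to a multiple of the page size. Returns false when the
 * addition would wrap around, so that a huge length is reported as an
 * error rather than silently becoming 0.
 */
bool
sk_round_page(size_t len, size_t *out)
{
	if (len > SIZE_MAX - (SK_PAGE_SIZE - 1))
		return false;	/* would overflow */
	*out = (len + SK_PAGE_SIZE - 1) & ~(SK_PAGE_SIZE - 1);
	return true;
}
```

A length of 0 still rounds to 0 and can be treated as a validated no-op,
while an overflowing length is rejected instead of being mistaken for 0.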
 1.198 19-May-2018  jdolecek branches: 1.198.2;
add experimental new function uvm_direct_process(), to allow reads/writes
of the contents of uvm pages without mapping them into the kernel, using
direct map or moral equivalent; pmaps supporting the interface need
to provide pmap_direct_process() and define PMAP_DIRECT

implement the new interface for amd64; I hear alpha and mips might be relatively
easy to add too, but I lack the knowledge

part of resolution for PR kern/53124
 1.197 19-May-2018  jdolecek detect wraparound when bumping page wire_count and loan_count
 1.196 24-Apr-2018  jakllsch In uvm_page_recolor(), kmem_free() old size rather than new size.

From Yaniv Abraham-Rabinovitch in PR kern/53208.
 1.195 02-Dec-2017  mrg branches: 1.195.2;
add two new members to uvmexp_sysctl{}: bootpages and poolpages.
bootpages is set to the pages allocated via uvm_pageboot_alloc().
poolpages is calculated from the list of pools nr_pages members.

this brings us closer to having a valid total of pages known by
the system, vs actual pages originally managed.

XXX: poolpages needs some handling for PR_RECURSIVE pools still.
 1.194 28-Oct-2017  pgoyette Update the kernhist(9) kernel history code to address issues identified
in PR kern/52639, as well as some general cleaning-up...

(As proposed on tech-kern@ with additional changes and enhancements.)

Details of changes:

* All history arguments are now stored as uintmax_t values[1], both in
the kernel and in the structures used for exporting the history data
to userland via sysctl(9). This avoids problems on some architectures
where passing a 64-bit (or larger) value to printf(3) can cause it to
process the value as multiple arguments. (This can be particularly
problematic when printf()'s format string is not a literal, since in
that case the compiler cannot know how large each argument should be.)

* Update the data structures used for exporting kernel history data to
include a version number as well as the length of history arguments.

* All [2] existing users of kernhist(9) have had their format strings
updated. Each format specifier now includes an explicit length
modifier 'j' to refer to numeric values of the size of uintmax_t.

* All [2] existing users of kernhist(9) have had their format strings
updated to replace uses of "%p" with "%#jx", and the pointer
arguments are now cast to (uintptr_t) before being subsequently cast
to (uintmax_t). This is needed to avoid compiler warnings about
casting "pointer to integer of a different size."

* All [2] existing users of kernhist(9) have had instances of "%s" or
"%c" format strings replaced with numeric formats; several instances
of mis-match between format string and argument list have been fixed.

* vmstat(1) has been modified to handle the new size of arguments in the
history data as exported by sysctl(9).

* vmstat(1) now provides a warning message if the history requested with
the -u option does not exist (previously, this condition was silently
ignored, with only a single blank line being printed).

* vmstat(1) now checks the version and argument length included in the
data exported via sysctl(9) and exits if they do not match the values
with which vmstat was built.

* The kernhist(9) man-page has been updated to note the additional
requirements imposed on the format strings, along with several other
minor changes and enhancements.

[1] It would have been possible to use an explicit length (for example,
uint64_t) for the history arguments. But that would require another
"rototill" of all the users in the future when we add support for an
architecture that supports a larger size. Also, the printf(3) format
specifiers for explicitly-sized values, such as "%"PRIu64, are much
more verbose (and less aesthetically appealing, IMHO) than simply
using "%ju".

[2] I've tried very hard to find "all [the] existing users of kernhist(9)"
but it is possible that I've missed some of them. I would be glad to
update any stragglers that anyone identifies.
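
The format-string rules listed in this entry can be sketched in user space with snprintf(3). The function format_hist() below is a hypothetical stand-in, not part of kernhist(9); it only illustrates the 'j' length modifier and the two-step pointer cast (to uintptr_t, then to uintmax_t) that the entry describes.

```c
#include <assert.h>
#include <stdint.h>
#include <stdio.h>

/*
 * Hypothetical stand-in for a history-logging call: every argument is
 * widened to uintmax_t and every conversion uses the 'j' modifier.
 */
static int
format_hist(char *buf, size_t len, const void *ptr, unsigned long count)
{
	assert(buf != NULL && len > 0);
	return snprintf(buf, len, "ptr=%#jx count=%ju",
	    (uintmax_t)(uintptr_t)ptr,	/* pointer: uintptr_t, then widen */
	    (uintmax_t)count);		/* integer: widen directly */
}
```

Because every argument has the same width, the formatter never has to guess how many bytes each one occupies, which is the problem the entry describes for non-literal format strings.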
 1.193 01-Jun-2017  chs branches: 1.193.2;
remove checks for failure after memory allocation calls that cannot fail:

kmem_alloc() with KM_SLEEP
kmem_zalloc() with KM_SLEEP
percpu_alloc()
pserialize_create()
psref_class_create()

all of these paths include an assertion that the allocation has not failed,
so callers should not assert that again.
 1.192 05-Feb-2017  maya Fix off by one.
ok cherry
 1.191 23-Dec-2016  skrll branches: 1.191.2;
Fix uvm_page_physget_freelist so that it actually performs the two passes
it mentions.
 1.190 23-Dec-2016  cherry "Make NetBSD great again!"

Introduce uvm_hotplug(9) to the kernel.

Many thanks, in no particular order to:

TNF, for funding the project.

Chuck Silvers - for multiple API reviews and feedback.
Nick Hudson - for testing on multiple architectures and bugfix patches.
Everyone who helped with boot testing.

KeK (http://www.kek.org.in) for hosting the primary developers.
 1.189 22-Dec-2016  cherry physmem should be of type psize_t

Also, use PRIxPSIZE when printf(9)ing physmem.
 1.188 22-Dec-2016  cherry Use uvm_physseg.h:uvm_page_physload() instead of uvm_extern.h

For this, include uvm_physseg.h in the build and include tree, make a
cosmetic modification to the prototype for uvm_page_physload().
 1.187 11-Apr-2015  joerg branches: 1.187.2;
Allow changing the per-cpu emergency page reservation via kernel config.
 1.186 05-Sep-2014  matt branches: 1.186.2;
Don't use C++ try keyword as a variable name.
 1.185 10-Aug-2014  tls Merge tls-earlyentropy branch into HEAD.
 1.184 21-Apr-2014  chs remove unused variables for UVM_PAGE_TRKOWN.
 1.183 25-Oct-2013  martin branches: 1.183.2;
Mark a diagnostic-only variable
 1.182 16-Feb-2012  matt branches: 1.182.2; 1.182.4;
Add KASSERTs to uvm_pagealloc_pgfl to verify the page is actually free and has
the contents that it should.
Redo the KASSERTs for the pageq in uvm_pagefree.
 1.181 02-Feb-2012  tls Entropy-pool implementation move and cleanup.

1) Move core entropy-pool code and source/sink/sample management code
to sys/kern from sys/dev.

2) Remove use of NRND as test for presence of entropy-pool code throughout
source tree.

3) Remove use of RND_ENABLED in device drivers as microoptimization to
avoid expensive operations on disabled entropy sources; make the
rnd_add calls do this directly so all callers benefit.

4) Fix bug in recent rnd_add_data()/rnd_add_uint32() changes that might
have led to slight entropy overestimation for some sources.

5) Add new source types for environmental sensors, power sensors, VM
system events, and skew between clocks, with a sample implementation
for each.

ok releng to go in before the branch due to the difficulty of later
pullup (widespread #ifdef removal and moved files). Tested with release
builds on amd64 and evbarm and live testing on amd64.
 1.180 28-Jan-2012  matt Replace locking checks with uvm_page_locked_p.
 1.179 27-Jan-2012  para extending vmem(9) to be able to allocate resources for its own needs.
simplifying uvm_map handling (no special kernel entries anymore, no relocking)
make malloc(9) a thin wrapper around kmem(9)
(with private interface for interrupt safety reasons)

releng@ acknowledged
 1.178 06-Oct-2011  uebayasi branches: 1.178.2; 1.178.6;
Correct pagermap emergva allocation. From yamt@.

Tested by building i386 kernel with DTRACE defined which died 100%.
 1.177 30-Sep-2011  mrg re-arrange the end of uvm_page_recolor() to avoid the multiple exit
points. move the call to uvm_pager_realloc_emerg() to after we
drop the uvm_fpageqlock, since it may be taken again in uvm_km_alloc().

fixes LOCKDEBUG crashes with the previous change.
 1.176 28-Sep-2011  matt Reallocate emergency pager va when ncolors is increased. (modification of
patch from mrg).
 1.175 15-Jun-2011  rmind uvm_pagealloc_strat: fix diagnostic assert. Reported by drochner@.
 1.174 12-Jun-2011  rmind Welcome to 5.99.53! Merge rmind-uvmplock branch:

- Reorganize locking in UVM and provide extra serialisation for pmap(9).
New lock order: [vmpage-owner-lock] -> pmap-lock.

- Simplify locking in some pmap(9) modules by removing P->V locking.

- Use lock object on vmobjlock (and thus vnode_t::v_interlock) to share
the locks amongst UVM objects where necessary (tmpfs, layerfs, unionfs).

- Rewrite and optimise x86 TLB shootdown code, make it simpler and cleaner.
Add TLBSTATS option for x86 to collect statistics about TLB shootdowns.

- Unify /dev/mem et al in MI code and provide required locking (removes
kernel-lock on some ports). Also, avoid cache-aliasing issues.

Thanks to Andrew Doran and Joerg Sonnenberger, as their initial patches
formed the core changes of this branch.
 1.173 05-Jun-2011  matt Fix fencepost error.
 1.172 01-Apr-2011  rmind branches: 1.172.2;
uvm_pageidlezero: use try-lock to not occupy uvm_fpageqlock, which may
be on demand by other CPUs. Reduces lock contention in some workloads
on many CPU (8+) systems.

Tested by tls@.
 1.171 02-Feb-2011  chuck update license clauses on my code to match the new-style BSD licenses.
based on diff that rmind@ sent me.

no functional change with this commit.
 1.170 02-Feb-2011  chuck update license clauses on my code to match the new-style BSD licenses.
based on second diff that rmind@ sent me.

no functional change with this commit.
 1.169 04-Jan-2011  matt branches: 1.169.2; 1.169.4;
Add better color matching when selecting free pages. KM pages will now be allocated
so that VA and PA have the same color. On a page fault, choose a physical
page that has the same color as the virtual address.

When allocating kernel memory pages, allow the MD to specify a preferred
VM_FREELIST from which to choose pages. For machines with large amounts
of memory (> 4GB), this lets all kernel memory come from <4GB to reduce the amount
of bounce buffering needed with 32bit DMA devices.
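
The color matching described in this entry can be sketched as follows. This is an illustration under stated assumptions, not the UVM implementation: the helper names, the 4 KiB page shift, and the rule that a color is the page number masked by a power-of-two color count are all assumed here.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define SKETCH_PGSHIFT 12	/* assumed 4 KiB pages */

/* Color of an address: its page number modulo the number of colors. */
static unsigned
addr_color(uint64_t addr, unsigned ncolors)
{
	/* The mask below requires a power-of-two color count. */
	assert(ncolors != 0 && (ncolors & (ncolors - 1)) == 0);
	return (unsigned)(addr >> SKETCH_PGSHIFT) & (ncolors - 1);
}

/*
 * Return the index of the first free physical page whose color matches
 * that of the virtual address, or -1 if no such page is available.
 */
static int
pick_same_color(uint64_t va, const uint64_t *free_pa, size_t nfree,
    unsigned ncolors)
{
	unsigned want = addr_color(va, ncolors);

	for (size_t i = 0; i < nfree; i++)
		if (addr_color(free_pa[i], ncolors) == want)
			return (int)i;
	return -1;
}
```

A real allocator would fall back to another color when no match exists; the sketch only reports the miss.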
 1.168 11-Dec-2010  matt When panicking due to a non-power-of-2 pagesize, include the pagesize in the
panic message.
 1.167 25-Nov-2010  uebayasi Revert vm_physseg allocation changes. A report says that it causes
panics when used with mplayer in heavy load.
 1.166 14-Nov-2010  uebayasi ... and another.
 1.165 14-Nov-2010  uebayasi Fix build caused by a last minute change.
 1.164 14-Nov-2010  uebayasi Be a little more friendly to dynamic physical segment registration.

Maintain an array of pointers to struct vm_physseg, instead of an array
of structs, so that the VM subsystem can safely hold a pointer to a segment.
A pointer to this struct will replace raw paddr_t usage in the future.

Dynamic removal is not supported yet.

Only MD data structure changes, no kernel bump needed.

Tested on i386, amd64, powerpc/ibm40x, arm11.
 1.163 12-Nov-2010  uebayasi Abstraction fix; move physical address -> per-page metadata (struct
vm_page *) "reverse" lookup code from uvm_page.h to uvm_page.c, to
help migration to not do that.

Likewise move per-page metadata (struct vm_page *) -> physical
address "forward" conversion code into *.c too. This is called
only by low-layer VM and MD code.
 1.162 12-Nov-2010  uebayasi Abstraction fix; move physical address -> physical segment "reverse"
lookup code from uvm_page.h to uvm_page.c.

This code is used by some pmaps to lookup per-page state (PV) from
per-segment metadata (struct vm_physseg). This is not needed if
UVM looks up physical segment once in fault handler, then directly
passes it to pmap. This change helps transition to that model.

The only users of vm_physseg_find() are pmap_motorola.c and
powerpc/ibm4xx/pmap.c.

Tested By: Compiling and running powerpc/ibm4xx/pmap.c
(evbppc/conf/OPENBLOCKS266)
 1.161 11-Nov-2010  uebayasi C style; make a sentinel pointer have an exclusive value; no
functional changes.
 1.160 11-Nov-2010  uebayasi Typo in a comment.
 1.159 11-Nov-2010  uebayasi Minor clean up.
 1.158 11-Nov-2010  uebayasi Minor clean up.
 1.157 06-Nov-2010  uebayasi Remove incomplete, never worked dynamic run-time memory registration
(uvm_page_physload(9)). This functionality will be re-added later.
 1.156 24-Sep-2010  rmind Fixes/improvements to RB-tree implementation:
1. Fix inverted node order, so that negative value from comparison operator
would represent lower (left) node, and positive - higher (right) node.
2. Add an argument (i.e. "context"), passed to comparison operators.
3. Change rb_tree_insert_node() to return a node - either inserted one or
already existing one.
4. Amend the interface to manipulate the actual object, instead of the
rb_node (in a similar way as Patricia-tree interface does).
5. Update all RB-tree users accordingly.

XXX: Perhaps rename rb.h to rbtree.h, since cleaning-up..

1-3 address the PR/43488 by Jeremy Huddleston.

Passes RB-tree regression tests.
Reviewed by: matt@, christos@
 1.155 25-Apr-2010  ad Reduce memory spent on bookkeeping for large values of MAXCPUS.
 1.154 24-Feb-2010  jym branches: 1.154.2;
- Use ctob() instead of ptoa() to obtain physical addresses from frame
numbers. Using ptoa() will cast to vaddr_t, which might not be adequate
for architectures where sizeof(paddr_t) > sizeof(vaddr_t) (like i386 PAE).

- small fix inside AGP heuristics to avoid masking high order bits for
systems with more than 4GB.

Reviewed by bouyer@.

See also http://mail-index.netbsd.org/tech-kern/2010/02/22/msg007373.html
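
The truncation this entry guards against can be reproduced in a few lines. The types and helpers below are hypothetical stand-ins, not the real ptoa() or ctob() definitions; they assume 4 KiB pages, a 32-bit virtual address type, and a 64-bit physical address type, as on i386 with PAE.

```c
#include <assert.h>
#include <stdint.h>

#define SKETCH_PGSHIFT 12		/* assumed 4 KiB pages */

typedef uint32_t sketch_vaddr_t;	/* 32-bit virtual address */
typedef uint64_t sketch_paddr_t;	/* 64-bit physical address */

/* Casting to the narrower type before shifting drops the high bits. */
static sketch_paddr_t
frame_to_pa_narrow(uint64_t pfn)
{
	return (sketch_vaddr_t)pfn << SKETCH_PGSHIFT;
}

/* Shifting in the physical address type keeps them. */
static sketch_paddr_t
frame_to_pa_wide(uint64_t pfn)
{
	assert(pfn <= (UINT64_MAX >> SKETCH_PGSHIFT));
	return (sketch_paddr_t)pfn << SKETCH_PGSHIFT;
}
```

For frame 0x100000, the first page at the 4GB boundary, the narrow version yields 0 while the wide version yields 0x100000000; below that boundary the two agree.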
 1.153 27-Jan-2010  uebayasi branches: 1.153.2;
uvm_pageinsert, uvm_pageremove: Pass the uobj, to/from which a pg is
inserted/removed, as an argument, because looking up a back-reference from
pg is redundant. No functional changes.
 1.152 07-Nov-2009  cegger Add a flags argument to pmap_kenter_pa(9).
Patch shown on tech-kern@ http://mail-index.netbsd.org/tech-kern/2009/11/04/msg006434.html
No objections.
 1.151 18-Aug-2009  thorpej Move uvm_page-related DDB hooks into uvm_page.c.
 1.150 18-Aug-2009  thorpej Add a real API for testing if a page is a managed page, and adjust callers
to stop relying on vm_physseg_find() for this purpose.
 1.149 11-Aug-2009  matt Fix brain fart. physmem was int not long.
 1.148 11-Aug-2009  matt Add back declaration of physmem but use the existing type (long).
 1.147 11-Aug-2009  haad Remove physmem definition to uintptr_t from another patch.
 1.146 10-Aug-2009  haad Add uvm_reclaim_hooks support for reclaiming kernel KVA space and memory.
This is used only by zfs where uvm_reclaim hook is added from arc cache.

Oked ad@.
 1.145 12-Mar-2009  abs Clarify free_list usage in uvm_page_physload() regarding faster/slower RAM.
Slower RAM should be assigned a higher free_list id.
No functional change to code, just comments and manpage
 1.144 27-Feb-2009  drochner oops - missed a case with PMAP_PAGEIDLEZERO if md code aborts the
zeroing process, from Nicolas Joly
 1.143 26-Feb-2009  drochner -fix two conditions where PQ_FREE was still/already set while the page
was not anymore/yet on the freelist and uvm_fpageqlock was not held
-clear PQ_FREE while the page is in the works of pageidlezero
This prevents the DMA memory allocator (pglistalloc) from grabbing a page
which is not on the freelist, leading to a diagnostic panic (with DEBUG)
or freelist corruption. (mostly on X server activation after a VT
switch or suspend/resume because this can allocate megabytes of AGP
memory)
This might fix PR port-i386/38989 by Alan Barrett (in case this was
a multiprocessor).
 1.142 16-Jan-2009  yamt branches: 1.142.2;
uvm_page_unbusy: add an assertion
 1.141 13-Dec-2008  ad It's easier for kernel reserve pages to be consumed because the pagedaemon
serves as less of a barrier these days. Restrict provision of kernel reserve
pages to kmem and one of these cases:

- doing a NOWAIT allocation
- caller is a realtime thread
- caller is a kernel thread
- explicitly requested, for example by the pmap
 1.140 04-Jul-2008  ad branches: 1.140.4; 1.140.6;
Scale the number of kernel reserve pages by the number of CPUs.
 1.139 02-Jul-2008  ad uvm_pageidlezero: fix a broken test which made it give up too easily.
 1.138 02-Jul-2008  matt Switch from KASSERT to CTASSERT for those asserts testing sizes of types.
 1.137 30-Jun-2008  matt Change tree op members/typedefs to rbto_compare_* from rb_compare_*
 1.136 17-Jun-2008  yamt - uvm_pagereplace: don't try to insert multiple pages with the same offset
into uvm_object rbtree.
- inline static -> static inline
 1.135 05-Jun-2008  he branches: 1.135.2;
Delete what appears to be a spurious assignment to an undeclared
'cpu' variable added in revision 1.133. Restores buildability for this file.
 1.134 04-Jun-2008  ad Replace the global vm_page hash with a per vm_object rbtree.
Proposed on tech-kern@.
 1.133 04-Jun-2008  ad - vm_page: put listq, pageq into a union alongside a LIST_ENTRY, so we can
use both types of list.

- Make page coloring and idle zero state per-CPU.

- Maintain per-CPU page freelists. When freeing, put pages onto the local
CPU's lists and the global lists. When allocating, prefer to take pages
from the local CPU. If none are available take from the global list as
done now. Proposed on tech-kern@.
 1.132 02-Jun-2008  ad uvm_pageidlezero:

- Use high and low water marks to try and reduce power consumption.
- Do trylock on uvm_fpageqlock, and bail if we can't get it.
- Only run on one CPU at a time.
 1.131 24-Mar-2008  yamt branches: 1.131.2; 1.131.4; 1.131.6;
remove a redundant pmap_update and add a comment instead.
 1.130 27-Feb-2008  ad Assert uvm_fpageqlock is held in a few more places.
 1.129 23-Feb-2008  chris Add some more missing pmap_update()s following pmap_kremove()s.
 1.128 13-Jan-2008  yamt branches: 1.128.2; 1.128.6;
unwrap short lines.
 1.127 02-Jan-2008  ad Merge vmlocking2 to head.
 1.126 29-Nov-2007  ad branches: 1.126.2; 1.126.6;
Use atomics to maintain uvmexp.{anon,exec,file}pages.
 1.125 08-Oct-2007  ad branches: 1.125.4;
Fix merge error.
 1.124 08-Oct-2007  ad Pad the hashlocks to 32-byte boundaries.
 1.123 21-Jul-2007  ad branches: 1.123.4; 1.123.6; 1.123.8; 1.123.10;
Merge unobtrusive locking changes from the vmlocking branch.
 1.122 09-Jul-2007  ad branches: 1.122.2;
Merge some of the less invasive changes from the vmlocking branch:

- kthread, callout, devsw API changes
- select()/poll() improvements
- miscellaneous MT safety improvements
 1.121 17-May-2007  yamt merge yamt-idlelwp branch. asked by core@. some ports still need work.

from doc/BRANCHES:

idle lwp, and some changes depending on it.

1. separate context switching and thread scheduling.
(cf. gmcgarry_ctxsw)
2. implement idle lwp.
3. clean up related MD/MI interfaces.
4. make scheduler(s) modular.
 1.120 14-Apr-2007  perseant Track lwp as well as proc owner with UVM_PAGE_TRKOWN
 1.119 22-Feb-2007  thorpej branches: 1.119.4; 1.119.6;
TRUE -> true, FALSE -> false
 1.118 21-Feb-2007  thorpej Replace the Mach-derived boolean_t type with the C99 bool type. A
future commit will replace use of TRUE and FALSE with true and false.
 1.117 09-Feb-2007  ad branches: 1.117.2;
Merge newlock2 to head.
 1.116 21-Dec-2006  yamt merge yamt-splraiseipl branch.

- finish implementing splraiseipl (and makeiplcookie).
http://mail-index.NetBSD.org/tech-kern/2006/07/01/0000.html
- complete workqueue(9) and fix its ipl problem, which is reported
to cause audio skipping.
- fix netbt (at least compilation problems) for some ports.
- fix PR/33218.
 1.115 15-Dec-2006  yamt put ->K loaned pages on the page queue, so that page loaning doesn't
disturb pagedaemon/pdpolicy.
 1.114 27-Sep-2006  thorpej Don't inline uvm_pagealloc_pgfl().
 1.113 15-Sep-2006  yamt branches: 1.113.2;
merge yamt-pdpolicy branch.
- separate page replacement policy from the rest of kernel
- implement an alternative replacement policy
 1.112 13-Apr-2006  yamt branches: 1.112.8;
uvm_page_own: more assertions.
 1.111 12-Feb-2006  yamt branches: 1.111.2; 1.111.4; 1.111.6;
uvm_pageunwire: use uvm_pageactivate rather than a copy.
 1.110 11-Feb-2006  yamt remove the following options. no objections on tech-kern@.

UVM_PAGER_INLINE
UVM_AMAP_INLINE
UVM_PAGE_INLINE
UVM_MAP_INLINE
 1.109 24-Dec-2005  perry branches: 1.109.2; 1.109.4; 1.109.6;
Remove leading __ from __(const|inline|signed|volatile) -- it is obsolete.
 1.108 21-Dec-2005  yamt make length of inactive queue tunable by sysctl. (vm.inactivepct)
 1.107 11-Dec-2005  christos merge ktrace-lwp.
 1.106 28-Jun-2005  thorpej branches: 1.106.2;
Clean up the cpp macro used to say "we're compiling this specific C file".
 1.105 27-Jun-2005  thorpej Use ANSI function decls.
 1.104 04-Jun-2005  chs adapt to const changes.
 1.103 11-May-2005  yamt allocate anons on-demand, rather than reserving static amount of
them on boot/swapon.
 1.102 01-Apr-2005  yamt merge yamt-km branch.
- don't use managed mappings/backing objects for wired memory allocations.
save some resources like pv_entry. also fix (most of) PR/27030.
- simplify kernel memory management API.
- simplify pmap bootstrap of some ports.
- some related cleanups.
 1.101 23-Oct-2004  yamt branches: 1.101.4; 1.101.6;
uvm_pageidlezero: grab kernel_lock before uvm.fpageqlock. PR/27259.
 1.100 17-Sep-2004  yamt make free page queue filo rather than fifo.
data in pages freed more recently are more likely on cpu cache.
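
The queue discipline described in this entry amounts to treating the free list as a stack. The sketch below is a simplified illustration with hypothetical types and names; the real free page queues are more elaborate.

```c
#include <assert.h>
#include <stddef.h>

struct sketch_page {
	int id;
	struct sketch_page *next;
};

struct sketch_freelist {
	struct sketch_page *head;
};

/* Freeing pushes the page at the head of the list. */
static void
sketch_page_free(struct sketch_freelist *fl, struct sketch_page *pg)
{
	assert(fl != NULL && pg != NULL);
	pg->next = fl->head;
	fl->head = pg;
}

/*
 * Allocating pops from the head, so the most recently freed page,
 * whose data is most likely still in the CPU cache, is reused first.
 */
static struct sketch_page *
sketch_page_alloc(struct sketch_freelist *fl)
{
	struct sketch_page *pg = fl->head;

	if (pg != NULL) {
		fl->head = pg->next;
		pg->next = NULL;
	}
	return pg;
}
```

With a FIFO queue the allocator would instead hand out the page that has been free the longest, which is the one least likely to still be cached.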
 1.99 01-Sep-2004  yamt uvm_pagefree: when orphaning an A->K loaned page,
- decrement uvmexp.anonpages as it's no longer an anon page.
- null out anon->u.an_page as the anon no longer owns the page.
uvm_anfree: add related assertions.
 1.98 05-May-2004  yamt uvm_page_unbusy: add assertions and comments about PG_RELEASED anon pages.
 1.97 24-Mar-2004  junyoung branches: 1.97.2;
- Nuke __P().
- Drop trailing spaces.
 1.96 13-Feb-2004  yamt when breaking a loan from uobj,
insert the replacement page into the same position
as the original page on the object memq so that
genfs_putpages (and lfs) won't be confused.

noted by Stephan Uphoff (PR/24328)
 1.95 13-Feb-2004  wiz Uppercase CPU, plural is CPUs.
 1.94 14-Jan-2004  yamt bump vnode hold count for page cache as well
to resolve unfairness between page cache and traditional buffer cache.
pointed out by enami tsugutomo on current-users@.
 1.93 21-Dec-2003  simonb No need to break a line - the full line is less than 80 chars long.
 1.92 05-Nov-2003  yamt add a missing pmap_update().
 1.91 03-Nov-2003  yamt add a DEBUG check if freed PG_ZERO pages are really zero-filled.
 1.90 01-Nov-2003  yamt in uvm_pagefree and friends, if freed pages have been marked by
PG_ZERO flag, put them to PGFL_ZEROS queue rather than default one
so that we can re-use zero-filled pages efficiently.
 1.89 01-Jun-2003  wiz branches: 1.89.2;
Fix typo in panic message. From miod@openbsd.
 1.88 10-May-2003  thorpej Back out the following change:
http://mail-index.netbsd.org/source-changes/2003/05/08/0068.html

There were some side-effects that I didn't anticipate, and fixing them
is proving to be more difficult than I thought, so just eject for now.
Maybe one day we can look at this again.

Fixes PR kern/21517.
 1.87 08-May-2003  thorpej Simplify the way the bounds of the managed kernel virtual address
space is advertised to UVM by making virtual_avail and virtual_end
first-class exported variables by UVM. Machine-dependent code is
responsible for initializing them before main() is called. Anything
that steals KVA must adjust these variables accordingly.

This reduces the number of instances of this info from 3 to 1, and
simplifies the pmap(9) interface by removing the pmap_virtual_space()
function call, and removing two arguments from pmap_steal_memory().

This also eliminates some kludges such as having to burn kernel_map
entries on space used by the kernel and stolen KVA.

This also eliminates use of VM_{MIN,MAX}_KERNEL_ADDRESS from MI code,
thus giving MD code greater flexibility over the bounds of the managed
kernel virtual address space if a given port's specific platforms can
vary in this regard (this is especially true of the evb* ports).
 1.86 22-Apr-2003  yamt correct accounting of {exec,file}pages.
they are not updated correctly when breaking loan.
 1.85 09-Apr-2003  thorpej Tweak the way the pagesize-related variables are set:
* Remove DEFAULT_PAGE_SIZE. We don't use PAGE_SIZE the way Mach did.
* In uvm_setpagesize(), if we are called with uvmexp.pagesize == 0,
then assert that PAGE_SIZE != 0 (i.e. a constant), and set uvmexp.pagesize
accordingly.
* Provide defaults for MIN_PAGE_SIZE and MAX_PAGE_SIZE if not defined
by <machine/vmparam.h>. If PAGE_SIZE is not a constant, MIN_PAGE_SIZE
and MAX_PAGE_SIZE must be provided.
* If MIN_PAGE_SIZE and MAX_PAGE_SIZE are not equal (i.e. PAGE_SIZE may
not be a constant in all configurations), then ensure that PAGE_SIZE
and friends expand to variable references for LKMs.
 1.84 17-Feb-2003  perseant Add code to UBCify LFS. This is still behind "#ifdef LFS_UBC" for now
(there are still some details to work out) but expect that to go
away soon. To support these basic changes (creation of lfs_putpages,
lfs_gop_write, mods to lfs_balloc) several other changes were made, to
wit:

* Create a writer daemon kernel thread whose purpose is to handle page
writes for the pagedaemon, but which also takes over some of the
functions of lfs_check(). This thread is started the first time an
LFS is mounted.

* Add a "flags" parameter to GOP_SIZE. Current values are
GOP_SIZE_READ, meaning that the call should return the size of the
in-core version of the file, and GOP_SIZE_WRITE, meaning that it
should return the on-disk size. One of GOP_SIZE_READ or
GOP_SIZE_WRITE must be specified.

* Instead of using malloc(...M_WAITOK) for everything, reserve enough
resources to get by and use malloc(...M_NOWAIT), using the reserves if
necessary. Use the pool subsystem for structures small enough that
this is feasible. This also obsoletes LFS_THROTTLE.

And a few that are not strictly necessary:

* Moves the LFS inode extensions off onto a separately allocated
structure; getting closer to LFS as an LKM. "Welcome to 1.6O."

* Unified GOP_ALLOC between FFS and LFS.

* Update LFS copyright headers to correct values.

* Actually cast to unsigned in lfs_shellsort, like the comment says.

* Keep track of which segments were empty before the previous
checkpoint; any segments that pass two checkpoints both dirty and
empty can be summarily cleaned. Do this. Right now lfs_segclean
still works, but this should be turned into an effectless
compatibility syscall.
 1.83 01-Feb-2003  thorpej Add extensible malloc types, adapted from FreeBSD. This turns
malloc types into a structure, a pointer to which is passed around,
instead of an int constant. Allow the limit to be adjusted when the
malloc type is defined, or with a function call, as suggested by
Jonathan Stone.
 1.82 27-Jan-2003  enami uvm_page_unbusy should skip PGO_DONTCARE page; e.g., locked pgo_getpages
request may contain PGO_DONTCARE and nfs_getpages may unbusy them on error.

Fix is provided in PR#20028 by YAMAMOTO Takashi. (and same one is approved
by chuq a while ago in private mail). It was my fault to forget to commit.
 1.81 09-Nov-2002  thorpej Fix signed/unsigned comparison warnings.
 1.80 30-Oct-2002  simonb Fix whitespace bogon.
 1.79 27-Sep-2002  provos remove trailing \n in panic(). approved perry.
 1.78 20-Jun-2002  chs count aobj pages (most notably kernel stack pages) as anon pages
for memory usage-balancing purposes.
 1.77 19-Jun-2002  wrstuden Fix recent bugs seen on Performa 4400 macppc's by
Makoto Fujiwara <makoto@ki.nu> and Manuel Bouyer <bouyer@netbsd.org>.
Help from Allen Briggs, Jason Thorpe, and Matt Thomas.

We need to call cpu_cache_probe() early in boot (machdep.c).
Add 603 info for completeness, and use NBPG not PAGESIZE, as the
latter relies on uvm being setup (cpu_subr.c).
Let uvm_page_recolor() be called before uvm has been set up; just
note the page coloring value (uvm_page.c).
 1.76 29-May-2002  enami Add missing pageq lock while uvm_pagefree() is called (either directly
or indirectly). Reviewed by chuq.
 1.75 15-May-2002  enami branches: 1.75.2; 1.75.4;
When a loaned page becomes ownerless as a result of freeing, it should be
dequeued from pageq. Fix provided by chuq.
 1.74 20-Feb-2002  enami branches: 1.74.4;
In the function uvm_page_own(), clear owner_tag after assertion so that
we can see the owner when assertion failed. Some indentation fix while
I'm here.
 1.73 31-Dec-2001  chs fix locking for loaning. in general we should be looking at the page's
uobject and uanon pointers rather than at the PQ_ANON flag to determine
which lock to hold, since PQ_ANON can be clear even when the anon's lock
is the one which we should hold (if the page was loaned from an object
and then freed by the object).
 1.72 09-Dec-2001  chs add {anon,file,exec}max as an upper bound on the amount of memory that
will be allocated for the respective usage types when there is contention
for memory.

replace "vnode" and "vtext" with "file" and "exec" in uvmexp field names
and sysctl names.
 1.71 10-Nov-2001  lukem add RCSIDs, and in some cases, slightly cleanup #include order
 1.70 06-Nov-2001  chs several changes prompted by loaning problems:
- fix the loaned case in uvm_pagefree().
- redo uvmexp.swpgonly accounting to work with page loaning.
add an assertion before each place we adjust uvmexp.swpgonly.
- fix uvm_km_pgremove() to always free any swap space associated with
the range being removed.
- get rid of UVM_LOAN_WIRED flag. instead, we just make sure that
pages loaned to the kernel are never on the page queues.
this allows us to assert that pages are not loaned and wired
at the same time.
- add yet more assertions.
 1.69 06-Nov-2001  simonb Change some unsigned int variables and parameters to plain ints so
that all usages of those agree on unsigned vs. signed.
 1.68 28-Sep-2001  chs branches: 1.68.2;
don't depend on other headers to include sys/proc.h for us.
 1.67 15-Sep-2001  chs a whole bunch of changes to improve performance and robustness under load:

- remove special treatment of pager_map mappings in pmaps. this is
required now, since I've removed the globals that expose the address range.
pager_map now uses pmap_kenter_pa() instead of pmap_enter(), so there's
no longer any need to special-case it.
- eliminate struct uvm_vnode by moving its fields into struct vnode.
- rewrite the pageout path. the pager is now responsible for handling the
high-level requests instead of only getting control after a bunch of work
has already been done on its behalf. this will allow us to UBCify LFS,
which needs tighter control over its pages than other filesystems do.
writing a page to disk no longer requires making it read-only, which
allows us to write wired pages without causing all kinds of havoc.
- use a new PG_PAGEOUT flag to indicate that a page should be freed
on behalf of the pagedaemon when it's unlocked. this flag is very similar
to PG_RELEASED, but unlike PG_RELEASED, PG_PAGEOUT can be cleared if the
pageout fails due to eg. an indirect-block buffer being locked.
this allows us to remove the "version" field from struct vm_page,
and together with shrinking "loan_count" from 32 bits to 16,
struct vm_page is now 4 bytes smaller.
- no longer use PG_RELEASED for swap-backed pages. if the page is busy
because it's being paged out, we can't release the swap slot to be
reallocated until that write is complete, but unlike with vnodes we
don't keep a count of in-progress writes so there's no good way to
know when the write is done. instead, when we need to free a busy
swap-backed page, just sleep until we can get it busy ourselves.
- implement a fast-path for extending writes which allows us to avoid
zeroing new pages. this substantially reduces cpu usage.
- encapsulate the data used by the genfs code in a struct genfs_node,
which must be the first element of the filesystem-specific vnode data
for filesystems which use genfs_{get,put}pages().
- eliminate many of the UVM pagerops, since they aren't needed anymore
now that the pager "put" operation is a higher-level operation.
- enhance the genfs code to allow NFS to use the genfs_{get,put}pages
instead of a modified copy.
- clean up struct vnode by removing all the fields that used to be used by
the vfs_cluster.c code (which we don't use anymore with UBC).
- remove kmem_object and mb_object since they were useless.
instead of allocating pages to these objects, we now just allocate
pages with no object. such pages are mapped in the kernel until they
are freed, so we can use the mapping to find the page to free it.
this allows us to remove splvm() protection in several places.

The sum of all these changes improves write throughput on my
decstation 5000/200 to within 1% of the rate of NetBSD 1.5
and reduces the elapsed time for "make release" of a NetBSD 1.5
source tree on my 128MB pc to 10% less than a 1.5 kernel took.
 1.66 10-Sep-2001  chris Update pmap_update to now take the updated pmap as an argument.
This will allow improvements to the pmaps so that they can more easily defer expensive operations, eg tlb/cache flush, til the last possible moment.

Currently this is a no-op on most platforms, so they should see no difference.

Reviewed by Jason.
 1.65 27-Jun-2001  thorpej branches: 1.65.2; 1.65.4;
Since a page can be on only one of ACTIVE or INACTIVE queues at
any given time, turn two consecutive if statements into an if-else-if
construct.
 1.64 27-Jun-2001  thorpej Macro'ize the code that checks the free and inactive thresholds and
wakes the pagedaemon.
 1.63 26-May-2001  chs replace vm_page_t with struct vm_page *.
 1.62 25-May-2001  chs remove trailing whitespace.
 1.61 22-May-2001  ross Merge the swap-backed and object-backed inactive lists.
 1.60 02-May-2001  thorpej Support dynamic sizing of the page color bins. We also support
dynamically re-coloring pages; as machine-dependent code discovers
the size of the system's caches, it may call uvm_page_recolor() with
the new number of colors to use. If the new number of colors is
smaller than (or equal to) the current number of colors, then
uvm_page_recolor() is a no-op.

The system defaults to one bucket if machine-dependent code does not
initialize uvmexp.ncolors before uvm_page_init() is called.

Note that the number of color bins should be initialized to something
reasonable as early as possible -- for many early memory allocations,
we live with the consequences of the page choice for the lifetime of
the boot.
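
The recolor rule above can be sketched in user-space C. This is an illustration only: `page_recolor`, `page_color`, and the two globals are hypothetical names, the power-of-two color count is an assumption, and the sketch leaves out the re-bucketing of pages already on the free lists.

```c
#include <assert.h>

static unsigned ncolors = 1;    /* defaults to one bucket, as in the log entry */
static unsigned colormask = 0;  /* ncolors - 1; assumes a power-of-two count */

/*
 * Grow the number of color bins.  Returns 1 if the count changed,
 * 0 if the request was a no-op (new count <= current count).
 */
static int
page_recolor(unsigned newncolors)
{
	/* Assumption for this sketch: color counts are powers of two. */
	assert(newncolors != 0 && (newncolors & (newncolors - 1)) == 0);

	if (newncolors <= ncolors)
		return 0;
	ncolors = newncolors;
	colormask = newncolors - 1;
	return 1;
}

/* Color of a page, derived from its page frame number. */
static unsigned
page_color(unsigned long pfn)
{
	return (unsigned)(pfn & colormask);
}
```

With these definitions, a call such as `page_recolor(4)` followed by `page_recolor(2)` leaves the count at 4, which is the no-op case the entry describes.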
 1.59 01-May-2001  thorpej Add the number of page colors to uvmexp.
 1.58 01-May-2001  enami Use simple do {} while () loop instead of for {} loop + extra test/variable.
 1.57 01-May-2001  enami Fix second level indentation in recent commit.
 1.56 01-May-2001  thorpej Per discussion w/ chuck and chuck, restructure the md page stuff
to use a structure called "vm_page_md", and use __HAVE_VM_PAGE_MD
and __HAVE_PMAP_PHYSSEG.
 1.55 29-Apr-2001  thorpej Add a VM_MDPAGE_MEMBERS macro that defines pmap-specific data for
each vm_page structure. Add a VM_MDPAGE_INIT() macro to init this
data when pages are initialized by UVM. These macros are mandatory,
but ports may #define them to nothing if they are not needed/used.

This deprecates struct pmap_physseg. As a transitional measure,
allow a port to #define PMAP_PHYSSEG so that it can continue to
use it until its pmap is converted to use VM_MDPAGE_MEMBERS.

Use all this stuff to eliminate a lot of extra work in the Alpha
pmap module (it's smaller and faster now). Changes to other pmap
modules will follow.
 1.54 29-Apr-2001  thorpej Implement page coloring, using a round-robin bucket selection
algorithm (Solaris calls this "Bin Hopping").

This implementation currently relies on MD code to define a
constant defining the number of buckets. This will change
reasonably soon (MD code will be able to dynamically size
the bucket array).
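
A minimal sketch of round-robin ("bin hopping") bucket selection, assuming a fixed bucket count as the entry describes. `NCOLORS`, `struct color_bins`, and `bin_hop_alloc` are hypothetical names for illustration and are not taken from the UVM source.

```c
#include <assert.h>
#include <stddef.h>

#define NCOLORS 4	/* hypothetical constant; MD code would define the real one */

struct color_bins {
	unsigned free_count[NCOLORS];	/* free pages available per color */
	unsigned next_color;		/* bucket where the next search starts */
};

/*
 * Pick a bucket round-robin: start at next_color, take a page from the
 * first non-empty bucket, then advance the cursor past that bucket.
 * Returns the chosen color, or -1 if every bucket is empty.
 */
static int
bin_hop_alloc(struct color_bins *cb)
{
	assert(cb != NULL);

	for (unsigned i = 0; i < NCOLORS; i++) {
		unsigned c = (cb->next_color + i) % NCOLORS;

		if (cb->free_count[c] > 0) {
			cb->free_count[c]--;
			cb->next_color = (c + 1) % NCOLORS;
			return (int)c;
		}
	}
	return -1;
}
```

Successive allocations therefore cycle through the colors instead of draining one bucket first, which is the point of the scheme.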
 1.53 24-Apr-2001  thorpej Sprinkle pmap_update() calls after calls to:
- pmap_enter()
- pmap_remove()
- pmap_protect()
- pmap_kenter_pa()
- pmap_kremove()
as described in pmap(9).

These calls are relatively conservative. It may be possible to
optimize these a little more.
 1.52 22-Apr-2001  thorpej Make pmap_virtual_space() a required pmap function, even on platforms
which have pmap_steal_memory(). This is to reduce the API differences
between pmaps that implement pmap_steal_memory() and pmaps which do
not.

Note that pmap_steal_memory() needs to adjust *vstartp and/or
*vendp only if it used addresses within the range provided to UVM
via the pmap_virtual_space() call. I.e. it is not necessary to do
so in any current pmap_steal_memory() implementation.
 1.51 09-Mar-2001  chs add UBC memory-usage balancing. we track the number of pages in use for
each of the basic types (anonymous data, executable image, cached files)
and prevent the pagedaemon from reusing a given page if that would reduce
the count of that type of page below a sysctl-settable minimum threshold.
the thresholds are controlled via three new sysctl tunables:
vm.anonmin, vm.vnodemin, and vm.vtextmin. these tunables are the
percentages of pageable memory reserved for each usage, and we do not allow
the sum of the minimums to be more than 95% so that there's always some
memory that can be reused.
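
The balancing rule can be sketched as two checks: one when a minimum is set and one when the pagedaemon considers a page. This is a user-space illustration under assumptions; `struct usage_state`, `set_min_pct`, and `may_reclaim` are hypothetical names, and only the three tunables and the 95% cap come from the entry.

```c
#include <assert.h>
#include <stdbool.h>

enum page_type { PT_ANON, PT_VNODE, PT_VTEXT, PT_NTYPES };

struct usage_state {
	unsigned min_pct[PT_NTYPES];	/* vm.anonmin, vm.vnodemin, vm.vtextmin */
	unsigned long count[PT_NTYPES];	/* pages currently in use per type */
	unsigned long pageable;		/* total pageable pages */
};

/* Refuse a new minimum if the three minimums would sum to more than 95%. */
static bool
set_min_pct(struct usage_state *us, enum page_type t, unsigned pct)
{
	unsigned sum = pct;

	for (int i = 0; i < PT_NTYPES; i++)
		if (i != (int)t)
			sum += us->min_pct[i];
	if (sum > 95)
		return false;
	us->min_pct[t] = pct;
	return true;
}

/* May one page of type t be reused without dropping below its minimum? */
static bool
may_reclaim(const struct usage_state *us, enum page_type t)
{
	assert(us->count[t] > 0);

	unsigned long floor = us->pageable * us->min_pct[t] / 100;
	return us->count[t] - 1 >= floor;
}
```

Capping the sum at 95% guarantees that at least 5% of pageable memory is never protected by any minimum, so the pagedaemon always has something it may reuse.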
 1.50 28-Jan-2001  thorpej branches: 1.50.2;
Put the extern decl of uvm_vnodeops in uvm_object.h
 1.49 28-Jan-2001  thorpej Use UVM_OBJ_IS_VNODE().
 1.48 23-Jan-2001  thorpej Sprinkle some assertions:
amap_free(): Assert that the amap is locked.
amap_share_protect(): Assert that the amap is locked.
amap_wipeout(): Assert that the amap is locked.
uvm_anfree(): Assert that the anon has a reference count of 0 and is
not locked.
uvm_anon_lockloanpg(): Assert that the anon is locked.
anon_pagein(): Assert that the anon is locked.
uvmfault_anonget(): Assert that the anon is locked.
uvm_pagealloc_strat(): Assert that the uobj or the anon is locked

And fix the problems these have uncovered:
amap_cow_now(): Lock the new anon after allocating it, and unref and
unlock it (rather than lock!) before freeing it in case
of an error condition. This should fix a problem reported
by Dan Carosone using cdrecord on an i386 MP kernel.
uvm_fault(): Case1B -- Lock the new anon after allocating it, and unlock
it later when we unlock the old anon.
Case2 -- Lock the new anon after allocating it, and unlock
it later by passing it to uvmfault_unlockall() (we set anon
to NULL if we're not doing a promote fault).
 1.47 14-Jan-2001  thorpej splimp() -> splvm()
 1.46 01-Dec-2000  chs make sure that pages are on a paging queue before unlocking them.
 1.45 30-Nov-2000  simonb Move uvm_pgcnt_vnode and uvm_pgcnt_anon into uvmexp (as vnodepages and
anonpages), and add vtextpages which is currently unused but will be
used to trace the number of pages used by vtext vnodes.
 1.44 27-Nov-2000  chs Initial integration of the Unified Buffer Cache project.
 1.43 09-Nov-2000  christos Give a hint to the user on why we failed.
 1.42 05-Oct-2000  mrg s/vm/uvm/ in a bunch of error messages.
 1.41 21-Sep-2000  thorpej Make PMAP_PAGEIDLEZERO() return a boolean value. FALSE indicates
that the page being zero'd was not completed and that page zeroing
should be aborted. This may be used by machine-dependent code doing
slow page access to reduce the latency of running a process that has
become runnable while in the middle of doing a slow page zero.
 1.40 02-Aug-2000  thorpej MALLOC() is not to be used for variable-sized allocations.
 1.39 27-Jun-2000  mrg remove include of <vm/vm.h>
 1.38 26-Jun-2000  mrg remove/move more mach vm header files:

<vm/pglist.h> -> <uvm/uvm_pglist.h>
<vm/vm_inherit.h> -> <uvm/uvm_inherit.h>
<vm/vm_kern.h> -> into <uvm/uvm_extern.h>
<vm/vm_object.h> -> nothing
<vm/vm_pager.h> -> into <uvm/uvm_pager.h>

also includes a bunch of <vm/vm_page.h> include removals (due to redundancy
with <vm/vm.h>), and a scattering of other similar headers.
 1.37 09-Jun-2000  soda fix printf format mismatch, when paddr_t becomes (long long) on arc port.
 1.36 29-May-2000  thorpej Change the comment before the vm_page_zero_enable global to indicate
what it will now be used for.
 1.35 26-May-2000  thorpej branches: 1.35.2;
First sweep at scheduler state cleanup. Collect MI scheduler
state into global and per-CPU scheduler state:

- Global state: sched_qs (run queues), sched_whichqs (bitmap
of non-empty run queues), sched_slpque (sleep queues).
NOTE: These may collectively move into a struct schedstate
at some point in the future.

- Per-CPU state, struct schedstate_percpu: spc_runtime
(time process on this CPU started running), spc_flags
(replaces struct proc's p_schedflags), and
spc_curpriority (usrpri of processes on this CPU).

- Every platform must now supply a struct cpu_info and
a curcpu() macro. Simplify existing cpu_info declarations
where appropriate.

- All references to per-CPU scheduler state now made through
curcpu(). NOTE: this will likely be adjusted in the future
after further changes to struct proc are made.

Tested on i386 and Alpha. Changes are mostly mechanical, but apologies
in advance if it doesn't compile on a particular platform.
 1.34 24-Apr-2000  thorpej Changes necessary to implement pre-zero'ing of pages in the idle loop:
- Make page free lists have two actual queues: known-zero pages and
pages with unknown contents.
- Implement uvm_pageidlezero(). This function attempts to zero up to
the target number of pages until the target has been reached (currently
target is `all free pages') or until whichqs becomes non-zero (indicating
that a process is ready to run).
- Define a new hook for the pmap module for pre-zero'ing pages. This is
used to zero the pages using uncached access. This allows us to zero
as many pages as we want without polluting the cache.

In order to use this feature, each platform must add the appropriate
glue in their idle loop.
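
A rough user-space sketch of the idle-loop zeroing described above: each free list keeps a queue of pages with unknown contents and a queue of known-zero pages, and the loop stops as soon as work is pending. All names here (`struct freelist`, `pageidlezero`, `runnable_pending`, `idle_budget`) are hypothetical; the callback stands in for the check of whichqs, and plain memset() stands in for the uncached pmap hook.

```c
#include <assert.h>
#include <stdbool.h>
#include <string.h>

#define NPAGES 4
#define PGSZ   64	/* tiny demo page size, not a real PAGE_SIZE */

struct page {
	unsigned char data[PGSZ];
	bool zero;		/* contents known to be all zeroes */
};

struct freelist {
	struct page *unknown[NPAGES];	/* pages with unknown contents */
	unsigned n_unknown;
	struct page *zeros[NPAGES];	/* known-zero pages */
	unsigned n_zeros;
};

/* Demo stand-in for "a process is ready to run": true once the budget is spent. */
static int idle_budget = 0;

static bool
runnable_pending(void)
{
	return idle_budget-- <= 0;
}

/*
 * Move pages from the unknown queue to the zero queue until none are
 * left or work_pending() reports runnable work.  Returns pages zeroed.
 */
static unsigned
pageidlezero(struct freelist *fl, bool (*work_pending)(void))
{
	unsigned done = 0;

	assert(fl != NULL && work_pending != NULL);
	while (fl->n_unknown > 0 && !work_pending()) {
		struct page *pg = fl->unknown[--fl->n_unknown];

		memset(pg->data, 0, PGSZ);
		pg->zero = true;
		fl->zeros[fl->n_zeros++] = pg;
		done++;
	}
	return done;
}
```

An allocator that wants a zeroed page can then take one from the zero queue and skip the zeroing cost at allocation time.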
 1.33 10-Apr-2000  thorpej Add UVM_PGA_ZERO which instructs uvm_pagealloc{,_strat}() to return a
zero'd, ! PG_CLEAN page, as if it were uvm_pagezero()'d.
 1.32 02-Apr-2000  thorpej Instead of checking vm_physmem[<physseg>].pgs to determine if
uvm_page_init() has completed, add a boolean uvm.page_init_done,
and test against that. Use this same boolean (rather than
pmap_initialized) in pmap_growkernel() to determine if we are
being called via uvm_page_init() to grow the kernel address space.

This fixes a problem on some i386 configurations where pmap_init()
itself was needing to have the kernel page table grown, and since
pmap_initialized was not yet set to TRUE, pmap_growkernel() was
choosing the wrong code path.

Fix tested by Havard Eidnes.
 1.31 26-Mar-2000  kleink Merge parts of chs-ubc2 into the trunk:
Add a new type voff_t (defined as a synonym for off_t) to describe offsets
into uvm objects, and update the appropriate interfaces to use it, the
most visible effect being the ability to mmap() file offsets beyond
the range of a vaddr_t.

Originally by Chuck Silvers; blame me for problems caused by merging this
into non-UBC.
 1.30 13-Feb-2000  thorpej Allocate the page buckets out of kernel_map, not kmem_map. Saves 16
or so kmem_map pages on a 32MB SPARCstation 2.
 1.29 30-Dec-1999  eeh I should have made uvm_page_physload() take paddr_t's instead of vaddr_t's.
Also, add uvm_coredump32().
 1.28 01-Dec-1999  drochner in uvm_page_physget(), try the vm_physmem[] chunks in the order of their
"free_list" attributes, to save DMA memory
 1.27 30-Nov-1999  thorpej Avoid an integer overflow on systems w/ more than 2G of RAM.
 1.26 24-Nov-1999  drochner add a diagnostic panic to catch illegal memory ranges passed to
uvm_page_physload()
 1.25 12-Sep-1999  chs branches: 1.25.2; 1.25.8;
eliminate the PMAP_NEW option by making it required for all ports.
ports which previously had no support for PMAP_NEW now implement
the pmap_k* interfaces as wrappers around the non-k versions.
 1.24 22-Jul-1999  thorpej Garbage collect thread_sleep()/thread_wakeup() left over from the old
Mach VM code. Also nuke iprintf(), which was no longer used anywhere.

Add proclist locking where appropriate.
 1.23 25-May-1999  thorpej Add a comment explaining why using pmap_kenter_pa() is safe here.
 1.22 25-May-1999  thorpej Macro'ize the test for "object is a kernel object".
 1.21 24-May-1999  thorpej - Change uvm_{lock,unlock}_fpageq() to return/take the previous interrupt
level directly, instead of making the caller wrap the calls in
splimp()/splx().
- Add a comment documenting that interrupts that cause memory allocation
must be blocked while the free page queue is locked.

Since interrupts must be blocked while this lock is asserted, tying them
together like this helps to prevent mistakes.
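
The interface change can be illustrated with a small simulation in which locking and raising the interrupt level are one operation. This is a sketch under assumptions: `cur_ipl`, `sim_splraise`, `sim_splx`, `fpageq_lock`, and `fpageq_unlock` are hypothetical user-space stand-ins, and the numeric levels are invented.

```c
#include <assert.h>
#include <stdbool.h>

#define IPL_NONE 0
#define IPL_VM   5	/* invented value for the demo */

static int  cur_ipl = IPL_NONE;	/* stand-in for the CPU interrupt level */
static bool fpageq_locked = false;

/* Raise the level to at least `level` and return the previous level. */
static int
sim_splraise(int level)
{
	int old = cur_ipl;

	if (level > cur_ipl)
		cur_ipl = level;
	return old;
}

/* Restore a previously saved level. */
static void
sim_splx(int level)
{
	cur_ipl = level;
}

/* Block interrupts and take the lock; hand back the old level. */
static int
fpageq_lock(void)
{
	int s = sim_splraise(IPL_VM);

	assert(!fpageq_locked);
	fpageq_locked = true;
	return s;
}

/* Drop the lock, then restore the level saved by fpageq_lock(). */
static void
fpageq_unlock(int s)
{
	assert(fpageq_locked);
	fpageq_locked = false;
	sim_splx(s);
}
```

A caller writes `int s = fpageq_lock(); ... fpageq_unlock(s);` and cannot forget the interrupt-level half of the pairing, which is the mistake the entry aims to prevent.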
 1.20 20-May-1999  thorpej Make a slight modification of pmap_growkernel() -- it now returns the
end of the mappable kernel virtual address space. Previously, it would
get called more often than necessary, because the caller only knew what
was requested.

Also, export uvm_maxkaddr so that uvm_pageboot_alloc() can grow the
kernel pmap if necessary, as well. Note that pmap_growkernel() must
now be able to handle being called before pmap_init().
 1.19 20-May-1999  thorpej If we run out of virtual space in uvm_pageboot_alloc(), fail gracefully
rather than unpredictably.
 1.18 11-Apr-1999  chs add a `flags' argument to uvm_pagealloc_strat().
define a flag UVM_PGA_USERESERVE to allow non-kernel object
allocations to use pages from the reserve.
use the new flag for allocations in pmap modules.
 1.17 26-Mar-1999  mycroft branches: 1.17.2;
Add a new `access type' argument to pmap_enter(). This indicates what type of
memory access a mapping was caused by. This is passed through from uvm_fault()
and udv_fault(), and in most other cases is 0.
The pmap module may use this to preset R/M information. On MMUs which require
R/M emulation, the implementation may preset the bits and avoid taking another
fault. On MMUs which keep R/M information in hardware, the implementation may
preset its cached bits to speed up the next call to pmap_is_modified() or
pmap_is_referenced().
 1.16 25-Mar-1999  mrg remove now >1 year old pre-release message.
 1.15 18-Oct-1998  chs branches: 1.15.2;
shift by PAGE_SHIFT instead of multiplying or dividing by PAGE_SIZE.
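
For illustration, the conversion looks roughly like this when written with shifts. The 4 KiB page size and the helper names `bytes_to_pages` and `pages_to_bytes` are assumptions for the sketch, not kernel definitions.

```c
#include <assert.h>
#include <stdint.h>

#define PAGE_SHIFT 12				/* assumed: 4 KiB pages */
#define PAGE_SIZE  ((uintptr_t)1 << PAGE_SHIFT)

/* Byte count to whole pages, rounding down (same as bytes / PAGE_SIZE). */
static inline uintptr_t
bytes_to_pages(uintptr_t bytes)
{
	return bytes >> PAGE_SHIFT;
}

/* Page count to bytes (same as pages * PAGE_SIZE). */
static inline uintptr_t
pages_to_bytes(uintptr_t pages)
{
	/* Guard against shifting bits off the top of the type. */
	assert(pages <= (UINTPTR_MAX >> PAGE_SHIFT));
	return pages << PAGE_SHIFT;
}
```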
 1.14 13-Aug-1998  eeh Merge paddr_t changes into the main branch.
 1.13 09-Aug-1998  perry bzero->memset, bcopy->memcpy, bcmp->memcmp
 1.12 08-Jul-1998  thorpej branches: 1.12.2;
Add support for multiple memory free lists. There is at least one
default free list, and 0 - N additional free lists, in order of descending
priority.

A new page allocation function, uvm_pagealloc_strat(), has been added,
providing three page allocation strategies:

- normal: high -> low priority free list walk, taking the
page off the first free list that has one.

- only: attempt to allocate a page only from the specified free
list, failing if that free list has none available.

- fallback: if `only' fails, fall back on `normal'.

uvm_pagealloc(...) is provided for normal use (and is a synonym for
uvm_pagealloc_strat(..., UVM_PGA_STRAT_NORMAL, 0); the free list argument
is ignored for the `normal' case).

uvm_page_physload() now specifies which free list the pages will be
loaded onto. This means that some platforms which have multiple physical
memory segments may define additional vm_physsegs if they wish to break
individual physical segments into differing priorities.

Machine-dependent code must define _at least_ the following constants
in <machine/vmparam.h>:

VM_NFREELIST: the number of free lists the system will have

VM_FREELIST_DEFAULT: the default freelist (should always be 0,
but is defined in machdep code so that it's with all of the
other free list-related constants).

Additional free list names may be defined by machine-dependent code, but
they will only be used by machine-dependent code (e.g. for loading the
vm_physsegs).
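
The three strategies can be sketched as follows. This is a user-space illustration, not the UVM code: `NFREELIST`, `nfree`, `take_from`, and `pagealloc_strat` are hypothetical names, and the sketch assumes that a lower index means a higher-priority free list.

```c
#include <assert.h>

#define NFREELIST        3	/* stands in for VM_NFREELIST */
#define FREELIST_DEFAULT 0	/* stands in for VM_FREELIST_DEFAULT */

enum strat { STRAT_NORMAL, STRAT_ONLY, STRAT_FALLBACK };

/* Free pages per list; index 0 is assumed to be the highest priority. */
static unsigned nfree[NFREELIST];

/* Take one page from list fl; return fl on success, -1 if it is empty. */
static int
take_from(int fl)
{
	if (nfree[fl] == 0)
		return -1;
	nfree[fl]--;
	return fl;
}

/*
 * Allocate one page using the given strategy.  Returns the index of
 * the free list the page came from, or -1 if no page was available.
 */
static int
pagealloc_strat(enum strat strat, int free_list)
{
	int fl;

	assert(free_list >= 0 && free_list < NFREELIST);

	if (strat == STRAT_ONLY || strat == STRAT_FALLBACK) {
		fl = take_from(free_list);
		if (fl != -1 || strat == STRAT_ONLY)
			return fl;
		/* fallback: the requested list was empty, try `normal' */
	}

	/* normal: walk high -> low priority; free_list is ignored */
	for (fl = 0; fl < NFREELIST; fl++)
		if (take_from(fl) != -1)
			return fl;
	return -1;
}
```

In this sketch the plain allocation path corresponds to `pagealloc_strat(STRAT_NORMAL, FREELIST_DEFAULT)`, mirroring how the entry describes uvm_pagealloc() as a synonym for the `normal' strategy.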
 1.11 28-May-1998  chuck unstatic uvm_page_physload so pmap modules can use it too.
as requested by Eduardo E. Horvath
 1.10 05-May-1998  kleink Remove inclusions of syscall (and syscall argument) related header files;
we don't need them here.
 1.9 16-Apr-1998  thorpej Fix small whitespace botch.
 1.8 31-Mar-1998  chuck free correct page in incomplete section of MNN, as pointed
out by Soren S. Jorvang.
 1.7 09-Mar-1998  mrg KNF.
 1.6 10-Feb-1998  mrg - add defopt's for UVM, UVMHIST and PMAP_NEW.
- remove unnecessary UVMHIST_DECL's.
 1.5 08-Feb-1998  thorpej Round allocations to page size in uvm_pageboot_alloc().
 1.4 07-Feb-1998  mrg restore rcsids
 1.3 07-Feb-1998  chs reserve some pages for the kernel, and some more especially
for the pagedaemon allocating from kmem_object. this should
prevent the pagedaemon from running out of memory and deadlocking.
fix counting of wired pages.
add some debugging code to detect attempts to reference free vm_pages.
 1.2 06-Feb-1998  thorpej RCS ID police.
 1.1 05-Feb-1998  mrg branches: 1.1.1;
Initial revision
 1.1.1.1 05-Feb-1998  mrg initial import of the new virtual memory system, UVM, into -current.

UVM was written by chuck cranor <chuck@maria.wustl.edu>, with some
minor portions derived from the old Mach code. i provided some help
getting swap and paging working, and other bug fixes/ideas. chuck
silvers <chuq@chuq.com> also provided some other fixes.

this is the UVM kernel code portion.


this will be KNF'd shortly. :-)
 1.12.2.1 30-Jul-1998  eeh Split vm_offset_t and vm_size_t into paddr_t, psize_t, vaddr_t, and vsize_t.
 1.15.2.4 30-May-1999  chs vm_page's blkno field is gone.
 1.15.2.3 09-Apr-1999  chs init lock for aiodone daemon.
fix printfs for alpha.
 1.15.2.2 25-Feb-1999  chs in uvm_pagealloc_strat(), treat pages being paged out as "free"
when deciding whether we need to wakeup the pagedaemon.
also, clear a page's blkno when allocating it.
 1.15.2.1 09-Nov-1998  chs initial snapshot. lots left to do.
 1.17.2.5 26-Apr-2000  he Pull up revision 1.32 (requested by thorpej):
Use a more reliable method to determine if uvm_page_init() has
completed. This fixes a problem observed on some i386 configs
(typically with lots of memory) where the kernel page table needs
to grow during initialization.
 1.17.2.4 20-Dec-1999  he Pull up revision 1.28 (requested by drochner):
Allow booting of kernels which are larger than 16MB on i386.
 1.17.2.3 18-Jun-1999  perry pullup 1.19->1.20 (thorpej): fix the 1G RAM bug
 1.17.2.2 18-Jun-1999  perry pullup 1.18->1.19 (thorpej)
 1.17.2.1 16-Apr-1999  chs branches: 1.17.2.1.2; 1.17.2.1.4;
pull up 1.17 -> 1.18:
add a `flags' argument to uvm_pagealloc_strat().
define a flag UVM_PGA_USERESERVE to allow non-kernel object
allocations to use pages from the reserve.
use the new flag for allocations in pmap modules.
 1.17.2.1.4.1 30-Nov-1999  itojun bring in latest KAME (as of 19991130, KAME/NetBSD141) into kame branch
just for reference purposes.
This commit includes 1.4 -> 1.4.1 sync for kame branch.

The branch does not compile at all (due to the lack of ALTQ and some other
source code). Please do not try to modify the branch, this is just for
reference purposes.

synchronization to latest KAME will take place on HEAD branch soon.
 1.17.2.1.2.5 09-Aug-1999  chs create a new type "voff_t" for uvm_object offsets
and define it to be "off_t". also, remove pgo_asyncget().
 1.17.2.1.2.4 31-Jul-1999  chs add uvm_page_unbusy() to simplify dropping PG_BUSY.
 1.17.2.1.2.3 11-Jul-1999  chs make sure pages are allocated on page-aligned offsets.
 1.17.2.1.2.2 21-Jun-1999  thorpej Sync w/ -current.
 1.17.2.1.2.1 07-Jun-1999  chs merge everything from chs-ubc branch.
 1.25.8.1 27-Dec-1999  wrstuden Pull up to last week's -current.
 1.25.2.7 23-Apr-2001  bouyer Sync with HEAD.
 1.25.2.6 12-Mar-2001  bouyer Sync with HEAD.
 1.25.2.5 11-Feb-2001  bouyer Sync with HEAD.
 1.25.2.4 18-Jan-2001  bouyer Sync with head (for UBC+NFS fixes, mostly).
 1.25.2.3 08-Dec-2000  bouyer Sync with HEAD.
 1.25.2.2 22-Nov-2000  bouyer Sync with HEAD.
 1.25.2.1 20-Nov-2000  bouyer Update thorpej_scsipi to -current as of a month ago
 1.35.2.1 22-Jun-2000  minoura Sync w/ netbsd-1-5-base.
 1.50.2.14 11-Nov-2002  nathanw Catch up to -current
 1.50.2.13 18-Oct-2002  nathanw Catch up to -current.
 1.50.2.12 01-Aug-2002  nathanw Catch up to -current.
 1.50.2.11 16-Jul-2002  nathanw pagedaemon_proc really should be a proc, not a LWP.
 1.50.2.10 24-Jun-2002  nathanw Curproc->curlwp renaming.

Change uses of "curproc->l_proc" back to "curproc", which is more like the
original use. Bare uses of "curproc" are now "curlwp".

"curproc" is now #defined in proc.h as ((curlwp) ? (curlwp)->l_proc : NULL)
so that it is always safe to reference curproc (*de*referencing curproc
is another story, but that's always been true).
 1.50.2.9 20-Jun-2002  nathanw Catch up to -current.
 1.50.2.8 28-Feb-2002  nathanw Catch up to -current.
 1.50.2.7 08-Jan-2002  nathanw Catch up to -current.
 1.50.2.6 14-Nov-2001  nathanw Catch up to -current.
 1.50.2.5 08-Oct-2001  nathanw Catch up to -current.
 1.50.2.4 21-Sep-2001  nathanw Catch up to -current.
 1.50.2.3 24-Aug-2001  nathanw Catch up with -current.
 1.50.2.2 21-Jun-2001  nathanw Catch up to -current.
 1.50.2.1 09-Apr-2001  nathanw Catch up with -current.
 1.65.4.1 01-Oct-2001  fvdl Catch up with -current.
 1.65.2.6 10-Oct-2002  jdolecek sync kqueue with -current; this includes merge of gehenna-devsw branch,
merge of i386 MP branch, and part of autoconf rototil work
 1.65.2.5 06-Sep-2002  jdolecek sync kqueue branch with HEAD
 1.65.2.4 23-Jun-2002  jdolecek catch up with -current on kqueue branch
 1.65.2.3 16-Mar-2002  jdolecek Catch up with -current.
 1.65.2.2 10-Jan-2002  thorpej Sync kqueue branch with -current.
 1.65.2.1 13-Sep-2001  thorpej Update the kqueue branch to HEAD.
 1.68.2.1 12-Nov-2001  thorpej Sync the thorpej-mips-cache branch with -current.
 1.74.4.3 12-Mar-2002  thorpej Make hashlock an adaptive mutex, and rename it to hash_mutex.
 1.74.4.2 12-Mar-2002  thorpej Make pageqlock an adaptive mutex, and rename it to pageq_mutex.
 1.74.4.1 12-Mar-2002  thorpej Convert the fpageqlock to a spin mutex at IPL_VM and rename it
to fpageq_mutex.
 1.75.4.3 21-Jun-2002  lukem Pull up revision 1.78 (requested by chs in ticket #329):
count aobj pages (most notably kernel stack pages) as anon pages
for memory usage-balancing purposes.
 1.75.4.2 20-Jun-2002  lukem Pull up revision 1.77 (requested by wrstuden in ticket #322):
Fix recent bugs seen on Performa 4400 macppc's by
Makoto Fujiwara <makoto@ki.nu> and Manuel Bouyer <bouyer@netbsd.org>.
Help from Allen Briggs, Jason Thorpe, and Matt Thomas.
We need to call cpu_cache_probe() early in boot (machdep.c).
Add 603 info for completeness, and use NBPG not PAGESIZE, as the
latter relies on uvm being setup (cpu_subr.c).
Let uvm_page_recolor() be called before uvm has been set up; just
note the page coloring value (uvm_page.c).
 1.75.4.1 01-Jun-2002  tv Pull up revision 1.76 (requested by enami in ticket #114):
Add missing pageq lock while uvm_pagefree() is called (either directly
or indirectly). Reviewed by chuq.
 1.75.2.2 15-Jul-2002  gehenna catch up with -current.
 1.75.2.1 30-May-2002  gehenna Catch up with -current.
 1.89.2.7 10-Nov-2005  skrll Sync with HEAD. Here we go again...
 1.89.2.6 01-Apr-2005  skrll Sync with HEAD.
 1.89.2.5 02-Nov-2004  skrll Sync with HEAD.
 1.89.2.4 21-Sep-2004  skrll Fix the sync with head I botched.
 1.89.2.3 18-Sep-2004  skrll Sync with HEAD.
 1.89.2.2 03-Sep-2004  skrll Sync with HEAD
 1.89.2.1 03-Aug-2004  skrll Sync with HEAD
 1.97.2.2 11-Sep-2004  he Pull up revision 1.99 (requested by yamt in ticket #830):
Correct page accounting for anon pages: decrement
uvmexp.anonpages when orphaning an A->K loaned page, and
null out anon.u.an_page as the anon no longer owns the page
in that case. Add a few related assertions. Also correct
a comment.
 1.97.2.1 10-May-2004  tron Pull up revision 1.98 (requested by yamt in ticket #271):
uvm_page_unbusy: add assertions and comments about PG_RELEASED anon pages.
 1.101.6.1 25-Jan-2005  yamt - don't use uvm_object or managed mappings for wired allocations.
(eg. malloc(9))
- simplify uvm_km_* apis.
 1.101.4.1 29-Apr-2005  kent sync with -current
 1.106.2.10 24-Mar-2008  yamt sync with head.
 1.106.2.9 17-Mar-2008  yamt sync with head.
 1.106.2.8 27-Feb-2008  yamt sync with head.
 1.106.2.7 21-Jan-2008  yamt sync with head
 1.106.2.6 07-Dec-2007  yamt sync with head
 1.106.2.5 27-Oct-2007  yamt sync with head.
 1.106.2.4 03-Sep-2007  yamt sync with head.
 1.106.2.3 26-Feb-2007  yamt sync with head.
 1.106.2.2 30-Dec-2006  yamt sync with head.
 1.106.2.1 21-Jun-2006  yamt sync with head.
 1.109.6.1 22-Apr-2006  simonb Sync with head.
 1.109.4.1 09-Sep-2006  rpaulo sync with head
 1.109.2.1 18-Feb-2006  yamt sync with head.
 1.111.6.1 24-May-2006  tron Merge 2006-05-24 NetBSD-current into the "peter-altq" branch.
 1.111.4.1 19-Apr-2006  elad oops - *really* sync to head this time.
 1.111.2.4 15-Sep-2006  yamt make UVM_KICK_PDAEMON() a real function and stop including
uvm_pdpolicy.h from uvm.h. this also fixes build of pmap(1).
 1.111.2.3 24-May-2006  yamt sync with head.
 1.111.2.2 12-Mar-2006  yamt - change the way to account read-ahead stats.
- fix UVM_PQFLAGBITS.
 1.111.2.1 05-Mar-2006  yamt separate page replacement policy from the rest of kernel.
 1.112.8.4 12-Jan-2007  ad Sync with head.
 1.112.8.3 11-Jan-2007  ad Checkpoint work in progress.
 1.112.8.2 18-Nov-2006  ad Sync with head.
 1.112.8.1 17-Nov-2006  ad Checkpoint work in progress.
 1.113.2.3 18-Dec-2006  yamt sync with head.
 1.113.2.2 22-Oct-2006  yamt use workqueue for aiodoned.
 1.113.2.1 22-Oct-2006  yamt sync with head
 1.117.2.3 15-Apr-2007  yamt sync with head.
 1.117.2.2 27-Feb-2007  yamt - sync with head.
- move sched_changepri back to kern_synch.c as it doesn't know PPQ anymore.
 1.117.2.1 17-Feb-2007  yamt - separate context switching and thread scheduling.
- introduce idle lwp.
- change some related MD/MI interfaces and implement i386 version.
 1.119.6.1 11-Jul-2007  mjf Sync with head.
 1.119.4.12 21-Aug-2007  yamt fix some races around pagedaemon and uvm_wait. ok'ed by Andrew Doran.
 1.119.4.11 20-Aug-2007  ad Sync with HEAD.
 1.119.4.10 29-Jul-2007  ad Pad out the hashlocks to reduce cache traffic.
 1.119.4.9 08-Jun-2007  ad Sync with head.
 1.119.4.8 29-Apr-2007  ad Note that the hashlocks must be spinlocks.
 1.119.4.7 28-Apr-2007  ad Split uvm_hashlock into an array of 32 locks.
 1.119.4.6 09-Apr-2007  ad - Add two new arguments to kthread_create1: pri_t pri, bool mpsafe.
- Fork kthreads off proc0 as new LWPs, not new processes.
 1.119.4.5 05-Apr-2007  ad - Put a per-LWP lock around swapin / swapout.
- Replace use of lockmgr().
- Minor locking fixes and assertions.
- uvm_map.h no longer pulls in proc.h, etc.
- Use kpause where appropriate.
 1.119.4.4 05-Apr-2007  ad uvm_pagefree: check for already freed pages before checking the locks.
 1.119.4.3 05-Apr-2007  ad Add some lock assertions.
 1.119.4.2 21-Mar-2007  ad idlezero: don't grab the kernel lock.
 1.119.4.1 13-Mar-2007  ad Pull in the initial set of changes for the vmlocking branch.
 1.122.2.1 15-Aug-2007  skrll Sync with HEAD.
 1.123.10.2 21-Jul-2007  ad Merge unobtrusive locking changes from the vmlocking branch.
 1.123.10.1 21-Jul-2007  ad file uvm_page.c was added on branch matt-mips64 on 2007-07-21 19:21:56 +0000
 1.123.8.1 14-Oct-2007  yamt sync with head.
 1.123.6.3 23-Mar-2008  matt sync with HEAD
 1.123.6.2 09-Jan-2008  matt sync with HEAD
 1.123.6.1 06-Nov-2007  matt sync with HEAD
 1.123.4.2 03-Dec-2007  joerg Sync with HEAD.
 1.123.4.1 26-Oct-2007  joerg Sync with HEAD.

Follow the merge of pmap.c on i386 and amd64 and move
pmap_init_tmp_pgtbl into arch/x86/x86/pmap.c. Modify the ACPI wakeup
code to restore CR4 before jumping back into kernel space as the large
page option might cover that.
 1.125.4.2 18-Feb-2008  mjf Sync with HEAD.
 1.125.4.1 08-Dec-2007  mjf Sync with HEAD.
 1.126.6.2 19-Jan-2008  bouyer Sync with HEAD
 1.126.6.1 02-Jan-2008  bouyer Sync with HEAD
 1.126.2.1 04-Dec-2007  ad Pull the vmlocking changes into a new branch.
 1.128.6.7 17-Jan-2009  mjf Sync with HEAD.
 1.128.6.6 28-Sep-2008  mjf Sync with HEAD.
 1.128.6.5 02-Jul-2008  mjf Sync with HEAD.
 1.128.6.4 29-Jun-2008  mjf Sync with HEAD.
 1.128.6.3 05-Jun-2008  mjf Sync with HEAD.

Also fix build.
 1.128.6.2 02-Jun-2008  mjf Sync with HEAD.
 1.128.6.1 03-Apr-2008  mjf Sync with HEAD.
 1.128.2.1 24-Mar-2008  keiichi sync with head.
 1.131.6.2 18-Sep-2008  wrstuden Sync with wrstuden-revivesa-base-2.
 1.131.6.1 23-Jun-2008  wrstuden Sync w/ -current. 34 merge conflicts to follow.
 1.131.4.5 09-Oct-2010  yamt sync with head
 1.131.4.4 11-Aug-2010  yamt sync with head.
 1.131.4.3 11-Mar-2010  yamt sync with head
 1.131.4.2 19-Aug-2009  yamt sync with head.
 1.131.4.1 04-May-2009  yamt sync with head.
 1.131.2.2 17-Jun-2008  yamt sync with head.
 1.131.2.1 04-Jun-2008  yamt sync with head
 1.135.2.4 18-Jul-2008  simonb Sync with head.
 1.135.2.3 03-Jul-2008  simonb Sync with head.
 1.135.2.2 30-Jun-2008  matt Update to HEAD.
 1.135.2.1 18-Jun-2008  simonb Sync with head.
 1.140.6.3 02-Mar-2009  snj branches: 1.140.6.3.4; 1.140.6.3.8;
Pull up following revision(s) (requested by drochner in ticket #541):
sys/uvm/uvm_page.c: revision 1.144
oops - missed a case with PMAP_PAGEIDLEZERO if md code aborts the
zeroing process, from Nicolas Joly
 1.140.6.2 02-Mar-2009  snj Pull up following revision(s) (requested by drochner in ticket #541):
sys/uvm/uvm_page.c: revision 1.143
-fix two conditions where PQ_FREE was still/already set while the page
was not anymore/yet on the freelist and uvm_fpageqlock was not held
-clear PQ_FREE while the page is in the works of pageidlezero
This avoids that the DMA memory allocator (pglistalloc) grabs a page
which is not on the freelist, leading to a diagnostic panic (with DEBUG)
or freelist corruption. (mostly on X server activation after a VT
switch or suspend/resume because this can allocate megabytes of AGP
memory)
This might fix PR port-i386/38989 by Alan Barrett (in case this was
a multiprocessor).
 1.140.6.1 27-Dec-2008  snj Pull up following revision(s) (requested by bouyer in ticket #211):
sys/uvm/uvm_km.c: revision 1.103
sys/uvm/uvm_map.c: revision 1.265
sys/uvm/uvm_page.c: revision 1.141
It's easier for kernel reserve pages to be consumed because the pagedaemon
serves as less of a barrier these days. Restrict provision of kernel reserve
pages to kmem and one of these cases:
- doing a NOWAIT allocation
- caller is a realtime thread
- caller is a kernel thread
- explicitly requested, for example by the pmap
 1.140.6.3.8.1 07-Jan-2011  matt Improve panic message for non-power of 2 page size.
 1.140.6.3.4.12 24-Mar-2014  matt Make sure the hint is initialized to NULL.
 1.140.6.3.4.11 15-Feb-2014  matt Adapt to K{,D}ASSERTMSG changes
 1.140.6.3.4.10 29-Feb-2012  matt Improve UVM_PAGE_TRKOWN.
Add more asserts to uvm_page.
 1.140.6.3.4.9 16-Feb-2012  matt Track the victims selected by the pagedaemon and what happens to them.
Keep a hint for what page group has the most free pages for a given color.
 1.140.6.3.4.8 14-Feb-2012  matt Add more KASSERTs (more! more! more!).
When returning page to the free pool, make sure to dequeue the pages before
hand or free page queue corruption will happen.
 1.140.6.3.4.7 09-Feb-2012  matt Major changes to uvm.
Support multiple collections (groups) of free pages and run the page
reclamation algorithm on each group independently.
 1.140.6.3.4.6 29-Nov-2011  matt Add a macro to allow a port to control from which freelists "normal" pages
can be allocated.
 1.140.6.3.4.5 03-Jun-2011  matt Restore $NetBSD$
 1.140.6.3.4.4 03-Jun-2011  matt Rework page free lists to be sorted by color first rather than free_list.
Kept per color PGFL_* counter in each page free list.
Minor cleanups.
 1.140.6.3.4.3 27-Jan-2010  nisimura Remove JRT two line comment about cache interference since the
change specifically addresses general VIPT cache issue. Need more
thorough comment cleanup about uvmexp.ncolor intent and significance.
 1.140.6.3.4.2 26-Jan-2010  matt Pass hints to uvm_pagealloc* to get it to use the right page color rather
than guess the right page color.
 1.140.6.3.4.1 12-Sep-2009  matt Add KASSERT(pg) to uvm_pagefree() so that invalid physical address passed
to VM_PHYS_TO_PAGE will be easily caught.
 1.140.4.3 28-Apr-2009  skrll Sync with HEAD.
 1.140.4.2 03-Mar-2009  skrll Sync with HEAD.
 1.140.4.1 19-Jan-2009  skrll Sync with HEAD.
 1.142.2.1 13-May-2009  jym Sync with HEAD.

Commit is split, to avoid a "too many arguments" protocol error.
 1.153.2.69 22-Nov-2010  uebayasi Start merging uvm_page_physload() and uvm_page_physload_device().
 1.153.2.68 21-Nov-2010  uebayasi Rename PGO_ZERO as PGO_HOLE, and s/uvm_page_zeropage/uvm_page_holepage/.
 1.153.2.67 21-Nov-2010  uebayasi uvm_pglistalloc(9) returns 0 on success, not # of pages.
 1.153.2.66 20-Nov-2010  uebayasi Don't set PG_FAKE for device pages.

XXX PG_FAKE should be renamed as PG_UNINITED!
 1.153.2.65 15-Nov-2010  uebayasi Move zero-page into a common place, in the hope that it's shared
for other purposes.

According to Chuck Silvers, zero-page mappings don't need to be
explicitly unmapped in putpages(). Follow that advice.
 1.153.2.64 12-Nov-2010  uebayasi Oops - don't expose unnecessary data.
 1.153.2.63 12-Nov-2010  uebayasi Fix debug code.
 1.153.2.62 11-Nov-2010  uebayasi s/managed device page/device page/
 1.153.2.61 11-Nov-2010  uebayasi Use vm_physseg accessors. Remove confusing comments.
 1.153.2.60 04-Nov-2010  uebayasi Style.
 1.153.2.59 04-Nov-2010  uebayasi Split physical device segment pages from "managed" to "managed
device". Cache that information as a flag PG_DEVICE so that callers
don't need to walk physsegs every time.

Remove PQ_FIXED, which means that page daemon doesn't need to know
device segment pages at all. But still fault handlers need to know
them.

I think this is what I can do best now.
 1.153.2.58 02-Nov-2010  uebayasi Drop the 'paddr_t avail_start' and 'paddr_t avail_end' arguments
from uvm_page_physload_device(9).

Those two arguments are used by uvm_page_physload(9) to specify a
range of physical memory available for general purpose pages (pages
which are linked to freelists). Totally irrelevant to device
segments.
 1.153.2.57 27-Oct-2010  uebayasi Unconditionally provide device page segment data structures and
functions as suggested by Chuck Silvers.

(Memory and device segments are being merged soon.)
 1.153.2.56 22-Oct-2010  uebayasi Sync with HEAD (-D20101022).
 1.153.2.55 27-Aug-2010  uebayasi Make vm_physseg lookup work when VM_PHYSSEG_MAX == 1.
 1.153.2.54 25-Aug-2010  uebayasi Fix DIAGNOSTIC build. Sprinkle some assertions.
 1.153.2.53 11-Aug-2010  uebayasi If both __HAVE_PMAP_PHYSSEG and __HAVE_PMAP_PHYSSEG_INIT are defined,
call per-vm_physseg initialization/finalization hooks.
 1.153.2.52 11-Aug-2010  uebayasi s/vm_physseg_find_direct/vm_physseg_find_device/
 1.153.2.51 27-Jul-2010  uebayasi Use VM_PROT_* instead of PROT_* in UVM.
 1.153.2.50 26-Jul-2010  uebayasi After much consideration, rename bus_space_physload_direct(9) back to
bus_space_physload_device(9).

The latter registers a segment as "device pages". "Device pages" are
managed, but not used for general purpose memory. Most typically XIP
pages.
 1.153.2.49 24-Jul-2010  uebayasi Give "physseg" related functions better names.
 1.153.2.48 24-Jul-2010  uebayasi Remove a useless assertion.
 1.153.2.47 22-Jul-2010  uebayasi Cosmetic.
 1.153.2.46 22-Jul-2010  uebayasi s/PG_XIP/PQ_FIXED/, meaning that the fault handler sees XIP pages as
"fixed", and doesn't pass them to paging activity.

("XIP" is a vnode specific knowledge. It was wrong that the fault
handler had to know such a special thing.)
 1.153.2.45 15-Jul-2010  uebayasi Rename PG_DIRECT to PG_XIP. PG_XIP is marked to XIP vnode pages.
 1.153.2.44 12-Jul-2010  uebayasi Reduce more diff by backing out XIP page specific code. Allow XIP pages
to be loaned.
 1.153.2.43 09-Jul-2010  uebayasi Mark XIP pages as PG_CLEAN and/or PG_BUSY when appropriate. Protect
vnode lock when vm_page::flags is manipulated.
 1.153.2.42 08-Jul-2010  uebayasi Mark XIP pages as PG_RDONLY.
 1.153.2.41 07-Jul-2010  uebayasi Clean up; merge options DIRECT_PAGE into options XIP.
 1.153.2.40 07-Jul-2010  uebayasi To simplify things, revert global vm_page_md hash and allocate struct
vm_page [] for XIP physical segments.
 1.153.2.39 31-May-2010  uebayasi Re-define the definition of "device page"; device pages are pages of
device memory. Pages which don't have vm_page (== can't be used for
generic use), but whose PV are tracked, are called "direct pages" from
now.
 1.153.2.38 31-May-2010  uebayasi Revert partial "phys_addr" removal code. This change is independent of
XIP, and will be done later.
 1.153.2.37 28-May-2010  uebayasi Remove an old XXX comment. Fix a conditional build.
 1.153.2.36 30-Apr-2010  uebayasi Sync with HEAD.
 1.153.2.35 29-Apr-2010  uebayasi "int free_list" (VM_FREELIST_*) is specific to struct vm_page (memory
page). Handle it only in memory physseg parts.

Record device page's properties in struct vm_physseg for future uses.
For example, framebuffers that are capable of some accelerated bus access
(e.g. write-combining) should register their capability through "int
flags".
 1.153.2.34 29-Apr-2010  uebayasi Fix a thinko in Rev. 1.153.2.30.
 1.153.2.33 29-Apr-2010  uebayasi Revert previous; unintended changes mixed.
 1.153.2.32 29-Apr-2010  uebayasi Fix thinko in previous.
 1.153.2.31 29-Apr-2010  uebayasi Fold long lines.
 1.153.2.30 29-Apr-2010  uebayasi Fix an off-by-one in my new code.
 1.153.2.29 28-Apr-2010  uebayasi Initial support of uvm_page_physunload(9) and uvm_page_physunload_device(9).
Note that callers of these functions are responsible to ensure that the
segment is not used.
 1.153.2.28 28-Apr-2010  uebayasi Manage struct vm_physseg as a list, which means that struct vm_physseg
objects don't move when a segment is added / removed.
 1.153.2.27 28-Apr-2010  uebayasi Always use struct vm_physseg *vm_physmem_ptrs[] in MD code.
 1.153.2.26 28-Apr-2010  uebayasi Use struct vm_physseg *vm_physmem_ptrs[] in lookup code paths
(vm_physseg_find()).
 1.153.2.25 28-Apr-2010  uebayasi Use struct vm_physseg *vm_physmem_ptrs[] in initialization code paths.
 1.153.2.24 27-Apr-2010  uebayasi Whitespace.
 1.153.2.23 27-Apr-2010  uebayasi Maintain not only arrays of struct vm_physseg, but also arrays of pointers
to struct vm_physseg. This is needed:

- to make the array change dynamically (unload), and

- to make the struct vm_physseg * object to be passed to device drivers as
a cookie of a managed physical segment.
 1.153.2.22 27-Apr-2010  uebayasi Clean up comments.
 1.153.2.21 26-Apr-2010  uebayasi Collect a garbage.
 1.153.2.20 26-Apr-2010  uebayasi Clean up: move memory segment specific code from uvm_page_physload_common()
to uvm_page_physload().
 1.153.2.19 26-Apr-2010  uebayasi Remove the unfinished code to add a memory segment after uvm_page_init().
It doesn't even compile.

(In the future, we should allocate struct vm_page [] on the added memory
segment for NUMA's sake.)
 1.153.2.18 25-Apr-2010  uebayasi Refactor uvm_page_physload_common(). Memory allocation failure here is
critical; panic if it happens.
 1.153.2.17 25-Apr-2010  uebayasi Make uvm_page_physload() return the registered struct vm_physseg *.
 1.153.2.16 28-Feb-2010  uebayasi Don't always enable XIP on this branch to prepare the merge. Fix build
without XIP in places.
 1.153.2.15 23-Feb-2010  uebayasi Put back vm_page::phys_addr for now, because removing it involves some random
parts in the tree. I'll revisit this after merging the branch.
 1.153.2.14 23-Feb-2010  uebayasi Make struct vm_page_md * -> struct vm_page_md * lookup a real function and
hide its internal. Won't cause much performance loss because results are
usually cached by callers.
 1.153.2.13 23-Feb-2010  uebayasi Introduce uvm_page_physload_device(). This registers a physical address
range of a device, similar to uvm_page_physload() for memories. For now,
this is supposed to be called by MD code. We have to consider the design
when we'll manage mmap'able character devices.

Expose paddr_t -> struct vm_page * conversion function for device pages,
uvm_phys_to_vm_page_device(). This will be called by XIP vnode pager.
Because it knows if a given vnode is a device page (and its physical
address base) or not. Don't look up device segments, but directly make a
cookie.
 1.153.2.12 12-Feb-2010  uebayasi Enable the newly added VM_PAGE_TO_MD() only #ifdef __HAVE_VM_PAGE_MD.
Pointed out by mrg@.
 1.153.2.11 10-Feb-2010  uebayasi Fix previous again & use VM_PAGE_TO_MD() where appropriate.
 1.153.2.10 10-Feb-2010  uebayasi Initial MD per-page data (struct vm_page_md) lookup code for XIP'able device
pages. Compile tested only.

Always define uvm_pageisdevice_p(). Always false if kernel is !DEVICE_PAGE.
 1.153.2.9 09-Feb-2010  uebayasi Give new funcs better names.
 1.153.2.8 09-Feb-2010  uebayasi Implement device page struct vm_page * handling.
 1.153.2.7 09-Feb-2010  uebayasi Define vm_physdev / vm_nphysdev, physical address segment data for managed
device pages.
 1.153.2.6 09-Feb-2010  uebayasi vm_nphysseg -> vm_nphysmem
 1.153.2.5 09-Feb-2010  uebayasi Merge vm_physseg lookup routines.
 1.153.2.4 09-Feb-2010  uebayasi Kill vm_page::phys_addr.
 1.153.2.3 08-Feb-2010  uebayasi Abstract vm_physseg_find() to handle struct vm_page *.
 1.153.2.2 08-Feb-2010  uebayasi Make vm_physseg lookup into a real function.
 1.153.2.1 08-Feb-2010  uebayasi Make vm_physseg::lastpg exclusive end.
 1.154.2.6 12-Jun-2011  rmind sync with head
 1.154.2.5 21-Apr-2011  rmind sync with head
 1.154.2.4 05-Mar-2011  rmind sync with head
 1.154.2.3 30-May-2010  rmind sync with head
 1.154.2.2 17-Mar-2010  rmind Reorganise UVM locking to protect P->V state and serialise pmap(9)
operations on the same page(s) by always locking their owner. Hence
lock order: "vmpage"-lock -> pmap-lock.

Patch, proposed on tech-kern@, from Andrew Doran.
 1.154.2.1 16-Mar-2010  rmind Change struct uvm_object::vmobjlock to be dynamically allocated with
mutex_obj_alloc(). It allows us to share the locks among UVM objects.
 1.169.4.1 08-Feb-2011  bouyer Sync with HEAD
 1.169.2.1 06-Jun-2011  jruoho Sync with HEAD.
 1.172.2.1 23-Jun-2011  cherry Catchup with rmind-uvmplock merge.
 1.178.6.1 18-Feb-2012  mrg merge to -current.
 1.178.2.16 22-May-2014  yamt sync with head.

for a reference, the tree before this commit was tagged
as yamt-pagecache-tag8.

this commit was split into small chunks to avoid
a limitation of cvs. ("Protocol error: too many arguments")
 1.178.2.15 22-Apr-2013  yamt fix an assertion
 1.178.2.14 17-Apr-2012  yamt sync with head
 1.178.2.13 17-Feb-2012  yamt byebye PG_HOLE as it turned out to be unnecessary.
 1.178.2.12 04-Jan-2012  yamt O->A loan related statistics fixes.
 1.178.2.11 04-Jan-2012  yamt make assertions simpler
 1.178.2.10 04-Jan-2012  yamt remove a debug printf
 1.178.2.9 26-Dec-2011  yamt - use O->A loan to serve read(2). based on a patch from Chuck Silvers
- associated O->A loan fixes.
 1.178.2.8 30-Nov-2011  yamt make lfs another pager specific flag so that it won't be affected by
an nfs hack in genfs.
 1.178.2.7 20-Nov-2011  yamt - fix page loaning XXX make O->A loaning further
- add some statistics
 1.178.2.6 18-Nov-2011  yamt - use mutex obj for pageable object
- add a function to wait for a mutex obj being available
- replace some "livelock" kpauses with it
 1.178.2.5 13-Nov-2011  yamt cache UVM_OBJ_IS_VNODE in pqflags
 1.178.2.4 12-Nov-2011  yamt redo the page clean/dirty/unknown accounting separately for file and
anonymous pages
 1.178.2.3 11-Nov-2011  yamt - track the number of clean/dirty/unknown pages in the system.
- g/c PG_MARKER
 1.178.2.2 06-Nov-2011  yamt remove pg->listq and uobj->memq
 1.178.2.1 02-Nov-2011  yamt page cache related changes

- maintain object pages in radix tree rather than rb tree.
- reduce unnecessary page scan in putpages. esp. when an object has a ton of
pages cached but only a few of them are dirty.
- reduce the number of pmap operations by tracking page dirtiness more
precisely in uvm layer.
- fix nfs commit range tracking.
- fix nfs write clustering. XXX hack
 1.182.4.1 18-May-2014  rmind sync with head
 1.182.2.2 03-Dec-2017  jdolecek update from HEAD
 1.182.2.1 20-Aug-2014  tls Rebase to HEAD as of a few days ago.
 1.183.2.2 10-Aug-2014  tls Rebase.
 1.183.2.1 07-Apr-2014  tls Be a little more clear and consistent about harvesting entropy from devices:

1) deprecate RND_FLAG_NO_ESTIMATE

2) define RND_FLAG_COLLECT_TIME, RND_FLAG_COLLECT_VALUE

3) define RND_FLAG_ESTIMATE_TIME, RND_FLAG_ESTIMATE_VALUE

4) define RND_FLAG_DEFAULT: RND_FLAG_COLLECT_TIME|
RND_FLAG_COLLECT_VALUE|RND_FLAG_ESTIMATE_TIME

5) Make entropy harvesting from environmental sensors a little more generic
and remove it from individual sensor drivers.

6) Remove individual open-coded delta-estimators for values from a few
places in the tree (uvm, environmental drivers).

7) 0 -> RND_FLAG_DEFAULT, actually gather entropy from various drivers
that had stubbed out code, other minor cleanups.
 1.186.2.3 28-Aug-2017  skrll Sync with HEAD
 1.186.2.2 05-Feb-2017  skrll Sync with HEAD
 1.186.2.1 06-Jun-2015  skrll Sync with HEAD
 1.187.2.2 20-Mar-2017  pgoyette Sync with HEAD
 1.187.2.1 07-Jan-2017  pgoyette Sync with HEAD. (Note that most of these changes are simply $NetBSD$
tag issues.)
 1.191.2.1 21-Apr-2017  bouyer Sync with HEAD
 1.193.2.1 02-Nov-2017  snj Pull up following revision(s) (requested by pgoyette in ticket #335):
share/man/man9/kernhist.9: 1.5-1.8
sys/arch/acorn26/acorn26/pmap.c: 1.39
sys/arch/arm/arm32/fault.c: 1.105 via patch
sys/arch/arm/arm32/pmap.c: 1.350, 1.359
sys/arch/arm/broadcom/bcm2835_bsc.c: 1.7
sys/arch/arm/omap/if_cpsw.c: 1.20
sys/arch/arm/omap/tiotg.c: 1.7
sys/arch/evbarm/conf/RPI2_INSTALL: 1.3
sys/dev/ic/sl811hs.c: 1.98
sys/dev/usb/ehci.c: 1.256
sys/dev/usb/if_axe.c: 1.83
sys/dev/usb/motg.c: 1.18
sys/dev/usb/ohci.c: 1.274
sys/dev/usb/ucom.c: 1.119
sys/dev/usb/uhci.c: 1.277
sys/dev/usb/uhub.c: 1.137
sys/dev/usb/umass.c: 1.160-1.162
sys/dev/usb/umass_quirks.c: 1.100
sys/dev/usb/umass_scsipi.c: 1.55
sys/dev/usb/usb.c: 1.168
sys/dev/usb/usb_mem.c: 1.70
sys/dev/usb/usb_subr.c: 1.221
sys/dev/usb/usbdi.c: 1.175
sys/dev/usb/usbdi_util.c: 1.67-1.70
sys/dev/usb/usbroothub.c: 1.3
sys/dev/usb/xhci.c: 1.75
sys/external/bsd/drm2/dist/drm/i915/i915_gem.c: 1.34
sys/kern/kern_history.c: 1.15
sys/kern/kern_xxx.c: 1.74
sys/kern/vfs_bio.c: 1.275-1.276
sys/miscfs/genfs/genfs_io.c: 1.71
sys/sys/kernhist.h: 1.21
sys/ufs/ffs/ffs_balloc.c: 1.63
sys/ufs/lfs/lfs_vfsops.c: 1.361
sys/ufs/lfs/ulfs_inode.c: 1.21
sys/ufs/lfs/ulfs_vnops.c: 1.52
sys/ufs/ufs/ufs_inode.c: 1.102
sys/ufs/ufs/ufs_vnops.c: 1.239
sys/uvm/pmap/pmap.c: 1.37-1.39
sys/uvm/pmap/pmap_tlb.c: 1.22
sys/uvm/uvm_amap.c: 1.108
sys/uvm/uvm_anon.c: 1.64
sys/uvm/uvm_aobj.c: 1.126
sys/uvm/uvm_bio.c: 1.91
sys/uvm/uvm_device.c: 1.66
sys/uvm/uvm_fault.c: 1.201
sys/uvm/uvm_km.c: 1.144
sys/uvm/uvm_loan.c: 1.85
sys/uvm/uvm_map.c: 1.353
sys/uvm/uvm_page.c: 1.194
sys/uvm/uvm_pager.c: 1.111
sys/uvm/uvm_pdaemon.c: 1.109
sys/uvm/uvm_swap.c: 1.175
sys/uvm/uvm_vnode.c: 1.103
usr.bin/vmstat/vmstat.c: 1.219
Reorder to test for null before null deref in debug code
--
Reorder to test for null before null deref in debug code
--
KNF
--
No need for '\n' in UVMHIST_LOG
--
normalise a BIOHIST log message
--
Update the kernhist(9) kernel history code to address issues identified
in PR kern/52639, as well as some general cleaning-up...
(As proposed on tech-kern@ with additional changes and enhancements.)
Details of changes:
* All history arguments are now stored as uintmax_t values[1], both in
the kernel and in the structures used for exporting the history data
to userland via sysctl(9). This avoids problems on some architectures
where passing a 64-bit (or larger) value to printf(3) can cause it to
process the value as multiple arguments. (This can be particularly
problematic when printf()'s format string is not a literal, since in
that case the compiler cannot know how large each argument should be.)
* Update the data structures used for exporting kernel history data to
include a version number as well as the length of history arguments.
* All [2] existing users of kernhist(9) have had their format strings
updated. Each format specifier now includes an explicit length
modifier 'j' to refer to numeric values of the size of uintmax_t.
* All [2] existing users of kernhist(9) have had their format strings
updated to replace uses of "%p" with "%#jx", and the pointer
arguments are now cast to (uintptr_t) before being subsequently cast
to (uintmax_t). This is needed to avoid compiler warnings about
casting "pointer to integer of a different size."
* All [2] existing users of kernhist(9) have had instances of "%s" or
"%c" format strings replaced with numeric formats; several instances
of mis-match between format string and argument list have been fixed.
* vmstat(1) has been modified to handle the new size of arguments in the
history data as exported by sysctl(9).
* vmstat(1) now provides a warning message if the history requested with
the -u option does not exist (previously, this condition was silently
ignored, with only a single blank line being printed).
* vmstat(1) now checks the version and argument length included in the
data exported via sysctl(9) and exits if they do not match the values
with which vmstat was built.
* The kernhist(9) man-page has been updated to note the additional
requirements imposed on the format strings, along with several other
minor changes and enhancements.
[1] It would have been possible to use an explicit length (for example,
uint64_t) for the history arguments. But that would require another
"rototill" of all the users in the future when we add support for an
architecture that supports a larger size. Also, the printf(3) format
specifiers for explicitly-sized values, such as "%"PRIu64, are much
more verbose (and less aesthetically appealing, IMHO) than simply
using "%ju".
[2] I've tried very hard to find "all [the] existing users of kernhist(9)"
but it is possible that I've missed some of them. I would be glad to
update any stragglers that anyone identifies.
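
The format-string rules listed above can be sketched in plain C. This is
only an illustration using snprintf(3), not the kernhist(9) macros
themselves; the helper name format_hist_entry() and its arguments are
hypothetical. It shows a pointer being cast to (uintptr_t) and then to
(uintmax_t) and printed with "%#jx", and a counter being widened to
uintmax_t and printed with "%ju".

```c
#include <stddef.h>
#include <stdint.h>
#include <stdio.h>

/*
 * Hypothetical helper, not part of kernhist(9): format one pointer and
 * one counter the way the commit message describes, with every argument
 * widened to uintmax_t and printed using the 'j' length modifier.
 */
static int
format_hist_entry(char *buf, size_t len, const void *pg, unsigned int npages)
{
	/*
	 * The pointer goes through uintptr_t first, which avoids the
	 * "pointer to integer of a different size" warning, and is then
	 * widened to uintmax_t to match "%#jx".
	 */
	return snprintf(buf, len, "pg=%#jx npages=%ju",
	    (uintmax_t)(uintptr_t)pg, (uintmax_t)npages);
}
```

Because every argument has the same width, a consumer of the log does not
need to know the size of each individual value to walk the argument list.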
--
For some reason this single kernel seems to have outgrown its declared
size as a result of the kernhist(9) changes. Bump the size.
XXX The amount of increase may be excessive - anyone with more detailed
XXX knowledge please feel free to further adjust the value appropriately.
--
Missed one cast of pointer --> uintptr_t in previous kernhist(9) commit
--
And yet another one. :(
--
Use correct mark-up for NetBSD version.
--
More improvements in grammar and readability.
--
Remove a stray '"' (obvious typo) and add a couple of casts that are
probably needed.
--
And replace an instance of "%p" conversion with "%#jx"
--
Whitespace fix. Give Bl tag table a width. Fix Xr.
 1.195.2.2 21-May-2018  pgoyette Sync with HEAD
 1.195.2.1 02-May-2018  pgoyette Synch with HEAD
 1.198.2.3 13-Apr-2020  martin Mostly merge changes from HEAD upto 20200411
 1.198.2.2 08-Apr-2020  martin Merge changes from current as of 20200406
 1.198.2.1 10-Jun-2019  christos Sync with HEAD
 1.199.4.1 06-Jul-2021  martin Pull up following revision(s) - all via patch -
(requested by riastradh in ticket #1317):

sys/uvm/uvm_page.c: revision 1.248
sys/uvm/uvm_anon.c: revision 1.80
sys/rump/librump/rumpvfs/vm_vfs.c: revision 1.40
sys/rump/librump/rumpvfs/vm_vfs.c: revision 1.41
sys/rump/librump/rumpkern/vm.c: revision 1.191
sys/uvm/uvm_pager.c: revision 1.130
external/cddl/osnet/dist/uts/common/fs/zfs/zfs_vnops.c: revision 1.71
tests/rump/rumpkern/t_vm.c: revision 1.5
tests/rump/rumpkern/t_vm.c: revision 1.6
sys/rump/librump/rumpvfs/vm_vfs.c: revision 1.39

Move the handling of PG_PAGEOUT from uvm_aio_aiodone_pages() to
uvm_page_unbusy() so that all callers of uvm_page_unbusy() don't need to
handle this flag separately. Split out the pages part of uvm_aio_aiodone()
into uvm_aio_aiodone_pages() in rump just like in the real kernel.

In ZFS functions that can fail to copy data between the ARC and VM pages,
use uvm_aio_aiodone_pages() rather than uvm_page_unbusy() so that we can
handle these "I/O" errors. Fixes PR 55702.

fix an incorrect assertion in the previous commit.

Handle PG_PAGEOUT in uvm_anon_release() too.

Commit the ZFS file that I forgot in this previous commit:

Move the handling of PG_PAGEOUT from uvm_aio_aiodone_pages() to
uvm_page_unbusy() so that all callers of uvm_page_unbusy() don't need to
handle this flag separately. Split out the pages part of uvm_aio_aiodone()
into uvm_aio_aiodone_pages() in rump just like in the real kernel.

In ZFS functions that can fail to copy data between the ARC and VM pages,
use uvm_aio_aiodone_pages() rather than uvm_page_unbusy() so that we can
handle these "I/O" errors. Fixes PR 55702.
update the rump copy of uvm_page_unbusy() to match the real version,
in particular handle PG_PAGEOUT. fixes a few atf tests.
the busypage test is buggy, expect it to fail.

make rump's uvm_aio_aiodone_pages() look more like the kernel version.
fixes some more rumpy assertions.

for the busypage test, replace atf_tc_expect_fail() with atf_tc_skip()
because atf apparently has no way to expect a test program to crash.
fixes PR 55945.
 1.221.2.3 29-Feb-2020  ad Sync with head.
 1.221.2.2 25-Jan-2020  ad Sync with head.
 1.221.2.1 17-Jan-2020  ad Sync with head.
 1.249.2.1 03-Jan-2021  thorpej Sync w/ HEAD.
 1.109 20-Dec-2020  skrll Support __HAVE_PMAP_PV_TRACK in sys/uvm/pmap based pmaps (aka common pmap)
 1.108 20-Dec-2020  skrll Remove VM_MD_TO_PAGE that was accidentally committed in 1.106. It's going
to be readded with the code that uses it
 1.107 07-Oct-2020  chs branches: 1.107.2;
Add a new, more aggressive allocator for uvm_pglistalloc() to allocate
contiguous physical pages, and try this new allocator if the existing
one fails. The existing contig allocator only tries to allocate pages
that are already free, which works fine shortly after boot but rarely
works after the system has been up for a while. The new allocator uses
the pagedaemon to evict pages from memory in the hope that this will
free up a range of pages that satisfies the constraints of the request.
This should help with things like plugging in a USB device, which often
fails for some USB controllers because they can't get contiguous memory.
 1.106 20-Sep-2020  skrll G/C uvm_pagezerocheck
 1.105 14-Jun-2020  ad Remove PG_ZERO. It worked brilliantly on x86 machines from the mid-90s but
having spent an age experimenting with it over the last 6 months on various
machines and with different use cases it's always either break-even or a
slight net loss for me.
 1.104 24-May-2020  ad Add uvm_pagewanted_p(): return true if someone is waiting on the page and
assert caller has correct lock to observe that.
 1.103 17-May-2020  ad Start trying to reduce cache misses on vm_page during fault processing.

- Make PGO_LOCKED getpages imply PGO_NOBUSY and remove the latter. Mark
pages busy only when there's actually I/O to do.

- When doing COW on a uvm_object, don't mess with neighbouring pages. In
all likelihood they're already entered.

- Don't mess with neighbouring VAs that have existing mappings as replacing
those mappings with same can be quite costly.

- Don't enqueue pages for neighbour faults unless not enqueued already, and
don't activate centre pages unless uvmpdpol says it's useful.

Also:

- Make PGO_LOCKED getpages on UAOs work more like vnodes: do gang lookup in
the radix tree, and don't allocate new pages.

- Fix many assertion failures around faults/loans with tmpfs.
 1.102 17-Mar-2020  ad Tweak the March 14th change to make page waits interlocked by pg->interlock.
Remove unneeded changes and only deal with the PQ_WANTED flag, to exclude
possible bugs.
 1.101 16-Mar-2020  rin Include <sys/rwlock.h> for krwlock_t required by uvm_pagewait().
 1.100 14-Mar-2020  ad Make page waits (WANTED vs BUSY) interlocked by pg->interlock. Gets RW
locks out of the equation for sleep/wakeup, and allows observing+waiting
for busy pages when holding only a read lock. Proposed on tech-kern.
 1.99 06-Mar-2020  riastradh Include "opt_uvm_page_trkown.h" for UVM_PAGE_TRKOWN.
 1.98 23-Feb-2020  ad UVM locking changes, proposed on tech-kern:

- Change the lock on uvm_object, vm_amap and vm_anon to be a RW lock.
- Break v_interlock and vmobjlock apart. v_interlock remains a mutex.
- Do partial PV list locking in the x86 pmap. Others to follow later.
 1.97 21-Jan-2020  ad uvmpdpol_pageactive(): the change to not re-activate recently activated
pages worked great with uvm_pageqlock, but it doesn't buy anything any more,
because now the busy pages are likely in a per-CPU queue somewhere waiting
to be processed, and changing the intent on those queued pages costs next
to nothing. Remove this and get back all the bits in pg->pqflags.
 1.96 15-Jan-2020  ad Merge from yamt-pagecache (after much testing):

- Reduce unnecessary page scan in putpages esp. when an object has a ton of
pages cached but only a few of them are dirty.

- Reduce the number of pmap operations by tracking page dirtiness more
precisely in uvm layer.
 1.95 10-Jan-2020  ad UVM_PAGE_TREE_PENALTY isn't used any more.
 1.94 09-Jan-2020  ad Use __SHIFTIN()/__SHIFTOUT(). Suggested by riastradh@.
 1.93 31-Dec-2019  ad branches: 1.93.2;
- Add and use wrapper functions that take and acquire page interlocks, and pairs
of page interlocks. Require that the page interlock be held over calls to
uvm_pageactivate(), uvm_pagewire() and similar.

- Solve the concurrency problem with page replacement state. Rather than
updating the global state synchronously, set an intended state on
individual pages (active, inactive, enqueued, dequeued) while holding the
page interlock. After the interlock is released put the pages on a 128
entry per-CPU queue for their state changes to be made real in batch.
This results in a ~400 fold decrease in contention on my test system.
Proposed on tech-kern but modified to use the page interlock rather than
atomics to synchronise as it's much easier to maintain that way, and
cheaper.
 1.92 31-Dec-2019  ad struct vm_page: cluster fields most heavily used by the page allocator and
uvmpdpol at the start of the structure, so that while under global lock we
need only touch one cache line for each vm_page. There is still the problem
of vm_page not being aligned, but this seems to drop lock wait time for
(a modified) uvmpdpol and the allocator by 20-30% in a quick test.
 1.91 31-Dec-2019  ad Rename uvm_page_locked_p() -> uvm_page_owner_locked_p()
 1.90 27-Dec-2019  ad vm_page: Now that listq is gone, give the pagedaemon its own private
TAILQ_ENTRY, so that update of page replacement state can be made
asynchronous/lazy. No functional change.
 1.89 27-Dec-2019  ad Redo the page allocator to perform better, especially on multi-core and
multi-socket systems. Proposed on tech-kern. While here:

- add rudimentary NUMA support - needs more work.
- remove now unused "listq" from vm_page.
 1.88 21-Dec-2019  ad - Rename VM_PGCOLOR_BUCKET() to VM_PGCOLOR(). I want to reuse "bucket" for
something else soon and TBH it matches what this macro does better.

- Add inlines to set/get locator values in the unused lower bits of
pg->phys_addr. Begin by using it to cache the freelist index, because
computing it is expensive and that shows up during profiling. Discussed
on tech-kern.
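
As a rough illustration of the second item: a page-aligned physical
address always has zero low bits, so a small value can be stored there
and masked off again when the address is needed. The sketch below is not
the NetBSD code; the struct, the helper names and the assumed 4 KiB page
size (PAGE_SHIFT_SKETCH) are all hypothetical.

```c
#include <stdint.h>

/* Assumption for this sketch: 4 KiB pages, so the low 12 bits are free. */
#define PAGE_SHIFT_SKETCH	12
#define LOCATOR_MASK		((UINT64_C(1) << PAGE_SHIFT_SKETCH) - 1)

/* Hypothetical stand-in for the relevant part of struct vm_page. */
struct page_sketch {
	uint64_t phys_addr;	/* page-aligned address | cached locator */
};

/* Store a freelist index in the unused low bits. */
static inline void
page_set_freelist(struct page_sketch *pg, unsigned int fl)
{
	pg->phys_addr = (pg->phys_addr & ~LOCATOR_MASK) |
	    ((uint64_t)fl & LOCATOR_MASK);
}

/* Read the cached freelist index back without recomputing it. */
static inline unsigned int
page_get_freelist(const struct page_sketch *pg)
{
	return (unsigned int)(pg->phys_addr & LOCATOR_MASK);
}

/* Recover the real physical address by masking off the locator bits. */
static inline uint64_t
page_phys_addr(const struct page_sketch *pg)
{
	return pg->phys_addr & ~LOCATOR_MASK;
}
```

The trade-off is that every reader of the address must go through the
masking accessor rather than use the field directly.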
 1.87 15-Dec-2019  ad Merge from yamt-pagecache:

- do gang lookup of pages using radixtree.
- remove now unused uvm_object::uo_memq and vm_page::listq.queue.
 1.86 14-Dec-2019  ad Merge from yamt-pagecache: use radixtree for page lookup.

rbtree page lookup was introduced during the NetBSD 5.0 development cycle to
bypass lock contention problems with the (then) global page hash, and was a
temporary solution to allow us to make progress. radixtree is the intended
replacement.

Ok yamt@.
 1.85 13-Dec-2019  ad Break the global uvm_pageqlock into a per-page identity lock and a private
lock for use of the pagedaemon policy code. Discussed on tech-kern.

PR kern/54209: NetBSD 8 large memory performance extremely low
PR kern/54210: NetBSD-8 processes presumably not exiting
PR kern/54727: writing a large file causes unreasonable system behaviour
 1.84 07-Jan-2019  jdolecek branches: 1.84.4;
add sysctl to easily set ubc_direct

PR kern/53124
 1.83 19-May-2018  jdolecek branches: 1.83.2;
add experimental new function uvm_direct_process(), to allow reads/writes
of contents of uvm pages without mapping them into kernel, using
direct map or moral equivalent; pmaps supporting the interface need
to provide pmap_direct_process() and define PMAP_DIRECT

implement the new interface for amd64; I hear alpha and mips might be relatively
easy to add too, but I lack the knowledge

part of resolution for PR kern/53124
 1.82 14-Nov-2017  mrg branches: 1.82.2;
remove duplicate prototype.
 1.81 23-Dec-2016  cherry "Make NetBSD great again!"

Introduce uvm_hotplug(9) to the kernel.

Many thanks, in no particular order to:

TNF, for funding the project.

Chuck Silvers - for multiple API reviews and feedback.
Nick Hudson - for testing on multiple architectures and bugfix patches.
Everyone who helped with boot testing.

KeK (http://www.kek.org.in) for hosting the primary developers.
 1.80 23-Mar-2015  riastradh branches: 1.80.2;
Call these `identities', not `life states'.
 1.79 21-Mar-2015  riastradh No, PQ_ANON is set only if owned by anon, not if loaned to anon.
 1.78 21-Mar-2015  riastradh Address O->A loan case in comments, pointed out by chs@.
 1.77 21-Mar-2015  riastradh Elaborate on locking scheme and vm_page states.
 1.76 25-Oct-2013  martin branches: 1.76.6;
Optimize out VM_PHYSMEM_PTR_SWAP on architectures that have VM_PHYSSEG_MAX = 1
(hard to address two different array entries there w/o invoking undefined
behaviour, and newer compilers complain about it).
 1.75 05-May-2012  rmind branches: 1.75.2; 1.75.4;
Describe PG_ flags (for struct vm_page). Reviewed by yamt@.
 1.74 28-Jan-2012  rmind Improve description on struct vm_page and explain locking a little bit more.
 1.73 12-Jun-2011  rmind branches: 1.73.2; 1.73.6;
Welcome to 5.99.53! Merge rmind-uvmplock branch:

- Reorganize locking in UVM and provide extra serialisation for pmap(9).
New lock order: [vmpage-owner-lock] -> pmap-lock.

- Simplify locking in some pmap(9) modules by removing P->V locking.

- Use lock object on vmobjlock (and thus vnode_t::v_interlock) to share
the locks amongst UVM objects where necessary (tmpfs, layerfs, unionfs).

- Rewrite and optimise x86 TLB shootdown code, make it simpler and cleaner.
Add TLBSTATS option for x86 to collect statistics about TLB shootdowns.

- Unify /dev/mem et al in MI code and provide required locking (removes
kernel-lock on some ports). Also, avoid cache-aliasing issues.

Thanks to Andrew Doran and Joerg Sonnenberger, as their initial patches
formed the core changes of this branch.
 1.72 19-May-2011  yamt branches: 1.72.2;
g/c unused function prototypes
 1.71 02-Feb-2011  chuck update license clauses on my code to match the new-style BSD licenses.
based on second diff that rmind@ sent me.

no functional change with this commit.
 1.70 18-Jan-2011  matt branches: 1.70.2;
Improve the efficiency of searching for a contiguous set of free pages.
 1.69 26-Nov-2010  uebayasi branches: 1.69.2;
Put back VM_PAGE_TO_MD(); pointed out by skrll@, thanks.
 1.68 25-Nov-2010  uebayasi Revert vm_physseg allocation changes. A report says that it causes
panics when used with mplayer in heavy load.
 1.67 14-Nov-2010  uebayasi Be a little more friendly to dynamic physical segment registration.

Maintain an array of pointer to struct vm_physseg, instead of struct
array. So that VM subsystem can take its pointer safely. Pointer
to this struct will replace raw paddr_t usage in the future.

Dynamic removal is not supported yet.

Only MD data structure changes, no kernel bump needed.

Tested on i386, amd64, powerpc/ibm40x, arm11.
 1.66 12-Nov-2010  uebayasi Put VM_PAGE_TO_MD() definition in one place. No functional changes.
 1.65 12-Nov-2010  uebayasi Abstraction fix; move physical address -> per-page metadata (struct
vm_page *) "reverse" lookup code from uvm_page.h to uvm_page.c, to
help migration to not do that.

Likewise move per-page metadata (struct vm_page *) -> physical
address "forward" conversion code into *.c too. This is called
only low-layer VM and MD code.
 1.64 12-Nov-2010  uebayasi Abstraction fix; move physical address -> physical segment "reverse"
lookup code from uvm_page.h to uvm_page.c.

This code is used by some pmaps to lookup per-page state (PV) from
per-segment metadata (struct vm_physseg). This is not needed if
UVM looks up physical segment once in fault handler, then directly
passes it to pmap. This change helps transition to that model.

The only users of vm_physseg_find() are pmap_motorola.c and
powerpc/ibm4xx/pmap.c.

Tested By: Compiling and running powerpc/ibm4xx/pmap.c
(evbppc/conf/OPENBLOCKS266)
 1.63 10-Nov-2010  uebayasi Use more VM_PHYSMEM_*() accessors. No functional changes.
 1.62 10-Nov-2010  uebayasi Prepare vm_physmem[] -> (*vm_physmem)[] migration, so that physical
segments can be changed at run-time. Pointers are easier to update.
 1.61 25-Sep-2010  matt Rename rb.h to rbtree.h, as it is more appropriate (c.f. ptree.h). Also
helps find code that hasn't been updated to use the new rbtree API.
 1.60 29-Jul-2010  hannken Add vm page flag PG_MARKER and use it to tag dummy marker pages
in genfs_do_putpages() and uao_put().
Use 'v_uobj.uo_npages' to check for an empty memq.
Put some assertions where these marker pages may not appear.

Ok: YAMAMOTO Takashi <yamt@netbsd.org>
 1.59 06-Feb-2010  uebayasi branches: 1.59.2; 1.59.4;
__inline -> inline
 1.58 06-Feb-2010  uebayasi Make vm_physseg lookup routines take the target vm_physseg. This is for the
coming "managed" device segments.
 1.57 18-Aug-2009  thorpej Add a real API for testing if a page is a managed page, and adjust callers
to stop relying on vm_physseg_find() for this purpose.
 1.56 16-Jan-2009  yamt - g/c stale function prototypes.
- rename UVM_PAGE_HASH_PENALTY to UVM_PAGE_TREE_PENALTY.
 1.55 04-Jun-2008  ad branches: 1.55.6; 1.55.14; 1.55.18;
Replace the global vm_page hash with a per vm_object rbtree.
Proposed on tech-kern@.
 1.54 04-Jun-2008  ad - vm_page: put listq, pageq into a union alongside a LIST_ENTRY, so we can
use both types of list.

- Make page coloring and idle zero state per-CPU.

- Maintain per-CPU page freelists. When freeing, put pages onto the local
CPU's lists and the global lists. When allocating, prefer to take pages
from the local CPU. If none are available take from the global list as
done now. Proposed on tech-kern@.
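
Note on 1.54: a minimal sketch of the per-CPU free list policy described above, not the actual UVM code. Pages are modelled as plain counters, and NCPU, cpu_free, global_free, page_free() and page_alloc() are hypothetical names chosen for illustration. Freeing accounts the page on both the local CPU's list and the global list; allocation prefers the local CPU and otherwise takes a page that is still on the global list.

```c
#include <assert.h>

#define NCPU 2	/* hypothetical CPU count for the sketch */

static unsigned int cpu_free[NCPU];	/* pages on each CPU's list */
static unsigned int global_free;	/* pages on the global list */

/* Free a page on behalf of `cpu': it goes on the local and global lists. */
static void
page_free(int cpu)
{
	assert(cpu >= 0 && cpu < NCPU);
	cpu_free[cpu]++;
	global_free++;
}

/* Return the CPU whose list supplied the page, or -1 if none are free. */
static int
page_alloc(int cpu)
{
	int i;

	assert(cpu >= 0 && cpu < NCPU);
	if (global_free == 0)
		return -1;
	if (cpu_free[cpu] == 0) {
		/* Local list is empty: fall back to any page still free. */
		for (i = 0; i < NCPU; i++)
			if (cpu_free[i] != 0)
				break;
		cpu = i;
	}
	assert(cpu < NCPU);
	cpu_free[cpu]--;
	global_free--;
	return cpu;
}
```

Because every free page is counted on exactly one CPU list and on the global list, a non-zero global count guarantees the fallback loop finds a donor.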
 1.53 02-Jun-2008  ad uvm_pageidlezero:

- Use high and low water marks to try and reduce power consumption.
- Do trylock on uvm_fpageqlock, and bail if we can't get it.
- Only run on one CPU at a time.
 1.52 27-Feb-2008  matt branches: 1.52.2; 1.52.4; 1.52.6;
Convert two inlines from old-style-definitions to ansi.
 1.51 27-Feb-2008  ad Minor corrections to comments.
 1.50 02-Jan-2008  ad branches: 1.50.2; 1.50.6;
Merge vmlocking2 to head.
 1.49 21-Jul-2007  ad branches: 1.49.6; 1.49.12; 1.49.14; 1.49.18; 1.49.22;
Merge unobtrusive locking changes from the vmlocking branch.
 1.48 14-Apr-2007  perseant branches: 1.48.2;
Track lwp as well as proc owner with UVM_PAGE_TRKOWN
 1.47 21-Feb-2007  thorpej branches: 1.47.4; 1.47.6;
Replace the Mach-derived boolean_t type with the C99 bool type. A
future commit will replace use of TRUE and FALSE with true and false.
 1.46 15-Sep-2006  yamt branches: 1.46.6;
merge yamt-pdpolicy branch.
- separate page replacement policy from the rest of kernel
- implement an alternative replacement policy
 1.45 06-Apr-2006  uebayasi branches: 1.45.8;
Update comment to match reality (vm_physmemseg -> vm_physseg).
 1.44 16-Feb-2006  perry branches: 1.44.2; 1.44.4; 1.44.6;
Change "inline" back to "__inline" in .h files -- C99 is still too
new, and some apps compile things in C89 mode. C89 keywords stay.

As per core@.
 1.43 11-Feb-2006  yamt remove the following options. no objections on tech-kern@.

UVM_PAGER_INLINE
UVM_AMAP_INLINE
UVM_PAGE_INLINE
UVM_MAP_INLINE
 1.42 24-Dec-2005  perry branches: 1.42.2; 1.42.4; 1.42.6;
Remove leading __ from __(const|inline|signed|volatile) -- it is obsolete.
 1.41 29-Nov-2005  yamt read-ahead statistics.
 1.40 04-Jun-2005  chs branches: 1.40.2; 1.40.8;
adapt to const changes.
 1.39 07-Oct-2004  yamt g/c stale declarations of page queues.
 1.38 12-May-2004  yamt add assertions.
 1.37 24-Mar-2004  junyoung Nuke __P().
 1.36 10-Nov-2003  rearnsha In vm_physseg_find, use u_int for start, len and try when doing a
binary search. Avoids the need for signed division by 2. Approved
by thorpej.
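
Note on 1.36: a minimal sketch of a binary search that keeps start, len and try unsigned, so that halving needs no signed division. This is an illustration under assumptions, not the vm_physseg_find() source; struct seg and seg_find() are hypothetical, and the segment array is assumed sorted and non-overlapping.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical segment: covers page frames [start, end). */
struct seg {
	unsigned long start;
	unsigned long end;
};

/*
 * Return the index of the segment containing pframe, or -1.
 * segs[] must be sorted by start and must not overlap.
 */
static int
seg_find(const struct seg *segs, unsigned int nsegs, unsigned long pframe)
{
	unsigned int start, len, try;

	assert(segs != NULL || nsegs == 0);

	for (start = 0, len = nsegs; len != 0; len = len / 2) {
		try = start + (len / 2);	/* unsigned halving */
		if (pframe >= segs[try].start) {
			if (pframe < segs[try].end)
				return (int)try;
			/* Continue in the upper half, excluding try. */
			start = try + 1;
			len--;
		}
		/* Otherwise keep start and continue in the lower half. */
	}
	return -1;
}
```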
 1.35 03-Nov-2003  yamt add a DEBUG check if freed PG_ZERO pages are really zero-filled.
 1.34 10-May-2003  thorpej branches: 1.34.2;
Back out the following change:
http://mail-index.netbsd.org/source-changes/2003/05/08/0068.html

There were some side-effects that I didn't anticipate, and fixing them
is proving to be more difficult than I thought, so just eject for now.
Maybe one day we can look at this again.

Fixes PR kern/21517.
 1.33 08-May-2003  thorpej Simplify the way the bounds of the managed kernel virtual address
space is advertised to UVM by making virtual_avail and virtual_end
first-class exported variables by UVM. Machine-dependent code is
responsible for initializing them before main() is called. Anything
that steals KVA must adjust these variables accordingly.

This reduces the number of instances of this info from 3 to 1, and
simplifies the pmap(9) interface by removing the pmap_virtual_space()
function call, and removing two arguments from pmap_steal_memory().

This also eliminates some kludges such as having to burn kernel_map
entries on space used by the kernel and stolen KVA.

This also eliminates use of VM_{MIN,MAX}_KERNEL_ADDRESS from MI code,
thus giving MD code greater flexibility over the bounds of the managed
kernel virtual address space if a given port's specific platforms can
vary in this regard (this is especially true of the evb* ports).
 1.32 08-Nov-2002  enami s/than than/than/.
 1.31 15-Sep-2001  chs branches: 1.31.6;
a whole bunch of changes to improve performance and robustness under load:

- remove special treatment of pager_map mappings in pmaps. this is
required now, since I've removed the globals that expose the address range.
pager_map now uses pmap_kenter_pa() instead of pmap_enter(), so there's
no longer any need to special-case it.
- eliminate struct uvm_vnode by moving its fields into struct vnode.
- rewrite the pageout path. the pager is now responsible for handling the
high-level requests instead of only getting control after a bunch of work
has already been done on its behalf. this will allow us to UBCify LFS,
which needs tighter control over its pages than other filesystems do.
writing a page to disk no longer requires making it read-only, which
allows us to write wired pages without causing all kinds of havoc.
- use a new PG_PAGEOUT flag to indicate that a page should be freed
on behalf of the pagedaemon when it's unlocked. this flag is very similar
to PG_RELEASED, but unlike PG_RELEASED, PG_PAGEOUT can be cleared if the
pageout fails due to eg. an indirect-block buffer being locked.
this allows us to remove the "version" field from struct vm_page,
and together with shrinking "loan_count" from 32 bits to 16,
struct vm_page is now 4 bytes smaller.
- no longer use PG_RELEASED for swap-backed pages. if the page is busy
because it's being paged out, we can't release the swap slot to be
reallocated until that write is complete, but unlike with vnodes we
don't keep a count of in-progress writes so there's no good way to
know when the write is done. instead, when we need to free a busy
swap-backed page, just sleep until we can get it busy ourselves.
- implement a fast-path for extending writes which allows us to avoid
zeroing new pages. this substantially reduces cpu usage.
- encapsulate the data used by the genfs code in a struct genfs_node,
which must be the first element of the filesystem-specific vnode data
for filesystems which use genfs_{get,put}pages().
- eliminate many of the UVM pagerops, since they aren't needed anymore
now that the pager "put" operation is a higher-level operation.
- enhance the genfs code to allow NFS to use the genfs_{get,put}pages
instead of a modified copy.
- clean up struct vnode by removing all the fields that used to be used by
the vfs_cluster.c code (which we don't use anymore with UBC).
- remove kmem_object and mb_object since they were useless.
instead of allocating pages to these objects, we now just allocate
pages with no object. such pages are mapped in the kernel until they
are freed, so we can use the mapping to find the page to free it.
this allows us to remove splvm() protection in several places.

The sum of all these changes improves write throughput on my
decstation 5000/200 to within 1% of the rate of NetBSD 1.5
and reduces the elapsed time for "make release" of a NetBSD 1.5
source tree on my 128MB pc to 10% less than a 1.5 kernel took.
 1.30 25-Jul-2001  thorpej branches: 1.30.2;
Back out previous -- christos needs to update his lint(1).
 1.29 25-Jul-2001  christos fix non-portable bitmap warning.
 1.28 22-Jul-2001  wiz seperate -> separate
 1.27 28-Jun-2001  thorpej branches: 1.27.2;
Rather than using u_shorts, use u_ints and bitfields in the vm_page. This
provides us more flexibility with pageq-locked fields, and clarifies the
locking semantics for platforms which cannot address shorts.

From Ross Harvey.
 1.26 25-May-2001  chs remove trailing whitespace.
 1.25 16-May-2001  ross Expand on the locking notes comment with a XXX warning about u_short fields.
 1.24 02-May-2001  thorpej Support dynamic sizing of the page color bins. We also support
dynamically re-coloring pages; as machine-dependent code discovers
the size of the system's caches, it may call uvm_page_recolor() with
the new number of colors to use. If the new number of colors is
smaller than (or equal to) the current number of colors, then uvm_page_recolor()
is a no-op.

The system defaults to one bucket if machine-dependent code does not
initialize uvmexp.ncolors before uvm_page_init() is called.

Note that the number of color bins should be initialized to something
reasonable as early as possible -- for many early memory allocations,
we live with the consequences of the page choice for the lifetime of
the boot.
 1.23 01-May-2001  thorpej Garbage-collect a comment that has not been applicable since Mach.
 1.22 01-May-2001  thorpej Per discussion w/ chuck and chuck, restructure the md page stuff
to use a structure called "vm_page_md", and use __HAVE_VM_PAGE_MD
and __HAVE_PMAP_PHYSSEG.
 1.21 29-Apr-2001  thorpej Add a VM_MDPAGE_MEMBERS macro that defines pmap-specific data for
each vm_page structure. Add a VM_MDPAGE_INIT() macro to init this
data when pages are initialized by UVM. These macros are mandatory,
but ports may #define them to nothing if they are not needed/used.

This deprecates struct pmap_physseg. As a transitional measure,
allow a port to #define PMAP_PHYSSEG so that it can continue to
use it until its pmap is converted to use VM_MDPAGE_MEMBERS.

Use all this stuff to eliminate a lot of extra work in the Alpha
pmap module (it's smaller and faster now). Changes to other pmap
modules will follow.
 1.20 29-Apr-2001  thorpej Implement page coloring, using a round-robin bucket selection
algorithm (Solaris calls this "Bin Hopping").

This implementation currently relies on MD code to define a
constant defining the number of buckets. This will change
reasonably soon (MD code will be able to dynamically size
the bucket array).
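
Note on 1.20: a minimal sketch of round-robin bucket selection ("bin hopping") with a fixed number of buckets. It is an illustration only; NCOLORS, freecount, nextcolor and alloc_color() are hypothetical names, and pages are modelled as per-bucket counters rather than real page queues.

```c
#include <assert.h>

#define NCOLORS 4	/* hypothetical compile-time bucket count */

static unsigned int freecount[NCOLORS];	/* free pages per color bucket */
static unsigned int nextcolor;		/* bucket to try first next time */

/* Return the color a page was taken from, or -1 if every bucket is empty. */
static int
alloc_color(void)
{
	unsigned int i, color;

	assert(nextcolor < NCOLORS);

	for (i = 0; i < NCOLORS; i++) {
		color = (nextcolor + i) % NCOLORS;
		if (freecount[color] != 0) {
			freecount[color]--;
			/* Hop to the following bucket for the next request. */
			nextcolor = (color + 1) % NCOLORS;
			return (int)color;
		}
	}
	return -1;
}
```

Successive allocations therefore spread across colors, skipping buckets that happen to be empty.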
 1.19 28-Dec-2000  chs branches: 1.19.2;
remove some more leftovers from Mach.
 1.18 27-Nov-2000  chs Initial integration of the Unified Buffer Cache project.
 1.17 03-Oct-2000  mrg clean up a comment.
 1.16 27-Jun-2000  mrg more vm header file changes:

<vm/vm_extern.h> merged into <uvm/uvm_extern.h>
<vm/vm_page.h> merged into <uvm/uvm_page.h>
<vm/pmap.h> has become <uvm/uvm_pmap.h>

this leaves just <vm/vm.h> in NetBSD.
 1.15 24-Apr-2000  thorpej Changes necessary to implement pre-zero'ing of pages in the idle loop:
- Make page free lists have two actual queues: known-zero pages and
pages with unknown contents.
- Implement uvm_pageidlezero(). This function attempts to zero up to
the target number of pages until the target has been reached (currently
target is `all free pages') or until whichqs becomes non-zero (indicating
that a process is ready to run).
- Define a new hook for the pmap module for pre-zero'ing pages. This is
used to zero the pages using uncached access. This allows us to zero
as many pages as we want without polluting the cache.

In order to use this feature, each platform must add the appropriate
glue in their idle loop.
 1.14 26-Mar-2000  kleink Merge parts of chs-ubc2 into the trunk:
Add a new type voff_t (defined as a synonym for off_t) to describe offsets
into uvm objects, and update the appropriate interfaces to use it, the
most visible effect being the ability to mmap() file offsets beyond
the range of a vaddr_t.

Originally by Chuck Silvers; blame me for problems caused by merging this
into non-UBC.
 1.13 21-Jun-1999  thorpej branches: 1.13.2;
Protect prototypes, certain macros, and inlines from userland.
 1.12 24-May-1999  thorpej - Change uvm_{lock,unlock}_fpageq() to return/take the previous interrupt
level directly, instead of making the caller wrap the calls in
splimp()/splx().
- Add a comment documenting that interrupts that cause memory allocation
must be blocked while the free page queue is locked.

Since interrupts must be blocked while this lock is asserted, tying them
together like this helps to prevent mistakes.
 1.11 25-Mar-1999  mrg branches: 1.11.4;
remove now >1 year old pre-release message.
 1.10 13-Aug-1998  eeh Merge paddr_t changes into the main branch.
 1.9 08-Jul-1998  thorpej branches: 1.9.2;
Add support for multiple memory free lists. There is at least one
default free list, and 0 - N additional free lists, in order of descending
priority.

A new page allocation function, uvm_pagealloc_strat(), has been added,
providing three page allocation strategies:

- normal: high -> low priority free list walk, taking the
page off the first free list that has one.

- only: attempt to allocate a page only from the specified free
list, failing if that free list has none available.

- fallback: if `only' fails, fall back on `normal'.

uvm_pagealloc(...) is provided for normal use (and is a synonym for
uvm_pagealloc_strat(..., UVM_PGA_STRAT_NORMAL, 0); the free list argument
is ignored for the `normal' case).

uvm_page_physload() now specifies which free list the pages will be
loaded onto. This means that some platforms which have multiple physical
memory segments may define additional vm_physsegs if they wish to break
individual physical segments into differing priorities.

Machine-dependent code must define _at least_ the following constants
in <machine/vmparam.h>:

VM_NFREELIST: the number of free lists the system will have

VM_FREELIST_DEFAULT: the default freelist (should always be 0,
but is defined in machdep code so that it's with all of the
other free list-related constants).

Additional free list names may be defined by machine-dependent code, but
they will only be used by machine-dependent code (e.g. for loading the
vm_physsegs).
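
Note on 1.9: a minimal sketch of the three allocation strategies (normal, only, fallback) described above. It is not the uvm_pagealloc_strat() implementation, which takes more arguments. NFREELIST, freecount, take_from(), pagealloc_strat() and the STRAT_* names are hypothetical, and the sketch assumes that list index 0 has the highest priority.

```c
#include <assert.h>

#define NFREELIST 3	/* stand-in for VM_NFREELIST */

enum strat { STRAT_NORMAL, STRAT_ONLY, STRAT_FALLBACK };

/* Hypothetical free list state: a count of free pages per list. */
static int freecount[NFREELIST];

/* Take one page from list lst; return 1 on success, 0 if it is empty. */
static int
take_from(int lst)
{
	if (freecount[lst] == 0)
		return 0;
	freecount[lst]--;
	return 1;
}

/* Return the index of the free list a page was taken from, or -1. */
static int
pagealloc_strat(enum strat strat, int free_list)
{
	int lst;

	assert(free_list >= 0 && free_list < NFREELIST);

	if (strat == STRAT_ONLY || strat == STRAT_FALLBACK) {
		if (take_from(free_list))
			return free_list;
		if (strat == STRAT_ONLY)
			return -1;	/* `only' never looks elsewhere */
	}
	/* `normal', or `fallback' after `only' failed: walk by priority. */
	for (lst = 0; lst < NFREELIST; lst++)
		if (take_from(lst))
			return lst;
	return -1;
}
```

In the `normal' case the free_list argument is ignored, matching the description of uvm_pagealloc() as a synonym for the normal strategy.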
 1.8 28-May-1998  chuck unstatic uvm_page_physload so pmap modules can use it too.
as requested by Eduardo E. Horvath
 1.7 22-Mar-1998  chuck remove tmpwire arg from uvm_pagewire() -- it isn't needed anymore.
noted by chuck s.
 1.6 09-Mar-1998  mrg KNF.
 1.5 10-Feb-1998  mrg - add defopt's for UVM, UVMHIST and PMAP_NEW.
- remove unnecessary UVMHIST_DECL's.
 1.4 10-Feb-1998  perry add/cleanup multiple inclusion protection.
 1.3 07-Feb-1998  mrg restore rcsids
 1.2 06-Feb-1998  thorpej RCS ID police.
 1.1 05-Feb-1998  mrg branches: 1.1.1;
Initial revision
 1.1.1.1 05-Feb-1998  mrg initial import of the new virtual memory system, UVM, into -current.

UVM was written by chuck cranor <chuck@maria.wustl.edu>, with some
minor portions derived from the old Mach code. i provided some help
getting swap and paging working, and other bug fixes/ideas. chuck
silvers <chuq@chuq.com> also provided some other fixes.

this is the UVM kernel code portion.


this will be KNF'd shortly. :-)
 1.9.2.1 30-Jul-1998  eeh Split vm_offset_t and vm_size_t into paddr_t, psize_t, vaddr_t, and vsize_t.
 1.11.4.4 09-Aug-1999  chs create a new type "voff_t" for uvm_object offsets
and define it to be "off_t". also, remove pgo_asyncget().
 1.11.4.3 31-Jul-1999  chs add uvm_page_unbusy() to simplify dropping PG_BUSY.
 1.11.4.2 01-Jul-1999  thorpej Sync w/ -current.
 1.11.4.1 21-Jun-1999  thorpej Sync w/ -current.
 1.13.2.3 05-Jan-2001  bouyer Sync with HEAD
 1.13.2.2 08-Dec-2000  bouyer Sync with HEAD.
 1.13.2.1 20-Nov-2000  bouyer Update thorpej_scsipi to -current as of a month ago
 1.19.2.4 11-Nov-2002  nathanw Catch up to -current
 1.19.2.3 21-Sep-2001  nathanw Catch up to -current.
 1.19.2.2 24-Aug-2001  nathanw Catch up with -current.
 1.19.2.1 21-Jun-2001  nathanw Catch up to -current.
 1.27.2.2 10-Jan-2002  thorpej Sync kqueue branch with -current.
 1.27.2.1 03-Aug-2001  lukem update to -current
 1.30.2.1 01-Oct-2001  fvdl Catch up with -current.
 1.31.6.2 12-Mar-2002  thorpej Make pageqlock an adaptive mutex, and rename it to pageq_mutex.
 1.31.6.1 12-Mar-2002  thorpej Convert the fpageqlock to a spin mutex at IPL_VM and rename it
to fpageq_mutex.
 1.34.2.6 11-Dec-2005  christos Sync with head.
 1.34.2.5 10-Nov-2005  skrll Sync with HEAD. Here we go again...
 1.34.2.4 19-Oct-2004  skrll Sync with HEAD
 1.34.2.3 21-Sep-2004  skrll Fix the sync with head I botched.
 1.34.2.2 18-Sep-2004  skrll Sync with HEAD.
 1.34.2.1 03-Aug-2004  skrll Sync with HEAD
 1.40.8.1 29-Nov-2005  yamt sync with head.
 1.40.2.6 17-Mar-2008  yamt sync with head.
 1.40.2.5 21-Jan-2008  yamt sync with head
 1.40.2.4 03-Sep-2007  yamt sync with head.
 1.40.2.3 26-Feb-2007  yamt sync with head.
 1.40.2.2 30-Dec-2006  yamt sync with head.
 1.40.2.1 21-Jun-2006  yamt sync with head.
 1.42.6.1 22-Apr-2006  simonb Sync with head.
 1.42.4.1 09-Sep-2006  rpaulo sync with head
 1.42.2.1 18-Feb-2006  yamt sync with head.
 1.44.6.1 24-May-2006  tron Merge 2006-05-24 NetBSD-current into the "peter-altq" branch.
 1.44.4.1 19-Apr-2006  elad oops - *really* sync to head this time.
 1.44.2.3 11-Apr-2006  yamt sync with head
 1.44.2.2 12-Mar-2006  yamt - change the way to account read-ahead stats.
- fix UVM_PQFLAGBITS.
 1.44.2.1 05-Mar-2006  yamt separate page replacement policy from the rest of kernel.
 1.45.8.1 18-Nov-2006  ad Sync with head.
 1.46.6.2 15-Apr-2007  yamt sync with head.
 1.46.6.1 27-Feb-2007  yamt - sync with head.
- move sched_changepri back to kern_synch.c as it doesn't know PPQ anymore.
 1.47.6.1 11-Jul-2007  mjf Sync with head.
 1.47.4.2 08-Jun-2007  ad Sync with head.
 1.47.4.1 13-Mar-2007  ad Pull in the initial set of changes for the vmlocking branch.
 1.48.2.1 15-Aug-2007  skrll Sync with HEAD.
 1.49.22.2 21-Jul-2007  ad Merge unobtrusive locking changes from the vmlocking branch.
 1.49.22.1 21-Jul-2007  ad file uvm_page.h was added on branch matt-mips64 on 2007-07-21 19:21:56 +0000
 1.49.18.1 02-Jan-2008  bouyer Sync with HEAD
 1.49.14.1 04-Dec-2007  ad Pull the vmlocking changes into a new branch.
 1.49.12.1 18-Feb-2008  mjf Sync with HEAD.
 1.49.6.2 23-Mar-2008  matt sync with HEAD
 1.49.6.1 09-Jan-2008  matt sync with HEAD
 1.50.6.4 17-Jan-2009  mjf Sync with HEAD.
 1.50.6.3 05-Jun-2008  mjf Sync with HEAD.

Also fix build.
 1.50.6.2 02-Jun-2008  mjf Sync with HEAD.
 1.50.6.1 03-Apr-2008  mjf Sync with HEAD.
 1.50.2.1 24-Mar-2008  keiichi sync with head.
 1.52.6.1 23-Jun-2008  wrstuden Sync w/ -current. 34 merge conflicts to follow.
 1.52.4.5 09-Oct-2010  yamt sync with head
 1.52.4.4 11-Aug-2010  yamt sync with head.
 1.52.4.3 11-Mar-2010  yamt sync with head
 1.52.4.2 19-Aug-2009  yamt sync with head.
 1.52.4.1 04-May-2009  yamt sync with head.
 1.52.2.2 17-Jun-2008  yamt sync with head.
 1.52.2.1 04-Jun-2008  yamt sync with head
 1.55.18.1 14-Oct-2011  matt Add VM_PHYSMEM_PTR and VM_PAGE_TO_MD macros from -current.
 1.55.14.9 29-Feb-2012  matt Improve UVM_PAGE_TRKOWN.
Add more asserts to uvm_page.
 1.55.14.8 16-Feb-2012  matt Track the victims selected by the pagedaemon and what happens to them.
Keep a hint for what page group has the most free pages for a given color.
 1.55.14.7 13-Feb-2012  matt Use separate pending and paging tailq entries.
Add a queue check routine to validate the queues aren't corrupt.
 1.55.14.6 09-Feb-2012  matt Major changes to uvm.
Support multiple collections (groups) of free pages and run the page
reclamation algorithm on each group independently.
 1.55.14.5 03-Jun-2011  matt Restore $NetBSD$
 1.55.14.4 03-Jun-2011  matt Rework page free lists to be sorted by color first rather than free_list.
Kept per color PGFL_* counter in each page free list.
Minor cleanups.
 1.55.14.3 25-May-2011  matt Make uvm_map recognize UVM_FLAG_COLORMATCH which tells uvm_map that the
'align' argument specifies the starting color of the KVA range to be returned.

When calling uvm_km_alloc with UVM_KMF_VAONLY, also specify the starting
color of the kva range returned (UVM_KMF_COLORMATCH) and pass those to
uvm_map.

In uvm_pglistalloc, make sure the pages being returned have sequentially
advancing colors (so they can be mapped in a contiguous address range).
Add a few missing UVM_FLAG_COLORMATCH flags to uvm_pagealloc calls.

Make the socket and pipe loan color-safe.

Make the mips pmap enforce strict page color (color(VA) == color(PA)).
 1.55.14.2 29-Apr-2011  matt Add macros from current (VM_PAGE_TO_MD, VM_PHYSMEM_PTR, VM_PHYSMEM_PTR_SWAP)
 1.55.14.1 23-Jan-2010  matt Add a start_hint to vm_physseg so when allocating pages, we can skip
forward over pages that are probably still allocated.
 1.55.6.1 19-Jan-2009  skrll Sync with HEAD.
 1.59.4.3 31-May-2011  rmind sync with head
 1.59.4.2 05-Mar-2011  rmind sync with head
 1.59.4.1 17-Mar-2010  rmind Reorganise UVM locking to protect P->V state and serialise pmap(9)
operations on the same page(s) by always locking their owner. Hence
lock order: "vmpage"-lock -> pmap-lock.

Patch, proposed on tech-kern@, from Andrew Doran.
 1.59.2.37 21-Nov-2010  uebayasi Rename PGO_ZERO as PGO_HOLE, and s/uvm_page_zeropage/uvm_page_holepage/.
 1.59.2.36 15-Nov-2010  uebayasi Move zero-page into a common place, in the hope that it's shared
for other purposes.

According to Chuck Silvers, zero-page mappings don't need to be
explicitly unmapped in putpages(). Follow that advice.
 1.59.2.35 12-Nov-2010  uebayasi Move MD member in struct vm_physseg to the tail, in case this struct
can be shared among architectures with only difference of the MD
part.
 1.59.2.34 10-Nov-2010  uebayasi Fix thinko; make vm_physseg ptr swap really work.
 1.59.2.33 04-Nov-2010  uebayasi Split physical device segment pages from "managed" to "managed
device". Cache that information as a flag PG_DEVICE so that callers
don't need to walk physsegs every time.

Remove PQ_FIXED, which means that page daemon doesn't need to know
device segment pages at all. But still fault handlers need to know
them.

I think this is what I can do best now.
 1.59.2.32 27-Oct-2010  uebayasi Unconditionally provide device page segment data structures and
functions as suggested by Chuck Silvers.

(Memory and device segments are being merged soon.)
 1.59.2.31 22-Oct-2010  uebayasi Sync with HEAD (-D20101022).
 1.59.2.30 17-Aug-2010  uebayasi Sync with HEAD.
 1.59.2.29 17-Aug-2010  uebayasi Collect a garbage.
 1.59.2.28 11-Aug-2010  uebayasi s/vm_physseg_find_direct/vm_physseg_find_device/
 1.59.2.27 22-Jul-2010  uebayasi s/PG_XIP/PQ_FIXED/, meaning that the fault handler sees XIP pages as
"fixed", and doesn't pass them to paging activity.

("XIP" is a vnode specific knowledge. It was wrong that the fault
handler had to know such a special thing.)
 1.59.2.26 15-Jul-2010  uebayasi Rename PG_DIRECT to PG_XIP. PG_XIP is marked to XIP vnode pages.
 1.59.2.25 08-Jul-2010  uebayasi One more missing s/DIRECT_PAGE/XIP/.
 1.59.2.24 08-Jul-2010  uebayasi Whitespace.
 1.59.2.23 07-Jul-2010  uebayasi To simplify things, revert global vm_page_md hash and allocate struct
vm_page [] for XIP physical segments.
 1.59.2.22 31-May-2010  uebayasi Re-define the definition of "device page"; device pages are pages of
device memory. Pages which don't have vm_page (== can't be used for
generic use), but whose PV are tracked, are called "direct pages" from
now.
 1.59.2.21 31-May-2010  uebayasi Revert partial "phys_addr" removal code. This change is independent of
XIP, and will be done later.
 1.59.2.20 29-Apr-2010  uebayasi "int free_list" (VM_FREELIST_*) is specific to struct vm_page (memory
page). Handle it only in memory physseg parts.

Record device page's properties in struct vm_physseg for future uses.
For example, a framebuffer that is capable of some accelerated bus access
(e.g. write-combining) should register its capability through "int
flags".
 1.59.2.19 28-Apr-2010  uebayasi Manage struct vm_physseg as a list, which means that struct vm_physseg
objects don't move when a segment is added / removed.
 1.59.2.18 28-Apr-2010  uebayasi Always use struct vm_physseg *vm_physmem_ptrs[] in MD code.
 1.59.2.17 27-Apr-2010  uebayasi Maintain not only arrays of struct vm_physseg, but also arrays of pointers
to struct vm_physseg. This is needed:

- to make the array change dynamically (unload), and

- to make the struct vm_physseg * object to be passed to device drivers as
a cookie of a managed physical segment.
 1.59.2.16 27-Apr-2010  uebayasi Sort.
 1.59.2.15 23-Feb-2010  uebayasi Put back vm_page::phys_addr for now, because removing it involves some random
parts in the tree. I'll revisit this after merging the branch.
 1.59.2.14 23-Feb-2010  uebayasi Make struct vm_page * -> struct vm_page_md * lookup a real function and
hide its internals. Won't cause much performance loss because results are
usually cached by callers.
 1.59.2.13 23-Feb-2010  uebayasi Introduce uvm_page_physload_device(). This registers a physical address
range of a device, similar to uvm_page_physload() for memories. For now,
this is supposed to be called by MD code. We have to consider the design
when we'll manage mmap'able character devices.

Expose paddr_t -> struct vm_page * conversion function for device pages,
uvm_phys_to_vm_page_device(). This will be called by XIP vnode pager.
Because it knows if a given vnode is a device page (and its physical
address base) or not. Don't look up device segments, but directly make a
cookie.
 1.59.2.12 12-Feb-2010  uebayasi Typo.
 1.59.2.11 12-Feb-2010  uebayasi Enable the newly added VM_PAGE_TO_MD() only #ifdef __HAVE_VM_PAGE_MD.
Pointed out by mrg@.
 1.59.2.10 10-Feb-2010  uebayasi Fix previous again & use VM_PAGE_TO_MD() where appropriate.
 1.59.2.9 10-Feb-2010  uebayasi Oops fix a typo. (My lapdog's k/b is dying.)
 1.59.2.8 10-Feb-2010  uebayasi Introduce VM_PAGE_TO_MD(); lookup vm_page_md from a given vm_page.
 1.59.2.7 10-Feb-2010  uebayasi Initial MD per-page data (struct vm_page_md) lookup code for XIP'able device
pages. Compile tested only.

Always define uvm_pageisdevice_p(). Always false if kernel is !DEVICE_PAGE.
 1.59.2.6 09-Feb-2010  uebayasi Implement device page struct vm_page * handling.
 1.59.2.5 09-Feb-2010  uebayasi Define vm_physdev / vm_nphysdev, physical address segment data for managed
device pages.
 1.59.2.4 09-Feb-2010  uebayasi vm_nphysseg -> vm_nphysmem
 1.59.2.3 09-Feb-2010  uebayasi Kill vm_page::phys_addr.
 1.59.2.2 08-Feb-2010  uebayasi Make vm_physseg lookup into a real function.
 1.59.2.1 08-Feb-2010  uebayasi Make vm_physseg::lastpg exclusive end.
 1.69.2.1 06-Jun-2011  jruoho Sync with HEAD.
 1.70.2.1 08-Feb-2011  bouyer Sync with HEAD
 1.72.2.1 23-Jun-2011  cherry Catchup with rmind-uvmplock merge.
 1.73.6.2 02-Jun-2012  mrg sync to latest -current.
 1.73.6.1 18-Feb-2012  mrg merge to -current.
 1.73.2.12 22-May-2014  yamt sync with head.

for a reference, the tree before this commit was tagged
as yamt-pagecache-tag8.

this commit was split into small chunks to avoid
a limitation of cvs. ("Protocol error: too many arguments")
 1.73.2.11 23-May-2012  yamt sync with head.
 1.73.2.10 17-Apr-2012  yamt sync with head
 1.73.2.9 17-Feb-2012  yamt byebye PG_HOLE as it turned out to be unnecessary.
 1.73.2.8 30-Nov-2011  yamt make lfs another pager specific flag so that it won't be affected by
an nfs hack in genfs.
 1.73.2.7 20-Nov-2011  yamt - fix page loaning XXX make O->A loaning further
- add some statistics
 1.73.2.6 18-Nov-2011  yamt - use mutex obj for pageable object
- add a function to wait for a mutex obj being available
- replace some "livelock" kpauses with it
 1.73.2.5 14-Nov-2011  yamt remove now unused UVM_PAGE_TREE_PENALTY
 1.73.2.4 13-Nov-2011  yamt cache UVM_OBJ_IS_VNODE in pqflags
 1.73.2.3 11-Nov-2011  yamt - track the number of clean/dirty/unknown pages in the system.
- g/c PG_MARKER
 1.73.2.2 06-Nov-2011  yamt remove pg->listq and uobj->memq
 1.73.2.1 02-Nov-2011  yamt page cache related changes

- maintain object pages in radix tree rather than rb tree.
- reduce unnecessary page scan in putpages. esp. when an object has a ton of
pages cached but only a few of them are dirty.
- reduce the number of pmap operations by tracking page dirtiness more
precisely in uvm layer.
- fix nfs commit range tracking.
- fix nfs write clustering. XXX hack
 1.75.4.1 18-May-2014  rmind sync with head
 1.75.2.2 03-Dec-2017  jdolecek update from HEAD
 1.75.2.1 20-Aug-2014  tls Rebase to HEAD as of a few days ago.
 1.76.6.2 05-Feb-2017  skrll Sync with HEAD
 1.76.6.1 06-Apr-2015  skrll Sync with HEAD
 1.80.2.1 07-Jan-2017  pgoyette Sync with HEAD. (Note that most of these changes are simply $NetBSD$
tag issues.)
 1.82.2.2 18-Jan-2019  pgoyette Synch with HEAD
 1.82.2.1 21-May-2018  pgoyette Sync with HEAD
 1.83.2.2 08-Apr-2020  martin Merge changes from current as of 20200406
 1.83.2.1 10-Jun-2019  christos Sync with HEAD
 1.84.4.1 13-May-2020  martin Pull up following revision(s) (requested by chs in ticket #906):

sys/uvm/uvm_page.h: revision 1.99

Include "opt_uvm_page_trkown.h" for UVM_PAGE_TRKOWN.
 1.93.2.3 29-Feb-2020  ad Sync with head.
 1.93.2.2 25-Jan-2020  ad Sync with head.
 1.93.2.1 17-Jan-2020  ad Sync with head.
 1.107.2.1 03-Jan-2021  thorpej Sync w/ HEAD.
 1.9 26-May-2020  ad uvm_page_array_fill(): return ENOENT in all cases when nothing's left.
 1.8 25-May-2020  ad Make previous work as intended. Bad programmer.
 1.7 25-May-2020  ad Minor correction to previous.
 1.6 25-May-2020  ad - Alter the convention for uvm_page_array slightly, so the basic search
parameters can't change part way through a search: move the "uobj" and
"flags" arguments over to uvm_page_array_init() and store those with the
array.

- With that, detect when it's not possible to find any more pages in the
tree with the given search parameters, and avoid repeated tree lookups if
the caller loops over uvm_page_array_fill_and_peek().
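
Note on 1.6: a minimal sketch of the convention described above, using a plain sorted array in place of the object's page tree. It is an analogy, not the uvm_page_array API: struct cursor, cursor_init() and cursor_next() are hypothetical. The search parameters are fixed at init time, and once a lookup finds nothing more the cursor remembers that, so a caller that keeps looping causes no further lookups.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

struct cursor {
	const int *keys;	/* sorted keys; stands in for the tree */
	size_t nkeys;
	size_t pos;		/* next index to examine */
	bool done;		/* set once a lookup finds nothing more */
	unsigned int lookups;	/* real searches performed, for illustration */
};

/* Store the search parameters with the cursor so they cannot change later. */
static void
cursor_init(struct cursor *c, const int *keys, size_t nkeys)
{
	assert(c != NULL);
	c->keys = keys;
	c->nkeys = nkeys;
	c->pos = 0;
	c->done = false;
	c->lookups = 0;
}

/* Return a pointer to the next key >= min, or NULL when none are left. */
static const int *
cursor_next(struct cursor *c, int min)
{
	if (c->done)
		return NULL;	/* known to be exhausted: skip the search */
	c->lookups++;
	while (c->pos < c->nkeys && c->keys[c->pos] < min)
		c->pos++;
	if (c->pos == c->nkeys) {
		c->done = true;
		return NULL;
	}
	return &c->keys[c->pos++];
}
```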
 1.5 17-Mar-2020  ad branches: 1.5.2;
Fix a comment.
 1.4 23-Feb-2020  ad UVM locking changes, proposed on tech-kern:

- Change the lock on uvm_object, vm_amap and vm_anon to be a RW lock.
- Break v_interlock and vmobjlock apart. v_interlock remains a mutex.
- Do partial PV list locking in the x86 pmap. Others to follow later.
 1.3 15-Jan-2020  ad Merge from yamt-pagecache (after much testing):

- Reduce unnecessary page scan in putpages esp. when an object has a ton of
pages cached but only a few of them are dirty.

- Reduce the number of pmap operations by tracking page dirtiness more
precisely in uvm layer.
 1.2 15-Dec-2019  ad branches: 1.2.2;
Merge from yamt-pagecache:

- do gang lookup of pages using radixtree.
- remove now unused uvm_object::uo_memq and vm_page::listq.queue.
 1.1 02-Nov-2011  yamt branches: 1.1.2;
file uvm_page_array.c was initially added on branch yamt-pagecache.
 1.1.2.7 23-Jan-2013  yamt tweak assertions
 1.1.2.6 01-Aug-2012  yamt - fix integrity sync.
putpages for integrity sync (fsync, msync with MS_SYNC, etc) should not
skip pages being written back by other threads.

- adapt to radix tree tag api changes.
 1.1.2.5 18-Apr-2012  yamt fix DEBUG code
 1.1.2.4 18-Jan-2012  yamt - bug fixes
- minor optimizations
- assertions
- comments
 1.1.2.3 26-Nov-2011  yamt - uvm_page_array_fill: add some more parameters
- uvn_findpages: use gang-lookup
- genfs_putpages: re-enable backward clustering
- mechanical changes after the recent radixtree.h api changes
 1.1.2.2 06-Nov-2011  yamt add a convenient routine for common usage
 1.1.2.1 02-Nov-2011  yamt page cache related changes

- maintain object pages in radix tree rather than rb tree.
- reduce unnecessary page scan in putpages. esp. when an object has a ton of
pages cached but only a few of them are dirty.
- reduce the number of pmap operations by tracking page dirtiness more
precisely in uvm layer.
- fix nfs commit range tracking.
- fix nfs write clustering. XXX hack
 1.2.2.2 29-Feb-2020  ad Sync with head.
 1.2.2.1 17-Jan-2020  ad Sync with head.
 1.5.2.2 08-Apr-2020  martin Merge changes from current as of 20200406
 1.5.2.1 17-Mar-2020  martin file uvm_page_array.c was added on branch phil-wifi on 2020-04-08 14:09:04 +0000
 1.3 25-May-2020  ad - Alter the convention for uvm_page_array slightly, so the basic search
parameters can't change part way through a search: move the "uobj" and
"flags" arguments over to uvm_page_array_init() and store those with the
array.

- With that, detect when it's not possible to find any more pages in the
tree with the given search parameters, and avoid repeated tree lookups if
the caller loops over uvm_page_array_fill_and_peek().
 1.2 15-Dec-2019  ad branches: 1.2.6;
Merge from yamt-pagecache:

- do gang lookup of pages using radixtree.
- remove now unused uvm_object::uo_memq and vm_page::listq.queue.
 1.1 02-Nov-2011  yamt branches: 1.1.2;
file uvm_page_array.h was initially added on branch yamt-pagecache.
 1.1.2.6 01-Aug-2012  yamt - fix integrity sync.
putpages for integrity sync (fsync, msync with MS_SYNC, etc) should not
skip pages being written back by other threads.

- adapt to radix tree tag api changes.
 1.1.2.5 18-Apr-2012  yamt comment
 1.1.2.4 26-Nov-2011  yamt - uvm_page_array_fill: add some more parameters
- uvn_findpages: use gang-lookup
- genfs_putpages: re-enable backward clustering
- mechanical changes after the recent radixtree.h api changes
 1.1.2.3 14-Nov-2011  yamt comment
 1.1.2.2 06-Nov-2011  yamt add a convenient routine for common usage
 1.1.2.1 02-Nov-2011  yamt page cache related changes

- maintain object pages in radix tree rather than rb tree.
- reduce unnecessary page scan in putpages. esp. when an object has a ton of
pages cached but only a few of them are dirty.
- reduce the number of pmap operations by tracking page dirtiness more
precisely in uvm layer.
- fix nfs commit range tracking.
- fix nfs write clustering. XXX hack
 1.2.6.2 08-Apr-2020  martin Merge changes from current as of 20200406
 1.2.6.1 15-Dec-2019  martin file uvm_page_array.h was added on branch phil-wifi on 2020-04-08 14:09:04 +0000
 1.27 11-Feb-2006  yamt remove the following options. no objections on tech-kern@.

UVM_PAGER_INLINE
UVM_AMAP_INLINE
UVM_PAGE_INLINE
UVM_MAP_INLINE
 1.26 11-Dec-2005  christos branches: 1.26.2; 1.26.4; 1.26.6;
merge ktrace-lwp.
 1.25 28-Jun-2005  thorpej branches: 1.25.2;
Clean up the cpp macro used to say "we're compiling this specific C file".
 1.24 28-Jun-2005  thorpej Add missing PAGE_INLINE to uvm_pagelookup()
 1.23 27-Jun-2005  thorpej Use ANSI function decls.
 1.22 12-May-2004  yamt add assertions.
 1.21 01-Dec-2002  matt branches: 1.21.6;
Reorder things so that, with multiple inclusion protection, optional
definitions are outside the protection checks.
 1.20 15-Sep-2001  chs branches: 1.20.6;
a whole bunch of changes to improve performance and robustness under load:

- remove special treatment of pager_map mappings in pmaps. this is
required now, since I've removed the globals that expose the address range.
pager_map now uses pmap_kenter_pa() instead of pmap_enter(), so there's
no longer any need to special-case it.
- eliminate struct uvm_vnode by moving its fields into struct vnode.
- rewrite the pageout path. the pager is now responsible for handling the
high-level requests instead of only getting control after a bunch of work
has already been done on its behalf. this will allow us to UBCify LFS,
which needs tighter control over its pages than other filesystems do.
writing a page to disk no longer requires making it read-only, which
allows us to write wired pages without causing all kinds of havoc.
- use a new PG_PAGEOUT flag to indicate that a page should be freed
on behalf of the pagedaemon when it's unlocked. this flag is very similar
to PG_RELEASED, but unlike PG_RELEASED, PG_PAGEOUT can be cleared if the
pageout fails due to eg. an indirect-block buffer being locked.
this allows us to remove the "version" field from struct vm_page,
and together with shrinking "loan_count" from 32 bits to 16,
struct vm_page is now 4 bytes smaller.
- no longer use PG_RELEASED for swap-backed pages. if the page is busy
because it's being paged out, we can't release the swap slot to be
reallocated until that write is complete, but unlike with vnodes we
don't keep a count of in-progress writes so there's no good way to
know when the write is done. instead, when we need to free a busy
swap-backed page, just sleep until we can get it busy ourselves.
- implement a fast-path for extending writes which allows us to avoid
zeroing new pages. this substantially reduces cpu usage.
- encapsulate the data used by the genfs code in a struct genfs_node,
which must be the first element of the filesystem-specific vnode data
for filesystems which use genfs_{get,put}pages().
- eliminate many of the UVM pagerops, since they aren't needed anymore
now that the pager "put" operation is a higher-level operation.
- enhance the genfs code to allow NFS to use the genfs_{get,put}pages
instead of a modified copy.
- clean up struct vnode by removing all the fields that used to be used by
the vfs_cluster.c code (which we don't use anymore with UBC).
- remove kmem_object and mb_object since they were useless.
instead of allocating pages to these objects, we now just allocate
pages with no object. such pages are mapped in the kernel until they
are freed, so we can use the mapping to find the page to free it.
this allows us to remove splvm() protection in several places.

The sum of all these changes improves write throughput on my
decstation 5000/200 to within 1% of the rate of NetBSD 1.5
and reduces the elapsed time for "make release" of a NetBSD 1.5
source tree on my 128MB pc to 10% less than a 1.5 kernel took.
 1.19 27-Jun-2001  thorpej branches: 1.19.2; 1.19.4;
Since a page can be on only one of ACTIVE or INACTIVE queues at
any given time, turn two consecutive if statements into an if-else-if
construct.
 1.18 25-May-2001  chs remove trailing whitespace.
 1.17 22-May-2001  ross Merge the swap-backed and object-backed inactive lists.
 1.16 28-Jan-2001  thorpej branches: 1.16.2;
Page scanner improvements, behavior is actually a bit more like
Mach VM's now. Specific changes:
- Pages now need not have all of their mappings removed before being
put on the inactive list. They only need to have the "referenced"
attribute cleared. This makes putting pages onto the inactive list
much more efficient. In order to eliminate redundant clearings of
"referenced", callers of uvm_pagedeactivate() must now do this
themselves.
- When checking the "modified" attribute for a page (for clearing
PG_CLEAN), make sure to only do it if PG_CLEAN is currently set on
the page (saves a potentially expensive pmap operation).
- When scanning the inactive list, if a page is referenced, reactivate
it (this part was actually added in uvm_pdaemon.c,v 1.27). This
works properly now that pages on the inactive list are allowed to
have mappings.
- When scanning the inactive list and considering a page for freeing,
remove all mappings, and then check the "modified" attribute if the
page is marked PG_CLEAN.
- When scanning the active list, if the page was referenced since its
last sweep by the scanner, don't deactivate it. (This part was
actually added in uvm_pdaemon.c,v 1.28.)

These changes greatly improve interactive performance during
moderate to high memory and I/O load.
 1.15 14-Jan-2001  thorpej splimp() -> splvm()
 1.14 27-Nov-2000  chs use queue.h macros and KASSERT().
 1.13 08-May-2000  thorpej __predict_false() DIAGNOSTIC error checks.
 1.12 26-Mar-2000  kleink Merge parts of chs-ubc2 into the trunk:
Add a new type voff_t (defined as a synonym for off_t) to describe offsets
into uvm objects, and update the appropriate interfaces to use it, the
most visible effect being the ability to mmap() file offsets beyond
the range of a vaddr_t.

Originally by Chuck Silvers; blame me for problems caused by merging this
into non-UBC.
 1.11 12-Sep-1999  chs branches: 1.11.2;
eliminate the PMAP_NEW option by making it required for all ports.
ports which previously had no support for PMAP_NEW now implement
the pmap_k* interfaces as wrappers around the non-k versions.
 1.10 24-May-1999  thorpej - Change uvm_{lock,unlock}_fpageq() to return/take the previous interrupt
level directly, instead of making the caller wrap the calls in
splimp()/splx().
- Add a comment documenting that interrupts that cause memory allocation
must be blocked while the free page queue is locked.

Since interrupts must be blocked while this lock is asserted, tying them
together like this helps to prevent mistakes.
 1.9 25-Mar-1999  mrg branches: 1.9.4;
remove now >1 year old pre-release message.
 1.8 13-Aug-1998  eeh branches: 1.8.2;
Merge paddr_t changes into the main branch.
 1.7 08-Jul-1998  thorpej branches: 1.7.2;
Add support for multiple memory free lists. There is at least one
default free list, and 0 - N additional free lists, in order of descending
priority.

A new page allocation function, uvm_pagealloc_strat(), has been added,
providing three page allocation strategies:

- normal: high -> low priority free list walk, taking the
page off the first free list that has one.

- only: attempt to allocate a page only from the specified free
list, failing if that free list has none available.

- fallback: if `only' fails, fall back on `normal'.

uvm_pagealloc(...) is provided for normal use (and is a synonym for
uvm_pagealloc_strat(..., UVM_PGA_STRAT_NORMAL, 0); the free list argument
is ignored for the `normal' case).

uvm_page_physload() now specifies which free list the pages will be
loaded onto. This means that some platforms which have multiple physical
memory segments may define additional vm_physsegs if they wish to break
individual physical segments into differing priorities.

Machine-dependent code must define _at least_ the following constants
in <machine/vmparam.h>:

VM_NFREELIST: the number of free lists the system will have

VM_FREELIST_DEFAULT: the default freelist (should always be 0,
but is defined in machdep code so that it's with all of the
other free list-related constants).

Additional free list names may be defined by machine-dependent code, but
they will only be used by machine-dependent code (e.g. for loading the
vm_physsegs).
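
The three allocation strategies described for uvm_pagealloc_strat() can be
illustrated with a small user-space model. This is only a sketch of the
described behaviour, not the kernel code: the names toy_pagealloc_strat,
struct freelists, NFREELIST and the STRAT_* constants are hypothetical, and
it assumes that index 0 is the highest-priority free list.

```c
#include <stddef.h>

#define NFREELIST 3	/* hypothetical; the kernel uses VM_NFREELIST */

/* Illustrative strategy names, modelled on the commit message. */
enum strat { STRAT_NORMAL, STRAT_ONLY, STRAT_FALLBACK };

/* Toy state: the number of free pages on each list. */
struct freelists {
	int npages[NFREELIST];
};

/*
 * Take one page according to the strategy.  Returns the index of the
 * free list the page came from, or -1 if no page could be allocated.
 */
int
toy_pagealloc_strat(struct freelists *fl, enum strat strat, int free_list)
{
	int i;

	if (strat == STRAT_ONLY || strat == STRAT_FALLBACK) {
		/* Try the requested list first. */
		if (free_list >= 0 && free_list < NFREELIST &&
		    fl->npages[free_list] > 0) {
			fl->npages[free_list]--;
			return free_list;
		}
		/* `only' fails here; `fallback' continues as `normal'. */
		if (strat == STRAT_ONLY)
			return -1;
	}

	/* `normal': walk the lists from high to low priority. */
	for (i = 0; i < NFREELIST; i++) {
		if (fl->npages[i] > 0) {
			fl->npages[i]--;
			return i;
		}
	}
	return -1;
}
```

In this model the free_list argument is ignored for STRAT_NORMAL, matching
the note above about the `normal' case.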
 1.6 22-Mar-1998  chuck remove tmpwire arg from uvm_pagewire() -- it isn't needed anymore.
noted by chuck s.
 1.5 09-Mar-1998  mrg KNF.
 1.4 10-Feb-1998  perry add/cleanup multiple inclusion protection.
 1.3 07-Feb-1998  mrg restore rcsids
 1.2 06-Feb-1998  thorpej RCS ID police.
 1.1 05-Feb-1998  mrg branches: 1.1.1;
Initial revision
 1.1.1.1 05-Feb-1998  mrg initial import of the new virtual memory system, UVM, into -current.

UVM was written by chuck cranor <chuck@maria.wustl.edu>, with some
minor portions derived from the old Mach code. i provided some help
getting swap and paging working, and other bug fixes/ideas. chuck
silvers <chuq@chuq.com> also provided some other fixes.

this is the UVM kernel code portion.


this will be KNF'd shortly. :-)
 1.7.2.1 30-Jul-1998  eeh Split vm_offset_t and vm_size_t into paddr_t, psize_t, vaddr_t, and vsize_t.
 1.8.2.1 25-Feb-1999  chs whitespace nits.
 1.9.4.3 09-Aug-1999  chs create a new type "voff_t" for uvm_object offsets
and define it to be "off_t". also, remove pgo_asyncget().
 1.9.4.2 21-Jun-1999  thorpej Sync w/ -current.
 1.9.4.1 07-Jun-1999  chs merge everything from chs-ubc branch.
 1.11.2.4 11-Feb-2001  bouyer Sync with HEAD.
 1.11.2.3 18-Jan-2001  bouyer Sync with head (for UBC+NFS fixes, mostly).
 1.11.2.2 08-Dec-2000  bouyer Sync with HEAD.
 1.11.2.1 20-Nov-2000  bouyer Update thorpej_scsipi to -current as of a month ago
 1.16.2.4 11-Dec-2002  thorpej Sync with HEAD.
 1.16.2.3 21-Sep-2001  nathanw Catch up to -current.
 1.16.2.2 24-Aug-2001  nathanw Catch up with -current.
 1.16.2.1 21-Jun-2001  nathanw Catch up to -current.
 1.19.4.1 01-Oct-2001  fvdl Catch up with -current.
 1.19.2.1 10-Jan-2002  thorpej Sync kqueue branch with -current.
 1.20.6.2 12-Mar-2002  thorpej Make hashlock an adaptive mutex, and rename it to hash_mutex.
 1.20.6.1 12-Mar-2002  thorpej Convert the fpageqlock to a spin mutex at IPL_VM and rename it
to fpageq_mutex.
 1.21.6.4 10-Nov-2005  skrll Sync with HEAD. Here we go again...
 1.21.6.3 21-Sep-2004  skrll Fix the sync with head I botched.
 1.21.6.2 18-Sep-2004  skrll Sync with HEAD.
 1.21.6.1 03-Aug-2004  skrll Sync with HEAD
 1.25.2.1 21-Jun-2006  yamt sync with head.
 1.26.6.1 22-Apr-2006  simonb Sync with head.
 1.26.4.1 09-Sep-2006  rpaulo sync with head
 1.26.2.1 18-Feb-2006  yamt sync with head.
 1.2 18-Aug-2009  thorpej Back-out accidental check-in.
 1.1 18-Aug-2009  thorpej Move uvm_page-related DDB hooks into uvm_page.c.
 1.6 14-Aug-2020  chs centralize calls from UVM to radixtree into a few functions.
in those functions, assert that the object lock is held in
the correct mode.
 1.5 15-May-2020  ad uvm_pagemarkdirty(): no need to set radix tree tag unless page is currently
marked clean.
 1.4 14-Mar-2020  ad branches: 1.4.2;
Make uvm_pagemarkdirty() responsible for putting vnodes onto the syncer
work list. Proposed on tech-kern@.
 1.3 23-Feb-2020  ad UVM locking changes, proposed on tech-kern:

- Change the lock on uvm_object, vm_amap and vm_anon to be a RW lock.
- Break v_interlock and vmobjlock apart. v_interlock remains a mutex.
- Do partial PV list locking in the x86 pmap. Others to follow later.
 1.2 15-Jan-2020  ad Merge from yamt-pagecache (after much testing):

- Reduce unnecessary page scan in putpages esp. when an object has a ton of
pages cached but only a few of them are dirty.

- Reduce the number of pmap operations by tracking page dirtiness more
precisely in uvm layer.
 1.1 02-Nov-2011  yamt branches: 1.1.2; 1.1.26;
file uvm_page_status.c was initially added on branch yamt-pagecache.
 1.1.26.2 29-Feb-2020  ad Sync with head.
 1.1.26.1 17-Jan-2020  ad Sync with head.
 1.1.2.8 01-Aug-2012  yamt - fix integrity sync.
putpages for integrity sync (fsync, msync with MS_SYNC, etc) should not
skip pages being written back by other threads.

- adapt to radix tree tag api changes.
 1.1.2.7 24-Jan-2012  yamt comments
 1.1.2.6 18-Jan-2012  yamt - bug fixes
- minor optimizations
- assertions
- comments
 1.1.2.5 20-Nov-2011  yamt - fix page loaning XXX make O->A loaning further
- add some statistics
 1.1.2.4 13-Nov-2011  yamt cache UVM_OBJ_IS_VNODE in pqflags
 1.1.2.3 12-Nov-2011  yamt redo the page clean/dirty/unknown accounting separately for file and
anonymous pages
 1.1.2.2 11-Nov-2011  yamt - track the number of clean/dirty/unknown pages in the system.
- g/c PG_MARKER
 1.1.2.1 02-Nov-2011  yamt page cache related changes

- maintain object pages in radix tree rather than rb tree.
- reduce unnecessary page scan in putpages. esp. when an object has a ton of
pages cached but only a few of them are dirty.
- reduce the number of pmap operations by tracking page dirtiness more
precisely in uvm layer.
- fix nfs commit range tracking.
- fix nfs write clustering. XXX hack
 1.4.2.2 08-Apr-2020  martin Merge changes from current as of 20200406
 1.4.2.1 14-Mar-2020  martin file uvm_page_status.c was added on branch phil-wifi on 2020-04-08 14:09:04 +0000
 1.131 15-Mar-2024  andvar Fix !VMSWAP build:
Added __unused for few local variables, which are used in VMSWAP block only.
Adjust !VMSWAP uvm_swap_stats() definition to make it build with compat code.
Copied "int (*uvm_swap_stats50)(...)" definition from uvm_swap to uvm_swapstub
to avoid missing uvm_swap_stats50 reference on linking.

Fixes INSTALL_CPMBR1400, INSTALL_ZYXELKX evbmips kernel configs as a result.

Reviewed by simon and phone in IRC (thanks).
 1.130 18-Oct-2020  chs Move the handling of PG_PAGEOUT from uvm_aio_aiodone_pages() to
uvm_page_unbusy() so that all callers of uvm_page_unbusy() don't need to
handle this flag separately. Split out the pages part of uvm_aio_aiodone()
into uvm_aio_aiodone_pages() in rump just like in the real kernel.
In ZFS functions that can fail to copy data between the ARC and VM pages,
use uvm_aio_aiodone_pages() rather than uvm_page_unbusy() so that we can
handle these "I/O" errors. Fixes PR 55702.
 1.129 14-Aug-2020  chs centralize calls from UVM to radixtree into a few functions.
in those functions, assert that the object lock is held in
the correct mode.
 1.128 09-Jul-2020  skrll Consistently use UVMHIST(__func__)

Convert UVMHIST_{CALLED,LOG} into UVMHIST_CALLARGS
 1.127 08-Jul-2020  skrll Trailing whitespace
 1.126 25-Jun-2020  jdolecek use maximum-size fixed size array instead of variable-length array
in uvm_aio_aiodone() so that the stack usage can be determined and
checked at compile time; this is neither called recursively nor
particularly deep in the call stack, so there is no need to save every
last drop of stack space here
 1.125 19-Apr-2020  ad uvm_aio_aiodone_pages(): only call uvm_pageout_done() if work was done for
the page daemon.
 1.124 07-Apr-2020  ad branches: 1.124.2;
For single page I/O, use direct mapping if available.
 1.123 24-Feb-2020  rin 0x%#x --> %#x for non-external codes.
Also, stop mixing up 0x%x and %#x in single files as far as possible.
 1.122 23-Feb-2020  ad UVM locking changes, proposed on tech-kern:

- Change the lock on uvm_object, vm_amap and vm_anon to be a RW lock.
- Break v_interlock and vmobjlock apart. v_interlock remains a mutex.
- Do partial PV list locking in the x86 pmap. Others to follow later.
 1.121 18-Feb-2020  chs remove the aiodoned thread. I originally added this to provide a thread context
for doing page cache iodone work, but since then biodone() has changed to
hand off all iodone work to a softint thread, so we no longer need the
special-purpose aiodoned thread.
 1.120 15-Jan-2020  ad Merge from yamt-pagecache (after much testing):

- Reduce unnecessary page scan in putpages esp. when an object has a ton of
pages cached but only a few of them are dirty.

- Reduce the number of pmap operations by tracking page dirtiness more
precisely in uvm layer.
 1.119 31-Dec-2019  ad branches: 1.119.2;
- Add and use wrapper functions that take and acquire page interlocks, and pairs
of page interlocks. Require that the page interlock be held over calls to
uvm_pageactivate(), uvm_pagewire() and similar.

- Solve the concurrency problem with page replacement state. Rather than
updating the global state synchronously, set an intended state on
individual pages (active, inactive, enqueued, dequeued) while holding the
page interlock. After the interlock is released put the pages on a 128
entry per-CPU queue for their state changes to be made real in batch.
This results in a ~400 fold decrease in contention on my test system.
Proposed on tech-kern but modified to use the page interlock rather than
atomics to synchronise as it's much easier to maintain that way, and
cheaper.
 1.118 27-Dec-2019  ad PR kern/48044: panic: kernel diagnostic assertion "uvmexp.swpgonly + npages <= uvmexp.swpginuse" failed

swpgonly is updated asynchronously with regard to swap use. We can't assert
this condition with confidence in the post-5.0 world, at least not without
broader changes. swpgonly's ultimate use is of a heuristic nature so this
is no problem at all.
 1.117 21-Dec-2019  ad - Rename VM_PGCOLOR_BUCKET() to VM_PGCOLOR(). I want to reuse "bucket" for
something else soon and TBH it matches what this macro does better.

- Add inlines to set/get locator values in the unused lower bits of
pg->phys_addr. Begin by using it to cache the freelist index, because
computing it is expensive and that shows up during profiling. Discussed
on tech-kern.
 1.116 14-Dec-2019  ad The uvmexp.pdpending change was incorrect - revert for now.
 1.115 14-Dec-2019  ad Adjust pdpending in uvm_pageout_start() and uvm_pageout_done() to avoid
the value going temporarily negative.
 1.114 13-Dec-2019  ad Break the global uvm_pageqlock into a per-page identity lock and a private
lock for use of the pagedaemon policy code. Discussed on tech-kern.

PR kern/54209: NetBSD 8 large memory performance extremely low
PR kern/54210: NetBSD-8 processes presumably not exiting
PR kern/54727: writing a large file causes unreasonable system behaviour
 1.113 01-Dec-2019  uwe Add missing #include <sys/atomic.h>
 1.112 01-Dec-2019  ad - Adjust uvmexp.swpgonly with atomics, and make uvm_swap_data_lock static.
- A bit more __cacheline_aligned on mutexes.
 1.111 28-Oct-2017  pgoyette branches: 1.111.4; 1.111.8;
Update the kernhist(9) kernel history code to address issues identified
in PR kern/52639, as well as some general cleaning-up...

(As proposed on tech-kern@ with additional changes and enhancements.)

Details of changes:

* All history arguments are now stored as uintmax_t values[1], both in
the kernel and in the structures used for exporting the history data
to userland via sysctl(9). This avoids problems on some architectures
where passing a 64-bit (or larger) value to printf(3) can cause it to
process the value as multiple arguments. (This can be particularly
problematic when printf()'s format string is not a literal, since in
that case the compiler cannot know how large each argument should be.)

* Update the data structures used for exporting kernel history data to
include a version number as well as the length of history arguments.

* All [2] existing users of kernhist(9) have had their format strings
updated. Each format specifier now includes an explicit length
modifier 'j' to refer to numeric values of the size of uintmax_t.

* All [2] existing users of kernhist(9) have had their format strings
updated to replace uses of "%p" with "%#jx", and the pointer
arguments are now cast to (uintptr_t) before being subsequently cast
to (uintmax_t). This is needed to avoid compiler warnings about
casting "pointer to integer of a different size."

* All [2] existing users of kernhist(9) have had instances of "%s" or
"%c" format strings replaced with numeric formats; several instances
of mis-match between format string and argument list have been fixed.

* vmstat(1) has been modified to handle the new size of arguments in the
history data as exported by sysctl(9).

* vmstat(1) now provides a warning message if the history requested with
the -u option does not exist (previously, this condition was silently
ignored, with only a single blank line being printed).

* vmstat(1) now checks the version and argument length included in the
data exported via sysctl(9) and exits if they do not match the values
with which vmstat was built.

* The kernhist(9) man-page has been updated to note the additional
requirements imposed on the format strings, along with several other
minor changes and enhancements.

[1] It would have been possible to use an explicit length (for example,
uint64_t) for the history arguments. But that would require another
"rototill" of all the users in the future when we add support for an
architecture that supports a larger size. Also, the printf(3) format
specifiers for explicitly-sized values, such as "%"PRIu64, are much
more verbose (and less aesthetically appealing, IMHO) than simply
using "%ju".

[2] I've tried very hard to find "all [the] existing users of kernhist(9)"
but it is possible that I've missed some of them. I would be glad to
update any stragglers that anyone identifies.
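
The format-string conventions listed above (the 'j' length modifier, and
pointers cast to uintptr_t and then uintmax_t and printed with "%#jx") can
be shown with plain snprintf(3). This is a user-space sketch only; the
helpers format_hist_ptr and format_hist_count are hypothetical and are not
part of kernhist(9).

```c
#include <stdint.h>
#include <stdio.h>

/*
 * Hypothetical helper: format a pointer the way the commit describes,
 * casting through uintptr_t to uintmax_t and printing with "%#jx"
 * instead of "%p".
 */
int
format_hist_ptr(char *buf, size_t len, const void *p)
{
	return snprintf(buf, len, "%#jx", (uintmax_t)(uintptr_t)p);
}

/*
 * Hypothetical helper: format a numeric argument widened to uintmax_t,
 * using the 'j' length modifier so the size always matches.
 */
int
format_hist_count(char *buf, size_t len, unsigned int n)
{
	return snprintf(buf, len, "%ju", (uintmax_t)n);
}
```

Because every argument is widened to uintmax_t before the call, the format
string and the argument list agree on size regardless of the architecture.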
 1.110 01-Mar-2014  christos branches: 1.110.22;
only check that npages fit, if we are going to add npages to swpgonly.
 1.109 25-Oct-2013  martin Mark diagnostic-only variables
 1.108 27-Jan-2012  para branches: 1.108.6; 1.108.10;
extending vmem(9) to be able to allocate resources for its own needs.
simplifying uvm_map handling (no special kernel entries anymore no relocking)
make malloc(9) a thin wrapper around kmem(9)
(with private interface for interrupt safety reasons)

releng@ acknowledged
 1.107 11-Oct-2011  yamt branches: 1.107.2; 1.107.6;
comment
 1.106 06-Oct-2011  uebayasi Correct pagermap emergva allocation. From yamt@.

Tested by building i386 kernel with DTRACE defined which died 100%.
 1.105 28-Sep-2011  matt Reallocate emergency pager va when ncolors is increased. (modification of
patch from mrg).
 1.104 01-Sep-2011  matt Forward some UVM from matt-nb5-mips64. Add UVM_KMF_COLORMATCH flag.
When uvm_map gets passed UVM_FLAG_COLORMATCH, the align argument contains
the color of the starting address to be allocated (0..colormask).
When uvm_km_alloc is passed UVM_KMF_COLORMATCH (which can only be used with
UVM_KMF_VAONLY), the align argument contains the color of the starting address
to be allocated.
Change uvm_pagermapin to use this. When mapping user pages in the kernel,
if colormatch is used with the color of the starting user page then the kernel
mapping will be congruent with the existing user mappings.
 1.103 23-Aug-2011  oki make compile without VMSWAP. no functional change.
 1.102 18-Aug-2011  yamt uvm_aio_aiodone_pages: check disposed anon correctly.
 1.101 12-Jun-2011  rmind Welcome to 5.99.53! Merge rmind-uvmplock branch:

- Reorganize locking in UVM and provide extra serialisation for pmap(9).
New lock order: [vmpage-owner-lock] -> pmap-lock.

- Simplify locking in some pmap(9) modules by removing P->V locking.

- Use lock object on vmobjlock (and thus vnode_t::v_interlock) to share
the locks amongst UVM objects where necessary (tmpfs, layerfs, unionfs).

- Rewrite and optimise x86 TLB shootdown code, make it simpler and cleaner.
Add TLBSTATS option for x86 to collect statistics about TLB shootdowns.

- Unify /dev/mem et al in MI code and provide required locking (removes
kernel-lock on some ports). Also, avoid cache-aliasing issues.

Thanks to Andrew Doran and Joerg Sonnenberger, as their initial patches
formed the core changes of this branch.
 1.100 23-Apr-2011  rmind branches: 1.100.2;
Replace "malloc" in comments, remove unnecessary header inclusions.
 1.99 02-Feb-2011  chuck update license clauses on my code to match the new-style BSD licenses.
based on diff that rmind@ sent me.

no functional change with this commit.
 1.98 22-Jun-2010  rmind branches: 1.98.2; 1.98.4;
Keep the lock around pmap_update() where required. While fixing this
in ubc_fault(), rework logic to "remember" the last object of page and
reduce locking overhead, since in common case pages belong to one and
the same UVM object (but not always, therefore add a comment).

Unlocks before pmap_update(), on removal of mappings, might cause TLB
coherency issues, since on architectures like x86 and mips64 invalidation
IPIs are deferred to pmap_update(). Hence, VA space might be globally
visible before IPIs are sent or while they are still in-flight.

OK ad@.
 1.97 07-Nov-2009  cegger branches: 1.97.2; 1.97.4;
Add a flags argument to pmap_kenter_pa(9).
Patch shown on tech-kern@ http://mail-index.netbsd.org/tech-kern/2009/11/04/msg006434.html
No objections.
 1.96 05-Aug-2009  pooka kill uvm_aio_biodone1(). only user was lfs and that uses nestiobuf now.
 1.95 30-Mar-2009  yamt g/c uvm_aiobuf_pool.
 1.94 22-Feb-2009  ad PR kern/26878 FFSv2 + softdep = livelock (no free ram)
PR kern/16942 panic with softdep and quotas
PR kern/19565 panic: softdep_write_inodeblock: indirect pointer #1 mismatch
PR kern/26274 softdep panic: allocdirect_merge: ...
PR kern/26374 Long delay before non-root users can write to softdep partitions
PR kern/28621 1.6.x "vp != NULL" panic in ffs_softdep.c:4653 while unmounting a softdep (+quota) filesystem
PR kern/29513 FFS+Softdep panic with unfsck-able file-corruption
PR kern/31544 The ffs softdep code appears to fail to write dirty bits to disk
PR kern/31981 stopping scsi disk can cause panic (softdep)
PR kern/32116 kernel panic in softdep (assertion failure)
PR kern/32532 softdep_trackbufs deadlock
PR kern/37191 softdep: locking against myself
PR kern/40474 Kernel panic after remounting raid root with softdep

Retire softdep, pass 2. As discussed and later formally announced on the
mailing lists.
 1.93 16-Nov-2008  pooka branches: 1.93.4;
more <sys/buf.h> police
 1.92 17-Apr-2008  simonb branches: 1.92.4; 1.92.10; 1.92.12; 1.92.14; 1.92.18;
Set up uvmhist in uvm_aio_aiodone_pages().
 1.91 29-Feb-2008  yamt uvm_swap_io: if pagedaemon, don't wait for iobuf.
 1.90 02-Jan-2008  ad branches: 1.90.2; 1.90.6;
Merge vmlocking2 to head.
 1.89 01-Dec-2007  yamt branches: 1.89.2; 1.89.6;
constify pagerops.
 1.88 01-Dec-2007  yamt uvm_pager_init: use __arraycount.
 1.87 25-Oct-2007  yamt branches: 1.87.2;
defparam PAGER_MAP_SIZE.
 1.86 01-Sep-2007  pooka branches: 1.86.4;
Make bioops a pointer and point it to the softdeps struct in softdep
init. Decouples "options SOFTDEP" from the main kernel and ffs code.
 1.85 29-Jul-2007  ad branches: 1.85.4; 1.85.6; 1.85.8;
It's not a good idea for device drivers to modify b_flags, as they don't
need to understand the locking around that field. Instead of setting
B_ERROR, set b_error instead. b_error is 'owned' by whoever completes
the I/O request.
 1.84 21-Jul-2007  ad Merge unobtrusive locking changes from the vmlocking branch.
 1.83 12-Jul-2007  rmind branches: 1.83.2;
Implementation of per-CPU work-queues support for workqueue(9) interface.
WQ_PERCPU flag for workqueue and additional argument for workqueue_enqueue()
to assign a CPU might be used. Notes:
- For now, a list is used for workqueue_queue, which is non-optimal,
and will be replaced with an array indexed by CPU ID.
- The data structures should be changed to be cache-friendly.

Reviewed by: <yamt>, <tech-kern>
 1.82 09-Jul-2007  ad Merge some of the less invasive changes from the vmlocking branch:

- kthread, callout, devsw API changes
- select()/poll() improvements
- miscellaneous MT safety improvements
 1.81 22-Feb-2007  thorpej branches: 1.81.4; 1.81.6;
TRUE -> true, FALSE -> false
 1.80 21-Feb-2007  thorpej Replace the Mach-derived boolean_t type with the C99 bool type. A
future commit will replace use of TRUE and FALSE with true and false.
 1.79 21-Dec-2006  yamt branches: 1.79.2;
merge yamt-splraiseipl branch.

- finish implementing splraiseipl (and makeiplcookie).
http://mail-index.NetBSD.org/tech-kern/2006/07/01/0000.html
- complete workqueue(9) and fix its ipl problem, which is reported
to cause audio skipping.
- fix netbt (at least compilation problems) for some ports.
- fix PR/33218.
 1.78 15-Sep-2006  yamt branches: 1.78.2;
merge yamt-pdpolicy branch.
- separate page replacement policy from the rest of kernel
- implement an alternative replacement policy
 1.77 13-Apr-2006  christos branches: 1.77.8;
Change previous to KASSERT per yamt's request.
 1.76 13-Apr-2006  christos Coverity CID 835: Check before dereferencing pg->uanon.
 1.75 11-Apr-2006  yamt uvm_pagermapin: nowait allocation for pagedaemon.
 1.74 11-Feb-2006  yamt branches: 1.74.2; 1.74.4; 1.74.6;
remove the following options. no objections on tech-kern@.

UVM_PAGER_INLINE
UVM_AMAP_INLINE
UVM_PAGE_INLINE
UVM_MAP_INLINE
 1.73 04-Jan-2006  yamt branches: 1.73.2; 1.73.4;
- add simple functions to allocate/free a buffer for i/o.
- make bufpool static.
 1.72 29-Nov-2005  yamt branches: 1.72.2;
read-ahead statistics.
 1.71 13-Sep-2005  yamt branches: 1.71.6;
wrap swap related code by #ifdef VMSWAP. always #define VMSWAP for now.
 1.70 31-Jul-2005  yamt revert "defflag VMSWAP" changes for now.
there seem to be far more people who don't want to edit
their kernel config files than i thought.
 1.69 30-Jul-2005  yamt defflag VMSWAP.
 1.68 28-Jun-2005  thorpej branches: 1.68.2;
Clean up the cpp macro used to say "we're compiling this specific C file".
 1.67 27-Jun-2005  thorpej Use ANSI function decls.
 1.66 01-Apr-2005  yamt merge yamt-km branch.
- don't use managed mappings/backing objects for wired memory allocations.
save some resources like pv_entry. also fix (most of) PR/27030.
- simplify kernel memory management API.
- simplify pmap bootstrap of some ports.
- some related cleanups.
 1.65 01-Jan-2005  yamt branches: 1.65.2; 1.65.4;
for in-kernel maps,
- allocate kva for vm_map_entry from the map itself and
remove the static limit, MAX_KMAPENT.
- keep merged entries for later splitting to fix allocate-to-free problem.
PR/24039.
 1.64 03-Oct-2004  enami Count obj pages freed by pagedaemon.
 1.63 05-May-2004  yamt fix an amap_wirerange deadlock problem by re-introducing
PG_RELEASED for anon pages. PR/23171 from Christian Limpach.
for details, see discussion filed in the PR database.

uvm_anon_release: a new function to free anon-owned PG_RELEASED page.
uvm_anfree: we can't wait for the page here because the caller might hold
amap lock. instead, just mark the page as PG_RELEASED.
whoever unbusies the page should check PG_RELEASED.
uvm_aio_aiodone: uvm_anon_release() instead of uvm_page_unbusy()
if appropriate.
uvmfault_anonget: check PG_RELEASED.
 1.62 01-Sep-2003  pk branches: 1.62.2;
Can't rely on side-effects in KASSERT expressions which was pointed out to
me by YAMAMOTO Takashi.
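
The hazard behind this fix can be illustrated with a small sketch. EX_KASSERT is a hypothetical stand-in for KASSERT, assumed here to expand to nothing unless DIAGNOSTIC is defined; the counter and function names are invented for illustration and do not come from the UVM sources.

```c
#include <assert.h>

/* Hypothetical stand-in for KASSERT: a no-op unless DIAGNOSTIC is defined. */
#ifdef DIAGNOSTIC
#define EX_KASSERT(e) assert(e)
#else
#define EX_KASSERT(e) ((void)0)
#endif

static int refs;	/* illustrative reference counter */

/* Wrong: the increment disappears when the assertion is compiled out. */
static int take_ref_buggy(void)
{
	EX_KASSERT(++refs > 0);
	return refs;
}

/* Right: do the work unconditionally and assert only on the result. */
static int take_ref_fixed(void)
{
	++refs;
	EX_KASSERT(refs > 0);
	return refs;
}
```

Built without DIAGNOSTIC, take_ref_buggy() leaves the counter untouched while take_ref_fixed() still increments it.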
 1.61 28-Aug-2003  pk When retiring a swap device with marked bad blocks on it we should update
the `# swap page in use' and `# swap page only' counters. However, at the
time of swap device removal we can no longer figure out how many of the
bad swap pages are actually also `swap only' pages.

So, on swap I/O errors arrange things to not include the bad swap pages in
the `swpgonly' counter as follows: uvm_swap_markbad() decrements `swpgonly'
by the number of bad pages, and the various VM object deallocation routines
do not decrement `swpgonly' for swap slots marked as SWSLOT_BAD.
 1.60 23-Apr-2003  tls branches: 1.60.2;
Correct use of MAXBSIZE where MAXPHYS was intended. This is a necessary
first step towards per-device MAXPHYS, and has the beneficial side effect
of allowing clustering to MAXPHYS even on systems that need to run with
a reduced MAXBSIZE to get more metadata buffers.
 1.59 09-Nov-2002  thorpej Fix signed/unsigned comparison warnings.
 1.58 01-Oct-2002  chs uao_find_swslot()'s second argument is in units of pages, not bytes.
spotted by Doug Donsbach.
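
A minimal sketch of the unit conversion involved, assuming 4 KiB pages. The EX_ macros and ex_ helpers are hypothetical names for illustration; they are not the kernel's PAGE_SHIFT or any UVM function.

```c
#include <assert.h>
#include <stdint.h>

#define EX_PAGE_SHIFT 12u			/* assumed: 4 KiB pages */
#define EX_PAGE_SIZE  (1u << EX_PAGE_SHIFT)

static_assert(EX_PAGE_SIZE == 4096u, "sketch assumes 4 KiB pages");

/* Convert a byte offset within an object to a page index. */
static uint64_t ex_off_to_pageidx(uint64_t off)
{
	return off >> EX_PAGE_SHIFT;
}

/* Convert a page index back to the byte offset of the page start. */
static uint64_t ex_pageidx_to_off(uint64_t idx)
{
	return idx << EX_PAGE_SHIFT;
}
```

Passing a byte offset where a page index is expected overshoots by a factor of the page size, which is the class of bug this revision fixes.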
 1.57 15-May-2002  matt branches: 1.57.4;
When core dumping a process, don't dump maps backed up by the device pager.
(move the pagerops externs to uvm_object.h and out the C files).
 1.56 09-May-2002  enami - In genfs_putpages(), no need to restrict the cluster within the given
region.
- In uvm_aio_aiodone(), remove assertions no longer true.
 1.55 31-Dec-2001  chs branches: 1.55.4;
fix locking for loaning. in general we should be looking at the page's
uobject and uanon pointers rather than at the PQ_ANON flag to determine
which lock to hold, since PQ_ANON can be clear even when the anon's lock
is the one which we should hold (if the page was loaned from an object
and then freed by the object).
 1.54 10-Nov-2001  lukem add RCSIDs, and in some cases, slightly cleanup #include order
 1.53 06-Nov-2001  chs several changes prompted by loaning problems:
- fix the loaned case in uvm_pagefree().
- redo uvmexp.swpgonly accounting to work with page loaning.
add an assertion before each place we adjust uvmexp.swpgonly.
- fix uvm_km_pgremove() to always free any swap space associated with
the range being removed.
- get rid of UVM_LOAN_WIRED flag. instead, we just make sure that
pages loaned to the kernel are never on the page queues.
this allows us to assert that pages are not loaned and wired
at the same time.
- add yet more assertions.
 1.52 06-Nov-2001  simonb Remove some variables that are set but never used.
 1.51 15-Oct-2001  chs branches: 1.51.2;
fix an uninitialized-variable problem in an error case.
pointed out by Simon Burge.
 1.50 15-Sep-2001  chs a whole bunch of changes to improve performance and robustness under load:

- remove special treatment of pager_map mappings in pmaps. this is
required now, since I've removed the globals that expose the address range.
pager_map now uses pmap_kenter_pa() instead of pmap_enter(), so there's
no longer any need to special-case it.
- eliminate struct uvm_vnode by moving its fields into struct vnode.
- rewrite the pageout path. the pager is now responsible for handling the
high-level requests instead of only getting control after a bunch of work
has already been done on its behalf. this will allow us to UBCify LFS,
which needs tighter control over its pages than other filesystems do.
writing a page to disk no longer requires making it read-only, which
allows us to write wired pages without causing all kinds of havoc.
- use a new PG_PAGEOUT flag to indicate that a page should be freed
on behalf of the pagedaemon when it's unlocked. this flag is very similar
to PG_RELEASED, but unlike PG_RELEASED, PG_PAGEOUT can be cleared if the
pageout fails due to eg. an indirect-block buffer being locked.
this allows us to remove the "version" field from struct vm_page,
and together with shrinking "loan_count" from 32 bits to 16,
struct vm_page is now 4 bytes smaller.
- no longer use PG_RELEASED for swap-backed pages. if the page is busy
because it's being paged out, we can't release the swap slot to be
reallocated until that write is complete, but unlike with vnodes we
don't keep a count of in-progress writes so there's no good way to
know when the write is done. instead, when we need to free a busy
swap-backed page, just sleep until we can get it busy ourselves.
- implement a fast-path for extending writes which allows us to avoid
zeroing new pages. this substantially reduces cpu usage.
- encapsulate the data used by the genfs code in a struct genfs_node,
which must be the first element of the filesystem-specific vnode data
for filesystems which use genfs_{get,put}pages().
- eliminate many of the UVM pagerops, since they aren't needed anymore
now that the pager "put" operation is a higher-level operation.
- enhance the genfs code to allow NFS to use the genfs_{get,put}pages
instead of a modified copy.
- clean up struct vnode by removing all the fields that used to be used by
the vfs_cluster.c code (which we don't use anymore with UBC).
- remove kmem_object and mb_object since they were useless.
instead of allocating pages to these objects, we now just allocate
pages with no object. such pages are mapped in the kernel until they
are freed, so we can use the mapping to find the page to free it.
this allows us to remove splvm() protection in several places.

The sum of all these changes improves write throughput on my
decstation 5000/200 to within 1% of the rate of NetBSD 1.5
and reduces the elapsed time for "make release" of a NetBSD 1.5
source tree on my 128MB pc to 10% less than a 1.5 kernel took.
 1.49 10-Sep-2001  chris Update pmap_update to now take the updated pmap as an argument.
This will allow improvements to the pmaps so that they can more easily defer expensive operations, eg tlb/cache flush, til the last possible moment.

Currently this is a no-op on most platforms, so they should see no difference.

Reviewed by Jason.
 1.48 23-Jun-2001  chs branches: 1.48.2; 1.48.4;
clean up the transient error case in uvm_pager_put().
 1.47 02-Jun-2001  chs replace vm_map{,_entry}_t with struct vm_map{,_entry} *.
 1.46 26-May-2001  chs replace {simple_,}lock{_data,}_t with struct {simple,}lock {,*}.
 1.45 25-May-2001  chs remove trailing whitespace.
 1.44 24-Apr-2001  thorpej Sprinkle pmap_update() calls after calls to:
- pmap_enter()
- pmap_remove()
- pmap_protect()
- pmap_kenter_pa()
- pmap_kremove()
as described in pmap(9).

These calls are relatively conservative. It may be possible to
optimize these a little more.
 1.43 15-Mar-2001  chs eliminate the KERN_* error codes in favor of the traditional E* codes.
the mapping is:

KERN_SUCCESS 0
KERN_INVALID_ADDRESS EFAULT
KERN_PROTECTION_FAILURE EACCES
KERN_NO_SPACE ENOMEM
KERN_INVALID_ARGUMENT EINVAL
KERN_FAILURE various, mostly turn into KASSERTs
KERN_RESOURCE_SHORTAGE ENOMEM
KERN_NOT_RECEIVER <unused>
KERN_NO_ACCESS <unused>
KERN_PAGES_LOCKED <unused>
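
The mapping above can be written down as a small translation function. This is only a sketch: the EX_KERN_* enumerators are illustrative stand-ins whose numeric values are not taken from the sources, and returning -1 for codes without a single replacement is an assumption of the sketch.

```c
#include <assert.h>
#include <errno.h>

/* Illustrative stand-ins for the removed KERN_* codes. */
enum ex_kern {
	EX_KERN_SUCCESS,
	EX_KERN_INVALID_ADDRESS,
	EX_KERN_PROTECTION_FAILURE,
	EX_KERN_NO_SPACE,
	EX_KERN_INVALID_ARGUMENT,
	EX_KERN_FAILURE,
	EX_KERN_RESOURCE_SHORTAGE,
	EX_KERN_NOT_RECEIVER,
	EX_KERN_NO_ACCESS,
	EX_KERN_PAGES_LOCKED
};

/* Success must stay 0 so callers can keep testing "if (error)". */
static_assert(EX_KERN_SUCCESS == 0, "success must be zero");

/* Returns the E* code from the log, or -1 where no single code applies. */
static int ex_kern_to_errno(enum ex_kern k)
{
	switch (k) {
	case EX_KERN_SUCCESS:            return 0;
	case EX_KERN_INVALID_ADDRESS:    return EFAULT;
	case EX_KERN_PROTECTION_FAILURE: return EACCES;
	case EX_KERN_NO_SPACE:           /* FALLTHROUGH */
	case EX_KERN_RESOURCE_SHORTAGE:  return ENOMEM;
	case EX_KERN_INVALID_ARGUMENT:   return EINVAL;
	default:                         return -1; /* KERN_FAILURE, unused codes */
	}
}
```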
 1.42 10-Mar-2001  chs eliminate the VM_PAGER_* error codes in favor of the traditional E* codes.
the mapping is:

VM_PAGER_OK 0
VM_PAGER_BAD <unused>
VM_PAGER_FAIL <unused>
VM_PAGER_PEND 0 (see below)
VM_PAGER_ERROR EIO
VM_PAGER_AGAIN EAGAIN
VM_PAGER_UNLOCK EBUSY
VM_PAGER_REFAULT ERESTART

for async i/o requests, it used to be possible for the request to
be converted to sync, and the pager would return VM_PAGER_OK or VM_PAGER_PEND
to indicate whether the caller should perform post-i/o cleanup.
this is no longer allowed; pagers must now return 0 to indicate that
the async i/o was successfully started, and the caller never needs to
worry about doing the post-i/o cleanup.
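
As a table-driven sketch of the pager code mapping listed above: the EX_PAGER_* names and the EX_UNUSED marker are hypothetical, and the fallback definition of ERESTART is a placeholder for userland builds where errno.h does not expose it.

```c
#include <assert.h>
#include <errno.h>

#ifndef ERESTART
#define ERESTART (-3)	/* placeholder: kernel-internal on some systems */
#endif

#define EX_UNUSED (-1)	/* marker for codes the log lists as <unused> */

/* Illustrative stand-ins for the removed VM_PAGER_* codes. */
enum ex_pager {
	EX_PAGER_OK, EX_PAGER_BAD, EX_PAGER_FAIL, EX_PAGER_PEND,
	EX_PAGER_ERROR, EX_PAGER_AGAIN, EX_PAGER_UNLOCK, EX_PAGER_REFAULT,
	EX_PAGER_NCODES
};

static_assert(EX_PAGER_OK == 0, "success must be zero");

static const int ex_pager_errno[EX_PAGER_NCODES] = {
	[EX_PAGER_OK]      = 0,
	[EX_PAGER_BAD]     = EX_UNUSED,
	[EX_PAGER_FAIL]    = EX_UNUSED,
	[EX_PAGER_PEND]    = 0,		/* async i/o started successfully */
	[EX_PAGER_ERROR]   = EIO,
	[EX_PAGER_AGAIN]   = EAGAIN,
	[EX_PAGER_UNLOCK]  = EBUSY,
	[EX_PAGER_REFAULT] = ERESTART,
};

static int ex_pager_to_errno(enum ex_pager code)
{
	return ex_pager_errno[code];
}
```

Note that OK and PEND both collapse to 0, which matches the statement that callers no longer distinguish the two for post-i/o cleanup.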
 1.41 18-Feb-2001  chs branches: 1.41.2;
in uvm_aio_aiodone(), don't mark the page(s) clean if the pageout
failed because we failed to acquire some resource needed to initiate
the pageout (such as failing to lock an indirect buffer) rather than
a hard i/o error. in this case we just want to reactivate the page(s)
so that we'll try to write them again later.

while I'm here, clean up some DIAGNOSTIC code.
 1.40 04-Feb-2001  mrg add a KASSERT(pp) in the uvm_pagermapin() loop.
 1.39 28-Jan-2001  thorpej Page scanner improvements, behavior is actually a bit more like
Mach VM's now. Specific changes:
- Pages now need not have all of their mappings removed before being
put on the inactive list. They only need to have the "referenced"
attribute cleared. This makes putting pages onto the inactive list
much more efficient. In order to eliminate redundant clearings of
"referenced", callers of uvm_pagedeactivate() must now do this
themselves.
- When checking the "modified" attribute for a page (for clearing
PG_CLEAN), make sure to only do it if PG_CLEAN is currently set on
the page (saves a potentially expensive pmap operation).
- When scanning the inactive list, if a page is referenced, reactivate
it (this part was actually added in uvm_pdaemon.c,v 1.27). This
now works properly now that pages on the inactive list are allowed to
have mappings.
- When scanning the inactive list and considering a page for freeing,
remove all mappings, and then check the "modified" attribute if the
page is marked PG_CLEAN.
- When scanning the active list, if the page was referenced since its
last sweep by the scanner, don't deactivate it. (This part was
actually added in uvm_pdaemon.c,v 1.28.)

These changes greatly improve interactive performance during
moderate to high memory and I/O load.
 1.38 09-Dec-2000  chs in uvm_pagermapin(), for now, don't pass the flag to pmap_enter()
which presets the page modified bit if the page is already initialized.
we don't actually want to modify such pages.
 1.37 01-Dec-2000  chs make sure that pages are on a paging queue before unlocking them.
 1.36 27-Nov-2000  chs allow building without SOFTDEP by adding the pageiodone hook to bio_ops.
 1.35 27-Nov-2000  chs Initial integration of the Unified Buffer Cache project.
 1.34 24-Nov-2000  chs increase PAGER_MAP_SIZE to 16MB and move it to uvm_pager.h
since the alpha and mips pmaps use it.
 1.33 13-Sep-2000  thorpej Add an align argument to uvm_map() and some callers of that
routine. Works similarly to pmap_prefer(), but allows callers
to specify a minimum power-of-two alignment of the region.
How we ever got along without this for so long is beyond me.
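
A minimal sketch of what a power-of-two alignment constraint means for an address. The helper ex_align_up is hypothetical and is not the code in uvm_map(); treating an alignment of 0 as "no constraint" is an assumption of this sketch.

```c
#include <assert.h>
#include <stdint.h>

/*
 * Round addr up to the next multiple of align.
 * align must be a power of two; 0 is treated as "no constraint".
 */
static uintptr_t ex_align_up(uintptr_t addr, uintptr_t align)
{
	if (align == 0)
		return addr;
	assert((align & (align - 1)) == 0);	/* power of two only */
	return (addr + align - 1) & ~(align - 1);
}
```

Because the alignment is a power of two, the rounding needs only an add and a mask rather than a division.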
 1.32 27-Jun-2000  mrg remove include of <vm/vm.h>
 1.31 26-Jun-2000  mrg remove/move more mach vm header files:

<vm/pglist.h> -> <uvm/uvm_pglist.h>
<vm/vm_inherit.h> -> <uvm/uvm_inherit.h>
<vm/vm_kern.h> -> into <uvm/uvm_extern.h>
<vm/vm_object.h> -> nothing
<vm/vm_pager.h> -> into <uvm/uvm_pager.h>

also includes a bunch of <vm/vm_page.h> include removals (due to redundancy
with <vm/vm.h>), and a scattering of other similar headers.
 1.30 20-May-2000  thorpej Remove VM_PROT_EXECUTE from the permissions used to map the page
for pager I/O -- it is not needed, and including it leads to
unnecessary I-cache flushes.
 1.29 19-May-2000  thorpej Tell uvm_pagermapin() the direction of the I/O so that it can map
with only the protection that it needs.
 1.28 03-Apr-2000  chs remove uvm_shareprot(). no longer needed since the demise of share maps.
 1.27 30-Mar-2000  simonb Delete redundant decl of aobj_pager - it's in <uvm/uvm_aobj.h>.
 1.26 26-Mar-2000  kleink Merge parts of chs-ubc2 into the trunk:
Add a new type voff_t (defined as a synonym for off_t) to describe offsets
into uvm objects, and update the appropriate interfaces to use it, the
most visible effect being the ability to mmap() file offsets beyond
the range of a vaddr_t.

Originally by Chuck Silvers; blame me for problems caused by merging this
into non-UBC.
 1.25 11-Jan-2000  chs add support for ``swapctl -d'' (removing swap space).
improve handling of i/o errors in swap space.

reviewed by: Chuck Cranor
 1.24 13-Nov-1999  thorpej Change the pmap_enter() API slightly; pmap_enter() now returns an error
value (KERN_SUCCESS or KERN_RESOURCE_SHORTAGE) indicating if it succeeded
or failed. Change the `wired' and `access_type' arguments to a single
`flags' argument, which includes the access type, and flags:

PMAP_WIRED the old `wired' boolean
PMAP_CANFAIL pmap_enter() is allowed to fail

If PMAP_CANFAIL is not specified, the pmap should behave as it always
has in the face of a drastic resource shortage: fall over dead.

Change the fault handler to deal with failure (which indicates resource
shortage) by unlocking everything, waiting for the pagedaemon to free
more memory, then retrying the fault.
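
The retry behaviour described above can be sketched in userland with mock functions. Everything prefixed ex_ or EX_ is hypothetical: the flag values, the return codes, and the resource counter are stand-ins, not the real pmap_enter() interface.

```c
#include <assert.h>

#define EX_PMAP_WIRED   0x1u	/* stand-in flag values */
#define EX_PMAP_CANFAIL 0x2u

#define EX_OK       0
#define EX_SHORTAGE 1	/* stand-in for the resource-shortage return */

static int ex_free_resources;	/* mock for memory the pmap needs */

/* Mock pmap_enter(): may fail only if the caller allowed it. */
static int ex_pmap_enter(unsigned flags)
{
	if (ex_free_resources == 0) {
		/* Without CANFAIL the traditional behaviour is to panic. */
		assert(flags & EX_PMAP_CANFAIL);
		return EX_SHORTAGE;
	}
	ex_free_resources--;
	return EX_OK;
}

/* Mock for waiting until the pagedaemon has freed something. */
static void ex_wait_for_pagedaemon(void)
{
	ex_free_resources++;
}

/* Mock fault handler; returns how many times the fault was retried. */
static int ex_fault(void)
{
	int retries = 0;

	for (;;) {
		/* ... lock maps and objects, look up the page ... */
		if (ex_pmap_enter(EX_PMAP_CANFAIL) == EX_OK)
			return retries;
		/* ... unlock everything before sleeping ... */
		ex_wait_for_pagedaemon();
		retries++;
	}
}
```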
 1.23 12-Sep-1999  chs branches: 1.23.2; 1.23.4; 1.23.8;
eliminate the PMAP_NEW option by making it required for all ports.
ports which previously had no support for PMAP_NEW now implement
the pmap_k* interfaces as wrappers around the non-k versions.
 1.22 22-Jul-1999  thorpej Garbage collect thread_sleep()/thread_wakeup() left over from the old
Mach VM code. Also nuke iprintf(), which was no longer used anywhere.

Add proclist locking where appropriate.
 1.21 08-Jul-1999  thorpej Teeny bit of style policing.
 1.20 26-May-1999  thorpej Change the vm_map's "entries_pageable" member to a r/o flags member, which
has PAGEABLE and INTRSAFE flags. PAGEABLE now really means "pageable",
not "allocate vm_map_entry's from non-static pool", so update all map
creations to reflect that. INTRSAFE maps are maps that are used in
interrupt context (e.g. kmem_map, mb_map), and thus use the static
map entry pool (XXX as does kernel_map, for now). This will eventually
change now these maps are locked, as well.
 1.19 26-May-1999  thorpej In uvm_pagermapin(), pass VM_PROT_READ|VM_PROT_WRITE as access_type, to
ensure we don't take mod/ref emulation faults in an interrupt context
(e.g. during the i/o operation). This is safe because:
- For a pageout operation, the page is already known to be
modified, and the pagedaemon will pmap_clear_modify() after
the pageout has completed.
- For a pagein operation, pagers must already pmap_clear_modify()
after the pagein operation is complete, because the i/o may have
been done with e.g. programmed i/o.
XXX It would be nice to know the i/o direction so that we can call
XXX pmap_enter() with only the protection and access_type necessary.
 1.18 24-May-1999  thorpej Remove a comment in uvm_pager_dropcluster() about PMAP_NEW and mod/ref
attributes for the page; it no longer applies, since we don't use
pmap_kenter_pgs() anymore.
 1.17 24-May-1999  thorpej Don't use pmap_kenter_pgs() for entering pager_map mappings. The pages
are still owned by the object which is paging, and so the test for a kernel
object in uvm_unmap_remove() will cause pmap_remove() to be used instead
of pmap_kremove().

This was a MAJOR source of pmap_remove() vs pmap_kremove() inconsistency
(which caused the busted kernel pmap statistics, and a cause of much
locking hair on MP systems).
 1.16 26-Mar-1999  mycroft branches: 1.16.4;
Add a new `access type' argument to pmap_enter(). This indicates what type of
memory access a mapping was caused by. This is passed through from uvm_fault()
and udv_fault(), and in most other cases is 0.
The pmap module may use this to preset R/M information. On MMUs which require
R/M emulation, the implementation may preset the bits and avoid taking another
fault. On MMUs which keep R/M information in hardware, the implementation may
preset its cached bits to speed up the next call to pmap_is_modified() or
pmap_is_referenced().
 1.15 25-Mar-1999  mrg remove now >1 year old pre-release message.
 1.14 22-Jan-1999  chs fix a precedence problem in uvm_mk_pcluster() which prevented
clustering of vnode pageouts. this probably makes no difference
since most apps don't write via the pagecache anyway... yet.
 1.13 04-Nov-1998  chs branches: 1.13.2;
be consistent with locking of amaps and anons when freeing them.
 1.12 18-Oct-1998  chs shift by PAGE_SHIFT instead of multiplying or dividing by PAGE_SIZE.
 1.11 11-Oct-1998  chuck remove unused share map code from UVM:
- update calls to uvm_unmap_remove/uvm_unmap (mainonly boolean arg
has been removed)
- replace UVM_ET_ISMAP checks with UVM_ET_ISSUBMAP checks
 1.10 31-Aug-1998  thorpej Make sure the aobj_pager gets initialized!
 1.9 13-Aug-1998  eeh Merge paddr_t changes into the main branch.
 1.8 05-May-1998  kleink branches: 1.8.2;
Remove inclusions of syscall (and syscall argument) related header files;
we don't need them here.
 1.7 22-Mar-1998  chuck fix released pg bugs detected by Chuck S.:
- release the correct page (ppsp[lcv], not pg)
- don't access the page's fields after we have released it
- in the uvm_object case: move on to the next page if we've released
[should have been merged in on 1998/02/12, but we somehow missed it]
 1.6 09-Mar-1998  mrg KNF.
 1.5 10-Feb-1998  mrg - add defopt's for UVM, UVMHIST and PMAP_NEW.
- remove unnecessary UVMHIST_DECL's.
 1.4 08-Feb-1998  thorpej Allow callers of uvm_km_suballoc() to specify where the base of the
submap _must_ begin, by adding a "fixed" boolean argument.
 1.3 07-Feb-1998  mrg restore rcsids
 1.2 06-Feb-1998  thorpej RCS ID police.
 1.1 05-Feb-1998  mrg branches: 1.1.1;
Initial revision
 1.1.1.1 05-Feb-1998  mrg initial import of the new virtual memory system, UVM, into -current.

UVM was written by chuck cranor <chuck@maria.wustl.edu>, with some
minor portions derived from the old Mach code. i provided some help
getting swap and paging working, and other bug fixes/ideas. chuck
silvers <chuq@chuq.com> also provided some other fixes.

this is the UVM kernel code portion.


this will be KNF'd shortly. :-)
 1.8.2.1 30-Jul-1998  eeh Split vm_offset_t and vm_size_t into paddr_t, psize_t, vaddr_t, and vsize_t.
 1.13.2.5 30-May-1999  chs in uvm_mk_pcluster(), remove the vm_page-blkno-based clustering
(and just disable clustering totally for the moment).

in uvm_pager_put(), remove hack from before where all swap pageouts
were made synchronous, it's not needed anymore.

in uvm_pager_dropcluster(), also clear the PG_FAKE vm_page flag.
a page being dropcluster'd must always either contain valid data
or else be marked PG_RELEASED.

in uvm_aio_aiodone(), mark pages PG_RELEASED when we get a read error.
 1.13.2.4 29-Apr-1999  chs remove some useless code.
 1.13.2.3 09-Apr-1999  chs move uvmexp.paging handling back into pagedaemon code.
 1.13.2.2 25-Feb-1999  chs use a pool for vnode aio buffers, like the uvm_swap code does.
fix a precedence error in uvm_mk_pcluster() which prevented it
from doing anything useful (already in -current).
have uvm_mk_pcluster() check physical adjacency as well as logical.
bump uvmexp.paging in uvm_pager_put() instead of wherever it used to be.
(actually this last one is bogus, but I'll fix it shortly.)
in uvm_pager_dropcluster(), skip NULL entries in the array of pages
(needed since this function is now used by several more callers who
do not always have a full array).
add iodone handlers for vnode aio.
 1.13.2.1 09-Nov-1998  chs initial snapshot. lots left to do.
 1.16.4.7 09-Aug-1999  chs create a new type "voff_t" for uvm_object offsets
and define it to be "off_t". also, remove pgo_asyncget().
 1.16.4.6 02-Aug-1999  thorpej Fix a merge error.
 1.16.4.5 31-Jul-1999  chs in uvm_pagermapout(), be sure to pmap_remove() the range we're unmapping.
rewrite uvm_aio_aiodone() to be mostly the same as uvm_pager_dropcluster()
but using the pages' PG_FAKE and the buf's B_READ to decide whether
each page should have the mod/ref bits cleared.
 1.16.4.4 11-Jul-1999  chs add uvm_errno2vmerror().
in uvm_aio_biodone1() use b_resid rather than b_bcount to track
how much of the nested i/o is left to be done.
 1.16.4.3 04-Jul-1999  chs remove uvm_aiobuf_pool. remove aiop arg to uvm_pagermapin().
re-enable pageout clustering.
revamp the uvm_aio_*iodone() functions to use bufs instead of aiobufs.
 1.16.4.2 21-Jun-1999  thorpej Sync w/ -current.
 1.16.4.1 07-Jun-1999  chs merge everything from chs-ubc branch.
 1.23.8.1 27-Dec-1999  wrstuden Pull up to last week's -current.
 1.23.4.1 15-Nov-1999  fvdl Sync with -current
 1.23.2.6 27-Mar-2001  bouyer Sync with HEAD.
 1.23.2.5 12-Mar-2001  bouyer Sync with HEAD.
 1.23.2.4 11-Feb-2001  bouyer Sync with HEAD.
 1.23.2.3 13-Dec-2000  bouyer Sync with HEAD (for UBC fixes).
 1.23.2.2 08-Dec-2000  bouyer Sync with HEAD.
 1.23.2.1 20-Nov-2000  bouyer Update thorpej_scsipi to -current as of a month ago
 1.41.2.12 11-Nov-2002  nathanw Catch up to -current
 1.41.2.11 18-Oct-2002  nathanw Catch up to -current.
 1.41.2.10 16-Jul-2002  nathanw pagedaemon_proc really should be a proc, not a LWP.
 1.41.2.9 24-Jun-2002  nathanw Curproc->curlwp renaming.

Change uses of "curproc->l_proc" back to "curproc", which is more like the
original use. Bare uses of "curproc" are now "curlwp".

"curproc" is now #defined in proc.h as ((curlwp) ? (curlwp)->l_proc : NULL)
so that it is always safe to reference curproc (*de*referencing curproc
is another story, but that's always been true).
 1.41.2.8 20-Jun-2002  nathanw Catch up to -current.
 1.41.2.7 08-Jan-2002  nathanw Catch up to -current.
 1.41.2.6 14-Nov-2001  nathanw Catch up to -current.
 1.41.2.5 22-Oct-2001  nathanw Catch up to -current.
 1.41.2.4 21-Sep-2001  nathanw Catch up to -current.
 1.41.2.3 24-Aug-2001  nathanw Catch up with -current.
 1.41.2.2 21-Jun-2001  nathanw Catch up to -current.
 1.41.2.1 09-Apr-2001  nathanw Catch up with -current.
 1.48.4.1 01-Oct-2001  fvdl Catch up with -current.
 1.48.2.4 10-Oct-2002  jdolecek sync kqueue with -current; this includes merge of gehenna-devsw branch,
merge of i386 MP branch, and part of autoconf rototil work
 1.48.2.3 23-Jun-2002  jdolecek catch up with -current on kqueue branch
 1.48.2.2 10-Jan-2002  thorpej Sync kqueue branch with -current.
 1.48.2.1 13-Sep-2001  thorpej Update the kqueue branch to HEAD.
 1.51.2.1 12-Nov-2001  thorpej Sync the thorpej-mips-cache branch with -current.
 1.55.4.1 11-Mar-2002  thorpej Convert swap_syscall_lock and uvm.swap_data_lock to adaptive mutexes,
and rename them appropriately.
 1.57.4.1 26-Aug-2003  tron Pull up revision 1.60 (requested by tls in ticket #1434):
Correct use of MAXBSIZE where MAXPHYS was intended. This is a necessary
first step towards per-device MAXPHYS, and has the beneficial side effect
of allowing clustering to MAXPHYS even on systems that need to run with
a reduced MAXBSIZE to get more metadata buffers.
 1.60.2.8 11-Dec-2005  christos Sync with head.
 1.60.2.7 10-Nov-2005  skrll Sync with HEAD. Here we go again...
 1.60.2.6 01-Apr-2005  skrll Sync with HEAD.
 1.60.2.5 17-Jan-2005  skrll Sync with HEAD.
 1.60.2.4 19-Oct-2004  skrll Sync with HEAD
 1.60.2.3 21-Sep-2004  skrll Fix the sync with head I botched.
 1.60.2.2 18-Sep-2004  skrll Sync with HEAD.
 1.60.2.1 03-Aug-2004  skrll Sync with HEAD
 1.62.2.1 10-May-2004  tron Pull up revision 1.63 (requested by yamt in ticket #271):
fix an amap_wirerange deadlock problem by re-introducing
PG_RELEASED for anon pages. PR/23171 from Christian Limpach.
for details, see discussion filed in the PR database.
uvm_anon_release: a new function to free anon-owned PG_RELEASED page.
uvm_anfree: we can't wait for the page here because the caller might hold
amap lock. instead, just mark the page as PG_RELEASED.
whoever unbusies the page should check PG_RELEASED.
uvm_aio_aiodone: uvm_anon_release() instead of uvm_page_unbusy()
if appropriate.
uvmfault_anonget: check PG_RELEASED.
 1.65.4.1 25-Jan-2005  yamt - don't use uvm_object or managed mappings for wired allocations.
(eg. malloc(9))
- simplify uvm_km_* apis.
 1.65.2.1 29-Apr-2005  kent sync with -current
 1.68.2.8 17-Mar-2008  yamt sync with head.
 1.68.2.7 21-Jan-2008  yamt sync with head
 1.68.2.6 07-Dec-2007  yamt sync with head
 1.68.2.5 27-Oct-2007  yamt sync with head.
 1.68.2.4 03-Sep-2007  yamt sync with head.
 1.68.2.3 26-Feb-2007  yamt sync with head.
 1.68.2.2 30-Dec-2006  yamt sync with head.
 1.68.2.1 21-Jun-2006  yamt sync with head.
 1.71.6.1 29-Nov-2005  yamt sync with head.
 1.72.2.2 18-Feb-2006  yamt sync with head.
 1.72.2.1 15-Jan-2006  yamt sync with head.
 1.73.4.1 22-Apr-2006  simonb Sync with head.
 1.73.2.1 09-Sep-2006  rpaulo sync with head
 1.74.6.1 24-May-2006  tron Merge 2006-05-24 NetBSD-current into the "peter-altq" branch.
 1.74.4.1 19-Apr-2006  elad oops - *really* sync to head this time.
 1.74.2.4 24-May-2006  yamt sync with head.
 1.74.2.3 11-Apr-2006  yamt sync with head
 1.74.2.2 12-Mar-2006  yamt - change the way to account read-ahead stats.
- fix UVM_PQFLAGBITS.
 1.74.2.1 05-Mar-2006  yamt separate page replacement policy from the rest of kernel.
 1.77.8.2 12-Jan-2007  ad Sync with head.
 1.77.8.1 18-Nov-2006  ad Sync with head.
 1.78.2.1 22-Oct-2006  yamt use workqueue for aiodoned.
 1.79.2.1 27-Feb-2007  yamt - sync with head.
- move sched_changepri back to kern_synch.c as it doesn't know PPQ anymore.
 1.81.6.1 11-Jul-2007  mjf Sync with head.
 1.81.4.12 09-Oct-2007  ad Sync with head.
 1.81.4.11 24-Aug-2007  ad Sync with buffer cache locking changes. See buf.h/vfs_bio.c for details.
Some minor portions are incomplete and need to be verified as a whole.
 1.81.4.10 23-Aug-2007  ad softdep_pageiodone: softdep_pageiodone() acquires bqueue_lock.
 1.81.4.9 21-Aug-2007  yamt fix some races around pagedaemon and uvm_wait. ok'ed by Andrew Doran.
 1.81.4.8 20-Aug-2007  ad softdep locking improvements. It hangs looping in flush_inodedep_deps(),
more work required.
 1.81.4.7 19-Aug-2007  ad - Back out the biodone() changes.
- Eliminate B_ERROR (from HEAD).
 1.81.4.6 15-Jul-2007  ad Sync with head.
 1.81.4.5 17-Jun-2007  ad - Increase the number of thread priorities from 128 to 256. How the space
is set up is to be revisited.
- Implement soft interrupts as kernel threads. A generic implementation
is provided, with hooks for fast-path MD code that can run the interrupt
threads over the top of other threads executing in the kernel.
- Split vnode::v_flag into three fields, depending on how the flag is
locked (by the interlock, by the vnode lock, by the file system).
- Miscellaneous locking fixes and improvements.
 1.81.4.4 13-May-2007  ad - Pass the error number and residual count to biodone(), and let it handle
setting error indicators. Prepare to eliminate B_ERROR.
- Add a flag argument to brelse() to be set into the buf's flags, instead
of doing it directly. Typically used to set B_INVAL.
- Add a "struct cpu_info *" argument to kthread_create(), to be used to
create bound threads. Change "bool mpsafe" to "int flags".
- Allow exit of LWPs in the IDL state when (l != curlwp).
- More locking fixes & conversion to the new API.
 1.81.4.3 09-Apr-2007  ad - Add two new arguments to kthread_create1: pri_t pri, bool mpsafe.
- Fork kthreads off proc0 as new LWPs, not new processes.
 1.81.4.2 21-Mar-2007  ad - Replace more simple_locks, and fix up in a few places.
- Use condition variables.
- LOCK_ASSERT -> KASSERT.
 1.81.4.1 13-Mar-2007  ad Pull in the initial set of changes for the vmlocking branch.
 1.83.2.2 03-Sep-2007  skrll Sync with HEAD.
 1.83.2.1 15-Aug-2007  skrll Sync with HEAD.
 1.85.8.2 29-Jul-2007  ad It's not a good idea for device drivers to modify b_flags, as they don't
need to understand the locking around that field. Instead of setting
B_ERROR, set b_error instead. b_error is 'owned' by whoever completes
the I/O request.
 1.85.8.1 29-Jul-2007  ad file uvm_pager.c was added on branch matt-mips64 on 2007-07-29 13:31:18 +0000
 1.85.6.3 23-Mar-2008  matt sync with HEAD
 1.85.6.2 09-Jan-2008  matt sync with HEAD
 1.85.6.1 06-Nov-2007  matt sync with HEAD
 1.85.4.3 03-Dec-2007  joerg Sync with HEAD.
 1.85.4.2 28-Oct-2007  joerg Sync with HEAD.
 1.85.4.1 03-Sep-2007  jmcneill Sync with HEAD.
 1.86.4.1 13-Nov-2007  bouyer Sync with HEAD
 1.87.2.2 18-Feb-2008  mjf Sync with HEAD.
 1.87.2.1 08-Dec-2007  mjf Sync with HEAD.
 1.89.6.1 02-Jan-2008  bouyer Sync with HEAD
 1.89.2.1 04-Dec-2007  ad Pull the vmlocking changes into a new branch.
 1.90.6.3 17-Jan-2009  mjf Sync with HEAD.
 1.90.6.2 02-Jun-2008  mjf Sync with HEAD.
 1.90.6.1 03-Apr-2008  mjf Sync with HEAD.
 1.90.2.1 24-Mar-2008  keiichi sync with head.
 1.92.18.5 14-Feb-2012  matt Add more KASSERTs (more! more! more!).
When returning page to the free pool, make sure to dequeue the pages before
hand or free page queue corruption will happen.
 1.92.18.4 09-Feb-2012  matt Major changes to uvm.
Support multiple collections (groups) of free pages and run the page
reclamation algorithm on each group independently.
 1.92.18.3 03-Jun-2011  matt Restore $NetBSD$
 1.92.18.2 03-Jun-2011  matt Rework page free lists to be sorted by color first rather than free_list.
Kept per color PGFL_* counter in each page free list.
Minor cleanups.
 1.92.18.1 25-May-2011  matt Make uvm_map recognize UVM_FLAG_COLORMATCH which tells uvm_map that the
'align' argument specifies the starting color of the KVA range to be returned.

When calling uvm_km_alloc with UVM_KMF_VAONLY, also specify the starting
color of the kva range returned (UVM_KMF_COLORMATCH) and pass those to
uvm_map.

In uvm_pglistalloc, make sure the pages being returned have sequentially
advancing colors (so they can be mapped in a contiguous address range).
Add a few missing UVM_FLAG_COLORMATCH flags to uvm_pagealloc calls.

Make the socket and pipe loan color-safe.

Make the mips pmap enforce strict page color (color(VA) == color(PA)).
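
The strict color rule, color(VA) == color(PA), can be sketched as follows. The page size and color count are assumed values chosen for illustration, and the ex_ helpers are hypothetical; real ports derive the number of colors from the cache geometry.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define EX_PAGE_SHIFT 12u	/* assumed: 4 KiB pages */
#define EX_NCOLORS    8u	/* assumed number of cache colors */

static_assert((EX_NCOLORS & (EX_NCOLORS - 1)) == 0,
    "color count must be a power of two");

/* Color of an address: its page number modulo the number of colors. */
static unsigned ex_color(uint64_t addr)
{
	return (unsigned)((addr >> EX_PAGE_SHIFT) & (EX_NCOLORS - 1));
}

/* A mapping is color-safe when virtual and physical colors agree. */
static bool ex_color_match(uint64_t va, uint64_t pa)
{
	return ex_color(va) == ex_color(pa);
}
```

Under this rule, physically consecutive pages with sequentially advancing colors can be mapped into one contiguous virtual range whose starting color matches the first page.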
 1.92.14.1 21-Nov-2010  riz Pull up following revision(s) (requested by rmind in ticket #1421):
sys/uvm/uvm_bio.c: revision 1.70
sys/uvm/uvm_map.c: revision 1.292
sys/uvm/uvm_pager.c: revision 1.98
sys/uvm/uvm_fault.c: revision 1.175
sys/uvm/uvm_bio.c: revision 1.69
ubc_fault: split-off code part handling a single page into ubc_fault_page().
Keep the lock around pmap_update() where required. While fixing this
in ubc_fault(), rework logic to "remember" the last object of page and
reduce locking overhead, since in common case pages belong to one and
the same UVM object (but not always, therefore add a comment).
Unlocks before pmap_update(), on removal of mappings, might cause TLB
coherency issues, since on architectures like x86 and mips64 invalidation
IPIs are deferred to pmap_update(). Hence, VA space might be globally
visible before IPIs are sent or while they are still in-flight.
OK ad@.
 1.92.12.3 28-Apr-2009  skrll Sync with HEAD.
 1.92.12.2 03-Mar-2009  skrll Sync with HEAD.
 1.92.12.1 19-Jan-2009  skrll Sync with HEAD.
 1.92.10.1 13-Dec-2008  haad Update haad-dm branch to haad-dm-base2.
 1.92.4.4 11-Aug-2010  yamt sync with head.
 1.92.4.3 11-Mar-2010  yamt sync with head
 1.92.4.2 19-Aug-2009  yamt sync with head.
 1.92.4.1 04-May-2009  yamt sync with head.
 1.93.4.1 13-May-2009  jym Sync with HEAD.

Commit is split, to avoid a "too many arguments" protocol error.
 1.97.4.5 31-May-2011  rmind sync with head
 1.97.4.4 05-Mar-2011  rmind sync with head
 1.97.4.3 03-Jul-2010  rmind sync with head
 1.97.4.2 17-Mar-2010  rmind Reorganise UVM locking to protect P->V state and serialise pmap(9)
operations on the same page(s) by always locking their owner. Hence
lock order: "vmpage"-lock -> pmap-lock.

Patch, proposed on tech-kern@, from Andrew Doran.
 1.97.4.1 16-Mar-2010  rmind Change struct uvm_object::vmobjlock to be dynamically allocated with
mutex_obj_alloc(). It allows us to share the locks among UVM objects.
 1.97.2.1 17-Aug-2010  uebayasi Sync with HEAD.
 1.98.4.1 08-Feb-2011  bouyer Sync with HEAD
 1.98.2.1 06-Jun-2011  jruoho Sync with HEAD.
 1.100.2.1 23-Jun-2011  cherry Catchup with rmind-uvmplock merge.
 1.107.6.1 18-Feb-2012  mrg merge to -current.
 1.107.2.5 22-May-2014  yamt sync with head.

for a reference, the tree before this commit was tagged
as yamt-pagecache-tag8.

this commit was split into small chunks to avoid
a limitation of cvs. ("Protocol error: too many arguments")
 1.107.2.4 01-Aug-2012  yamt - fix integrity sync.
putpages for integrity sync (fsync, msync with MS_SYNC, etc) should not
skip pages being written back by other threads.

- adapt to radix tree tag api changes.
 1.107.2.3 17-Apr-2012  yamt sync with head
 1.107.2.2 24-Jan-2012  yamt remove a redundant assertion
 1.107.2.1 02-Nov-2011  yamt page cache related changes

- maintain object pages in radix tree rather than rb tree.
- reduce unnecessary page scan in putpages. esp. when an object has a ton of
pages cached but only a few of them are dirty.
- reduce the number of pmap operations by tracking page dirtiness more
precisely in uvm layer.
- fix nfs commit range tracking.
- fix nfs write clustering. XXX hack
 1.108.10.1 18-May-2014  rmind sync with head
 1.108.6.3 03-Dec-2017  jdolecek update from HEAD
 1.108.6.2 20-Aug-2014  tls Rebase to HEAD as of a few days ago.
 1.108.6.1 10-Oct-2012  bouyer The pagedaemon can emit writes as large as the underlying device's maxphys,
so emergva size needs to be MACHINE_MAXPHYS, MAXPHYS is not enough.
 1.110.22.1 02-Nov-2017  snj Pull up following revision(s) (requested by pgoyette in ticket #335):
share/man/man9/kernhist.9: 1.5-1.8
sys/arch/acorn26/acorn26/pmap.c: 1.39
sys/arch/arm/arm32/fault.c: 1.105 via patch
sys/arch/arm/arm32/pmap.c: 1.350, 1.359
sys/arch/arm/broadcom/bcm2835_bsc.c: 1.7
sys/arch/arm/omap/if_cpsw.c: 1.20
sys/arch/arm/omap/tiotg.c: 1.7
sys/arch/evbarm/conf/RPI2_INSTALL: 1.3
sys/dev/ic/sl811hs.c: 1.98
sys/dev/usb/ehci.c: 1.256
sys/dev/usb/if_axe.c: 1.83
sys/dev/usb/motg.c: 1.18
sys/dev/usb/ohci.c: 1.274
sys/dev/usb/ucom.c: 1.119
sys/dev/usb/uhci.c: 1.277
sys/dev/usb/uhub.c: 1.137
sys/dev/usb/umass.c: 1.160-1.162
sys/dev/usb/umass_quirks.c: 1.100
sys/dev/usb/umass_scsipi.c: 1.55
sys/dev/usb/usb.c: 1.168
sys/dev/usb/usb_mem.c: 1.70
sys/dev/usb/usb_subr.c: 1.221
sys/dev/usb/usbdi.c: 1.175
sys/dev/usb/usbdi_util.c: 1.67-1.70
sys/dev/usb/usbroothub.c: 1.3
sys/dev/usb/xhci.c: 1.75
sys/external/bsd/drm2/dist/drm/i915/i915_gem.c: 1.34
sys/kern/kern_history.c: 1.15
sys/kern/kern_xxx.c: 1.74
sys/kern/vfs_bio.c: 1.275-1.276
sys/miscfs/genfs/genfs_io.c: 1.71
sys/sys/kernhist.h: 1.21
sys/ufs/ffs/ffs_balloc.c: 1.63
sys/ufs/lfs/lfs_vfsops.c: 1.361
sys/ufs/lfs/ulfs_inode.c: 1.21
sys/ufs/lfs/ulfs_vnops.c: 1.52
sys/ufs/ufs/ufs_inode.c: 1.102
sys/ufs/ufs/ufs_vnops.c: 1.239
sys/uvm/pmap/pmap.c: 1.37-1.39
sys/uvm/pmap/pmap_tlb.c: 1.22
sys/uvm/uvm_amap.c: 1.108
sys/uvm/uvm_anon.c: 1.64
sys/uvm/uvm_aobj.c: 1.126
sys/uvm/uvm_bio.c: 1.91
sys/uvm/uvm_device.c: 1.66
sys/uvm/uvm_fault.c: 1.201
sys/uvm/uvm_km.c: 1.144
sys/uvm/uvm_loan.c: 1.85
sys/uvm/uvm_map.c: 1.353
sys/uvm/uvm_page.c: 1.194
sys/uvm/uvm_pager.c: 1.111
sys/uvm/uvm_pdaemon.c: 1.109
sys/uvm/uvm_swap.c: 1.175
sys/uvm/uvm_vnode.c: 1.103
usr.bin/vmstat/vmstat.c: 1.219
Reorder to test for null before null deref in debug code
--
Reorder to test for null before null deref in debug code
--
KNF
--
No need for '\n' in UVMHIST_LOG
--
normalise a BIOHIST log message
--
Update the kernhist(9) kernel history code to address issues identified
in PR kern/52639, as well as some general cleaning-up...
(As proposed on tech-kern@ with additional changes and enhancements.)
Details of changes:
* All history arguments are now stored as uintmax_t values[1], both in
the kernel and in the structures used for exporting the history data
to userland via sysctl(9). This avoids problems on some architectures
where passing a 64-bit (or larger) value to printf(3) can cause it to
process the value as multiple arguments. (This can be particularly
problematic when printf()'s format string is not a literal, since in
that case the compiler cannot know how large each argument should be.)
* Update the data structures used for exporting kernel history data to
include a version number as well as the length of history arguments.
* All [2] existing users of kernhist(9) have had their format strings
updated. Each format specifier now includes an explicit length
modifier 'j' to refer to numeric values of the size of uintmax_t.
* All [2] existing users of kernhist(9) have had their format strings
updated to replace uses of "%p" with "%#jx", and the pointer
arguments are now cast to (uintptr_t) before being subsequently cast
to (uintmax_t). This is needed to avoid compiler warnings about
casting "pointer to integer of a different size."
* All [2] existing users of kernhist(9) have had instances of "%s" or
"%c" format strings replaced with numeric formats; several instances
of mis-match between format string and argument list have been fixed.
* vmstat(1) has been modified to handle the new size of arguments in the
history data as exported by sysctl(9).
* vmstat(1) now provides a warning message if the history requested with
the -u option does not exist (previously, this condition was silently
ignored, with only a single blank line being printed).
* vmstat(1) now checks the version and argument length included in the
data exported via sysctl(9) and exits if they do not match the values
with which vmstat was built.
* The kernhist(9) man-page has been updated to note the additional
requirements imposed on the format strings, along with several other
minor changes and enhancements.
[1] It would have been possible to use an explicit length (for example,
uint64_t) for the history arguments. But that would require another
"rototill" of all the users in the future when we add support for an
architecture that supports a larger size. Also, the printf(3) format
specifiers for explicitly-sized values, such as "%"PRIu64, are much
more verbose (and less aesthetically appealing, IMHO) than simply
using "%ju".
[2] I've tried very hard to find "all [the] existing users of kernhist(9)"
but it is possible that I've missed some of them. I would be glad to
update any stragglers that anyone identifies.
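A minimal user-space sketch of the format-string convention described above. The helper name hist_format() and its message text are hypothetical and are not part of kernhist(9); only the "%ju" / "%#jx" specifiers and the (uintptr_t)-then-(uintmax_t) cast order come from the commit message.

```c
#include <assert.h>
#include <stdint.h>
#include <stdio.h>

/*
 * hist_format() is a hypothetical user-space helper, not part of kernhist(9).
 * Every argument is widened to uintmax_t and printed with the 'j' modifier.
 */
static int
hist_format(char *buf, size_t len, const void *ptr, unsigned long count)
{
	assert(buf != NULL && len > 0);
	/* Pointers pass through uintptr_t first, then uintmax_t; "%p" becomes "%#jx". */
	return snprintf(buf, len, "obj=%#jx count=%ju",
	    (uintmax_t)(uintptr_t)ptr, (uintmax_t)count);
}
```

Because every argument has the same width, a formatter that receives the format string at run time can consume the arguments uniformly, which is the problem the commit set out to avoid.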
--
For some reason this single kernel seems to have outgrown its declared
size as a result of the kernhist(9) changes. Bump the size.
XXX The amount of increase may be excessive - anyone with more detailed
XXX knowledge please feel free to further adjust the value appropriately.
--
Missed one cast of pointer --> uintptr_t in previous kernhist(9) commit
--
And yet another one. :(
--
Use correct mark-up for NetBSD version.
--
More improvements in grammar and readability.
--
Remove a stray '"' (obvious typo) and add a couple of casts that are
probably needed.
--
And replace an instance of "%p" conversion with "%#jx"
--
Whitespace fix. Give Bl tag table a width. Fix Xr.
 1.111.8.2 06-Jul-2021  martin Pull up following revision(s) - all via patch -
(requested by riastradh in ticket #1317):

sys/uvm/uvm_page.c: revision 1.248
sys/uvm/uvm_anon.c: revision 1.80
sys/rump/librump/rumpvfs/vm_vfs.c: revision 1.40
sys/rump/librump/rumpvfs/vm_vfs.c: revision 1.41
sys/rump/librump/rumpkern/vm.c: revision 1.191
sys/uvm/uvm_pager.c: revision 1.130
external/cddl/osnet/dist/uts/common/fs/zfs/zfs_vnops.c: revision 1.71
tests/rump/rumpkern/t_vm.c: revision 1.5
tests/rump/rumpkern/t_vm.c: revision 1.6
sys/rump/librump/rumpvfs/vm_vfs.c: revision 1.39

Move the handling of PG_PAGEOUT from uvm_aio_aiodone_pages() to
uvm_page_unbusy() so that all callers of uvm_page_unbusy() don't need to
handle this flag separately. Split out the pages part of uvm_aio_aiodone()
into uvm_aio_aiodone_pages() in rump just like in the real kernel.

In ZFS functions that can fail to copy data between the ARC and VM pages,
use uvm_aio_aiodone_pages() rather than uvm_page_unbusy() so that we can
handle these "I/O" errors. Fixes PR 55702.

fix an incorrect assertion in the previous commit.

Handle PG_PAGEOUT in uvm_anon_release() too.

Commit the ZFS file that I forgot in this previous commit:

Move the handling of PG_PAGEOUT from uvm_aio_aiodone_pages() to
uvm_page_unbusy() so that all callers of uvm_page_unbusy() don't need to
handle this flag separately. Split out the pages part of uvm_aio_aiodone()
into uvm_aio_aiodone_pages() in rump just like in the real kernel.

In ZFS functions that can fail to copy data between the ARC and VM pages,
use uvm_aio_aiodone_pages() rather than uvm_page_unbusy() so that we can
handle these "I/O" errors. Fixes PR 55702.
update the rump copy of uvm_page_unbusy() to match the real version,
in particular handle PG_PAGEOUT. fixes a few atf tests.
the busypage test is buggy, expect it to fail.

make rump's uvm_aio_aiodone_pages() look more like the kernel version.
fixes some more rumpy assertions.

for the busypage test, replace atf_tc_expect_fail() with atf_tc_skip()
because atf apparently has no way to expect a test program to crash.
fixes PR 55945.
 1.111.8.1 27-Dec-2019  martin Pull up following revision(s) (requested by ad in ticket #584):

sys/uvm/uvm_pager.c: revision 1.118

PR kern/48044: panic: kernel diagnostic assertion "uvmexp.swpgonly + npages <= uvmexp.swpginuse" failed
swpgonly is updated asynchronously with regard to swap use. We can't assert
this condition with confidence in the post-5.0 world, at least not without
broader changes. swpgonly's ultimate use is of a heuristic nature so this
is no problem at all.
 1.111.4.3 21-Apr-2020  martin Sync with HEAD
 1.111.4.2 13-Apr-2020  martin Mostly merge changes from HEAD upto 20200411
 1.111.4.1 08-Apr-2020  martin Merge changes from current as of 20200406
 1.119.2.2 29-Feb-2020  ad Sync with head.
 1.119.2.1 17-Jan-2020  ad Sync with head.
 1.124.2.1 20-Apr-2020  bouyer Sync with HEAD
 1.49 19-May-2020  ad PR kern/32166: pgo_get protocol is ambiguous
Also problems with tmpfs+nfs noted by hannken@.

Don't pass PGO_ALLPAGES to pgo_get, and ignore PGO_DONTCARE in the
!PGO_LOCKED case. In uao_get() have uvm_pagealloc() take care of page
zeroing and release busy pages on error.
 1.48 17-May-2020  ad Start trying to reduce cache misses on vm_page during fault processing.

- Make PGO_LOCKED getpages imply PGO_NOBUSY and remove the latter. Mark
pages busy only when there's actually I/O to do.

- When doing COW on a uvm_object, don't mess with neighbouring pages. In
all likelihood they're already entered.

- Don't mess with neighbouring VAs that have existing mappings as replacing
those mappings with same can be quite costly.

- Don't enqueue pages for neighbour faults unless not enqueued already, and
don't activate centre pages unless uvmpdpol says it's useful.

Also:

- Make PGO_LOCKED getpages on UAOs work more like vnodes: do gang lookup in
the radix tree, and don't allocate new pages.

- Fix many assertion failures around faults/loans with tmpfs.
 1.47 22-Mar-2020  ad Process concurrent page faults on individual uvm_objects / vm_amaps in
parallel, where the relevant pages are already in-core. Proposed on
tech-kern.

Temporarily disabled on MP architectures with __HAVE_UNLOCKED_PMAP until
adjustments are made to their pmaps.
 1.46 14-Mar-2020  ad Make uvm_pagemarkdirty() responsible for putting vnodes onto the syncer
work list. Proposed on tech-kern@.
 1.45 09-Dec-2018  jdolecek update comment - PGO_JOURNALLOCKED now supported for 'get' too
 1.44 13-Jan-2017  christos branches: 1.44.14; 1.44.16;
add missing forward struct decl
 1.43 29-Apr-2012  chs branches: 1.43.2; 1.43.16; 1.43.20;
change vflushbuf() to take the full FSYNC_* flags.
translate FSYNC_LAZY into PGO_LAZY for VOP_PUTPAGES() so that
genfs_do_io() can set the appropriate io priority for the I/O.
this is the first part of addressing PR 46325.
 1.42 28-Sep-2011  matt branches: 1.42.2; 1.42.6; 1.42.8;
Reallocate emergency pager va when ncolors is increased. (modification of
patch from mrg).
 1.41 02-Feb-2011  chuck update license clauses on my code to match the new-style BSD licenses.
verified with Mike Hibler it is ok to remove clause 3 on utah copyright,
as per UCB.
based on diff that rmind@ sent me.

no functional change with this commit.
 1.40 02-Feb-2011  chuck update license clauses on my code to match the new-style BSD licenses.
based on diff that rmind@ sent me.

no functional change with this commit.
 1.39 01-Sep-2010  chs branches: 1.39.2; 1.39.4;
replace the earlier workaround for PR 40389 with a better fix.
the earlier change caused data corruption by freeing pages
without invalidating their mappings. instead of the trylock/retry,
just take the genfs-node lock before calling VOP_GETPAGES()
and pass a new flag to tell it that we're already holding this lock.
 1.38 22-Aug-2008  hannken branches: 1.38.4; 1.38.10; 1.38.12; 1.38.14; 1.38.16;
Add snapshot support for logging ffs file systems.

- Add UFS_WAPBL_BEGIN() / UFS_WAPBL_END() where needed.

- Expunge WAPBL log inodes from snapshots.

- Ffs_copyonwrite() and ffs_snapblkfree() must run inside a WAPBL transaction.

- Add ffs_gop_write() as a wrapper around genfs_gop_write() that makes sure
genfs_gop_write() gets always called inside a WAPBL transaction.

- Add VOP_PUTPAGES() flag PGO_JOURNALLOCKED to tag calls to VOP_PUTPAGES()
inside a WAPBL transaction.

Reviewed by: Simon Burge <simonb@netbsd.org>, Greg Oster <oster@netbsd.org>

PGO_JOURNALLOCKED / ffs_gop_write() part presented on tech-kern@.
 1.37 25-Oct-2007  yamt branches: 1.37.16; 1.37.20; 1.37.22; 1.37.26;
defparam PAGER_MAP_SIZE.
 1.36 23-Apr-2007  pooka branches: 1.36.6; 1.36.8; 1.36.12;
adjust misleading comment: PGO_SYNCIO does not depend on PGO_CLEANIT
 1.35 16-Apr-2007  chs define a pager flag PGO_RECLAIM, similar to FSYNC_RECLAIM, and use it
to skip unnecessary flushing when layered file system vnodes are recycled.
this also prevents a deadlock with the dodgy LFS putpages routine.
fixes the non-LFS part of PR 36150.
 1.34 22-Feb-2006  drochner branches: 1.34.18; 1.34.20; 1.34.24; 1.34.26;
kill the "fault_type" argument to pager's pgo_fault() methods
it is never used
(and using it would comprise an abstraction violation imho)
 1.33 11-Feb-2006  yamt remove the following options. no objections on tech-kern@.

UVM_PAGER_INLINE
UVM_AMAP_INLINE
UVM_PAGE_INLINE
UVM_MAP_INLINE
 1.32 24-Dec-2005  perry branches: 1.32.2; 1.32.4; 1.32.6;
Remove leading __ from __(const|inline|signed|volatile) -- it is obsolete.
 1.31 11-Dec-2005  christos merge ktrace-lwp.
 1.30 23-Jul-2005  yamt update file timestamps for nfsd loaned-read and mmap.
PR/25279. discussed on tech-kern@.
 1.29 17-Jul-2005  yamt - introduce PGO_NOBLOCKALLOC and use it for ubc mapping
to prevent unnecessary block allocations in the case that
page size > block size.

- ufs_balloc_range: use VM_PROT_WRITE+PGO_NOBLOCKALLOC rather than
VM_PROT_READ.
 1.28 24-Mar-2004  junyoung branches: 1.28.14; 1.28.16;
Nuke __P().
 1.27 07-Aug-2003  agc Move UCB-licensed code from 4-clause to 3-clause licence.

Patches provided by Joel Baker in PR 22364, verified by myself.
 1.26 17-Feb-2003  perseant branches: 1.26.2;
Add code to UBCify LFS. This is still behind "#ifdef LFS_UBC" for now
(there are still some details to work out) but expect that to go
away soon. To support these basic changes (creation of lfs_putpages,
lfs_gop_write, mods to lfs_balloc) several other changes were made, to
wit:

* Create a writer daemon kernel thread whose purpose is to handle page
writes for the pagedaemon, but which also takes over some of the
functions of lfs_check(). This thread is started the first time an
LFS is mounted.

* Add a "flags" parameter to GOP_SIZE. Current values are
GOP_SIZE_READ, meaning that the call should return the size of the
in-core version of the file, and GOP_SIZE_WRITE, meaning that it
should return the on-disk size. One of GOP_SIZE_READ or
GOP_SIZE_WRITE must be specified.

* Instead of using malloc(...M_WAITOK) for everything, reserve enough
resources to get by and use malloc(...M_NOWAIT), using the reserves if
necessary. Use the pool subsystem for structures small enough that
this is feasible. This also obsoletes LFS_THROTTLE.

And a few that are not strictly necessary:

* Moves the LFS inode extensions off onto a separately allocated
structure; getting closer to LFS as an LKM. "Welcome to 1.6O."

* Unified GOP_ALLOC between FFS and LFS.

* Update LFS copyright headers to correct values.

* Actually cast to unsigned in lfs_shellsort, like the comment says.

* Keep track of which segments were empty before the previous
checkpoint; any segments that pass two checkpoints both dirty and
empty can be summarily cleaned. Do this. Right now lfs_segclean
still works, but this should be turned into an effectless
compatibility syscall.
 1.25 25-Mar-2002  chs remove PGO_WEAK, it isn't needed anymore.
 1.24 15-Sep-2001  chs a whole bunch of changes to improve performance and robustness under load:

- remove special treatment of pager_map mappings in pmaps. this is
required now, since I've removed the globals that expose the address range.
pager_map now uses pmap_kenter_pa() instead of pmap_enter(), so there's
no longer any need to special-case it.
- eliminate struct uvm_vnode by moving its fields into struct vnode.
- rewrite the pageout path. the pager is now responsible for handling the
high-level requests instead of only getting control after a bunch of work
has already been done on its behalf. this will allow us to UBCify LFS,
which needs tighter control over its pages than other filesystems do.
writing a page to disk no longer requires making it read-only, which
allows us to write wired pages without causing all kinds of havoc.
- use a new PG_PAGEOUT flag to indicate that a page should be freed
on behalf of the pagedaemon when it's unlocked. this flag is very similar
to PG_RELEASED, but unlike PG_RELEASED, PG_PAGEOUT can be cleared if the
pageout fails due to eg. an indirect-block buffer being locked.
this allows us to remove the "version" field from struct vm_page,
and together with shrinking "loan_count" from 32 bits to 16,
struct vm_page is now 4 bytes smaller.
- no longer use PG_RELEASED for swap-backed pages. if the page is busy
because it's being paged out, we can't release the swap slot to be
reallocated until that write is complete, but unlike with vnodes we
don't keep a count of in-progress writes so there's no good way to
know when the write is done. instead, when we need to free a busy
swap-backed page, just sleep until we can get it busy ourselves.
- implement a fast-path for extending writes which allows us to avoid
zeroing new pages. this substantially reduces cpu usage.
- encapsulate the data used by the genfs code in a struct genfs_node,
which must be the first element of the filesystem-specific vnode data
for filesystems which use genfs_{get,put}pages().
- eliminate many of the UVM pagerops, since they aren't needed anymore
now that the pager "put" operation is a higher-level operation.
- enhance the genfs code to allow NFS to use the genfs_{get,put}pages
instead of a modified copy.
- clean up struct vnode by removing all the fields that used to be used by
the vfs_cluster.c code (which we don't use anymore with UBC).
- remove kmem_object and mb_object since they were useless.
instead of allocating pages to these objects, we now just allocate
pages with no object. such pages are mapped in the kernel until they
are freed, so we can use the mapping to find the page to free it.
this allows us to remove splvm() protection in several places.

The sum of all these changes improves write throughput on my
decstation 5000/200 to within 1% of the rate of NetBSD 1.5
and reduces the elapsed time for "make release" of a NetBSD 1.5
source tree on my 128MB pc to 10% less than a 1.5 kernel took.
 1.23 26-May-2001  chs branches: 1.23.2; 1.23.4;
replace vm_page_t with struct vm_page *.
 1.22 25-May-2001  chs remove trailing whitespace.
 1.21 10-Mar-2001  chs eliminate the VM_PAGER_* error codes in favor of the traditional E* codes.
the mapping is:

VM_PAGER_OK 0
VM_PAGER_BAD <unused>
VM_PAGER_FAIL <unused>
VM_PAGER_PEND 0 (see below)
VM_PAGER_ERROR EIO
VM_PAGER_AGAIN EAGAIN
VM_PAGER_UNLOCK EBUSY
VM_PAGER_REFAULT ERESTART

for async i/o requests, it used to be possible for the request to
be converted to sync, and the pager would return VM_PAGER_OK or VM_PAGER_PEND
to indicate whether the caller should perform post-i/o cleanup.
this is no longer allowed; pagers must now return 0 to indicate that
the async i/o was successfully started, and the caller never needs to
worry about doing the post-i/o cleanup.
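The mapping above can be sketched as a small translation function. This is an illustration only: the enum, its numeric values, and the function name vm_pager_to_errno() are assumptions made for the example, and the fallback ERESTART value is a placeholder for systems whose <errno.h> does not expose it to userland. VM_PAGER_BAD and VM_PAGER_FAIL are left out because the commit lists them as unused.

```c
#include <assert.h>
#include <errno.h>

#ifndef ERESTART
#define ERESTART (-1)	/* placeholder: kernel-internal on some systems */
#endif

/* Illustrative stand-ins for the retired codes; numeric values are assumptions. */
enum vm_pager_code {
	VM_PAGER_OK,
	VM_PAGER_PEND,
	VM_PAGER_ERROR,
	VM_PAGER_AGAIN,
	VM_PAGER_UNLOCK,
	VM_PAGER_REFAULT
};

/* Hypothetical helper: translate an old pager code to the E* code listed above. */
static int
vm_pager_to_errno(enum vm_pager_code code)
{
	switch (code) {
	case VM_PAGER_OK:
	case VM_PAGER_PEND:	/* async i/o started successfully also returns 0 */
		return 0;
	case VM_PAGER_ERROR:
		return EIO;
	case VM_PAGER_AGAIN:
		return EAGAIN;
	case VM_PAGER_UNLOCK:
		return EBUSY;
	case VM_PAGER_REFAULT:
		return ERESTART;
	}
	assert(!"unmapped pager code");
	return EIO;
}
```

Both VM_PAGER_OK and VM_PAGER_PEND collapse to 0, which matches the note that callers no longer distinguish the two cases for post-i/o cleanup.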
 1.20 27-Nov-2000  chs branches: 1.20.2;
Initial integration of the Unified Buffer Cache project.
 1.19 27-Nov-2000  chs allow ports to override PAGER_MAP_SIZE in machine/vmparam.h.
some ports (such as arm32) don't have enough KVA for the
increased default size once the UBC mapping is also present.
 1.18 24-Nov-2000  chs increase PAGER_MAP_SIZE to 16MB and move it to uvm_pager.h
since the alpha and mips pmaps use it.
 1.17 24-Nov-2000  chs g/c unused pager ops "asyncget" and "aiodone".
 1.16 26-Jun-2000  mrg remove/move more mach vm header files:

<vm/pglist.h> -> <uvm/uvm_pglist.h>
<vm/vm_inherit.h> -> <uvm/uvm_inherit.h>
<vm/vm_kern.h> -> into <uvm/uvm_extern.h>
<vm/vm_object.h> -> nothing
<vm/vm_pager.h> -> into <uvm/uvm_pager.h>

also includes a bunch of <vm/vm_page.h> include removals (due to redundancy
with <vm/vm.h>), and a scattering of other similar headers.
 1.15 19-May-2000  thorpej Tell uvm_pagermapin() the direction of the I/O so that it can map
with only the protection that it needs.
 1.14 03-Apr-2000  chs remove uvm_shareprot(). no longer needed since the demise of share maps.
 1.13 03-Apr-2000  chs remove the "shareprot" pagerop. it's not needed anymore since
share maps are long gone.
 1.12 26-Mar-2000  kleink Merge parts of chs-ubc2 into the trunk:
Add a new type voff_t (defined as a synonym for off_t) to describe offsets
into uvm objects, and update the appropriate interfaces to use it, the
most visible effect being the ability to mmap() file offsets beyond
the range of a vaddr_t.

Originally by Chuck Silvers; blame me for problems caused by merging this
into non-UBC.
 1.11 11-Jan-2000  chs add support for ``swapctl -d'' (removing swap space).
improve handling of i/o errors in swap space.

reviewed by: Chuck Cranor
 1.10 21-Jun-1999  thorpej branches: 1.10.2;
Protect prototypes, certain macros, and inlines from userland.
 1.9 25-Mar-1999  mrg branches: 1.9.4;
remove now >1 year old pre-release message.
 1.8 24-Mar-1999  cgd after discussion with chuck, nuke pgo_attach from uvm_pagerops
 1.7 13-Aug-1998  eeh branches: 1.7.2;
Merge paddr_t changes into the main branch.
 1.6 09-Mar-1998  mrg branches: 1.6.2;
KNF.
 1.5 10-Feb-1998  mrg - add defopt's for UVM, UVMHIST and PMAP_NEW.
- remove unnecessary UVMHIST_DECL's.
 1.4 10-Feb-1998  perry add/cleanup multiple inclusion protection.
 1.3 07-Feb-1998  mrg restore rcsids
 1.2 06-Feb-1998  thorpej RCS ID police.
 1.1 05-Feb-1998  mrg branches: 1.1.1;
Initial revision
 1.1.1.1 05-Feb-1998  mrg initial import of the new virtual memory system, UVM, into -current.

UVM was written by chuck cranor <chuck@maria.wustl.edu>, with some
minor portions derived from the old Mach code. i provided some help
getting swap and paging working, and other bug fixes/ideas. chuck
silvers <chuq@chuq.com> also provided some other fixes.

this is the UVM kernel code portion.


this will be KNF'd shortly. :-)
 1.6.2.1 30-Jul-1998  eeh Split vm_offset_t and vm_size_t into paddr_t, psize_t, vaddr_t, and vsize_t.
 1.7.2.3 30-May-1999  chs add a flags field to struct uvm_aiodesc and define a flag for it.
 1.7.2.2 25-Feb-1999  chs add aio stuff.
 1.7.2.1 09-Nov-1998  chs initial snapshot. lots left to do.
 1.9.4.5 09-Aug-1999  chs create a new type "voff_t" for uvm_object offsets
and define it to be "off_t". also, remove pgo_asyncget().
 1.9.4.4 11-Jul-1999  chs add uvm_errno2vmerror().
 1.9.4.3 04-Jul-1999  chs remove uvm_aiodesc and uvm_aiobuf.
update uvm_pagermapin() proto.
 1.9.4.2 01-Jul-1999  thorpej Sync w/ -current.
 1.9.4.1 07-Jun-1999  chs merge everything from chs-ubc branch.
 1.10.2.3 12-Mar-2001  bouyer Sync with HEAD.
 1.10.2.2 08-Dec-2000  bouyer Sync with HEAD.
 1.10.2.1 20-Nov-2000  bouyer Update thorpej_scsipi to -current as of a month ago
 1.20.2.4 01-Apr-2002  nathanw Catch up to -current.
(CVS: It's not just a program. It's an adventure!)
 1.20.2.3 21-Sep-2001  nathanw Catch up to -current.
 1.20.2.2 21-Jun-2001  nathanw Catch up to -current.
 1.20.2.1 09-Apr-2001  nathanw Catch up with -current.
 1.23.4.1 01-Oct-2001  fvdl Catch up with -current.
 1.23.2.2 23-Jun-2002  jdolecek catch up with -current on kqueue branch
 1.23.2.1 10-Jan-2002  thorpej Sync kqueue branch with -current.
 1.26.2.4 10-Nov-2005  skrll Sync with HEAD. Here we go again...
 1.26.2.3 21-Sep-2004  skrll Fix the sync with head I botched.
 1.26.2.2 18-Sep-2004  skrll Sync with HEAD.
 1.26.2.1 03-Aug-2004  skrll Sync with HEAD
 1.28.16.3 27-Oct-2007  yamt sync with head.
 1.28.16.2 03-Sep-2007  yamt sync with head.
 1.28.16.1 21-Jun-2006  yamt sync with head.
 1.28.14.1 24-Aug-2005  riz Pull up following revision(s) (requested by yamt in ticket #688):
sys/miscfs/genfs/genfs_vnops.c: revision 1.98 via patch
sys/ufs/ffs/ffs_vfsops.c: revision 1.165
sys/ufs/lfs/lfs_extern.h: revision 1.69
sys/fs/filecorefs/filecore_vfsops.c: revision 1.20
sys/nfs/nfs_node.c: revision 1.80
sys/fs/smbfs/smbfs_node.c: revision 1.24
sys/fs/cd9660/cd9660_vfsops.c: revision 1.24
sys/fs/msdosfs/msdosfs_denode.c: revision 1.8
sys/miscfs/genfs/genfs_node.h: revision 1.6
sys/ufs/lfs/lfs_vfsops.c: revision 1.183
sys/ufs/ext2fs/ext2fs_vfsops.c: revision 1.86
sys/fs/adosfs/advfsops.c: revision 1.23
sys/fs/ntfs/ntfs_vfsops.c: revision 1.31
- constify genfs_ops.
- use member designators.

sys/miscfs/genfs/genfs_vnops.c: revision 1.99 via patch
genfs_getpages: don't forget to put the vnode onto the syncer's work queue
even in the case of PGO_LOCKED.

sys/uvm/uvm_bio.c: revision 1.40
sys/uvm/uvm_pager.h: revision 1.29
sys/miscfs/genfs/genfs_vnops.c: revision 1.100 via patch
sys/ufs/ufs/ufs_inode.c: revision 1.50
- introduce PGO_NOBLOCKALLOC and use it for ubc mapping
to prevent unnecessary block allocations in the case that
page size > block size.
- ufs_balloc_range: use VM_PROT_WRITE+PGO_NOBLOCKALLOC rather than
VM_PROT_READ.

sys/uvm/uvm_fault.c: revision 1.96
sys/miscfs/genfs/genfs_vnops.c: revision 1.101 via patch
sys/uvm/uvm_object.h: revision 1.19
sys/miscfs/genfs/genfs_node.h: revision 1.7
ensure that vnodes with dirty pages are always on syncer's queue.
- genfs_putpages: wait for i/o completion of PG_RELEASED/PG_PAGEOUT pages by
setting "wasclean" false when encountering them.
suggested by Stephan Uphoff in PR/24596 (1).
- genfs_putpages: write protect pages when cleaning out, if
we're going to take the vnode off the syncer's queue.
uvm_fault: don't write-map pages unless its vnode is already on
the syncer's queue.
fix PR/24596 (3) but in the different way from the suggested fix.
(to keep our current behaviour, ie. not to require explicit msync.
discussed on tech-kern@.)
- genfs_putpages: don't mistakenly take a vnode off the queue
by introducing a generation number in genfs_node.
genfs_getpages: increment the generation number.
suggested by Stephan Uphoff in PR/24596 (2).
- add some assertions.

sys/miscfs/genfs/genfs_vnops.c: revision 1.102 via patch
genfs_putpages: don't bother to clean the vnode unless VONWORKLST.

sys/ufs/ffs/ffs_vnops.c: revision 1.71
ffs_full_fsync: because VBLK/VCHR can be mmap'ed,
do VOP_PUTPAGES for them as well.

sys/uvm/uvm_fault.c: revision 1.97
uvm_fault: check a correct object in the case of layered filesystems.
fix PR/30811 from Jukka Salmi.

sys/uvm/uvm_object.h: revision 1.20
sys/ufs/ffs/ffs_vfsops.c: revision 1.167
sys/uvm/uvm_bio.c: revision 1.41
sys/ufs/ufs/ufs_vnops.c: revision 1.129
sys/uvm/uvm_mmap.c: revision 1.92
sys/uvm/uvm_fault.c: revision 1.98
sys/kern/vfs_subr.c: revision 1.252
sys/fs/msdosfs/denode.h: revision 1.5
sys/miscfs/genfs/genfs_vnops.c: revision 1.103 via patch
sys/fs/msdosfs/msdosfs_denode.c: revision 1.9
sys/sys/vnode.h: revision 1.141
sys/ufs/ufs/ufs_inode.c: revision 1.51
sys/ufs/ufs/ufs_extern.h: revision 1.45 via patch
sys/miscfs/genfs/genfs_node.h: revision 1.8
sys/ufs/lfs/lfs_vfsops.c: revision 1.184
sys/uvm/uvm_pager.h: revision 1.30
sys/ufs/ext2fs/ext2fs_vfsops.c: revision 1.87
update file timestamps for nfsd loaned-read and mmap.
PR/25279. discussed on tech-kern@.

sys/miscfs/genfs/genfs_vnops.c: revision 1.104 via patch
don't write-protect wired pages. pointed by Chuck Silvers.
for now, leave a vnode on the syncer's queue, as suggested by him.

sys/ufs/ffs/ffs_vnops.c: revision 1.72
revert VCHR part of ffs_vnops.c 1.71.
as VCHR uses the device pager, no point to call VOP_PUTPAGES here.
pointed by Chuck Silvers.
 1.32.6.1 22-Apr-2006  simonb Sync with head.
 1.32.4.1 09-Sep-2006  rpaulo sync with head
 1.32.2.2 01-Mar-2006  yamt sync with head.
 1.32.2.1 18-Feb-2006  yamt sync with head.
 1.34.26.1 11-Jul-2007  mjf Sync with head.
 1.34.24.1 08-Jun-2007  ad Sync with head.
 1.34.20.1 07-May-2007  yamt sync with head.
 1.34.18.1 16-Apr-2007  bouyer Pull up following revision(s) (requested by chs in ticket #577):
sys/kern/vfs_subr.c: revision 1.287
sys/fs/union/union_vnops.c: revision 1.20
sys/miscfs/genfs/layer_vnops.c: revision 1.30
sys/uvm/uvm_pager.h: revision 1.35
define a pager flag PGO_RECLAIM, similar to FSYNC_RECLAIM, and use it
to skip unnecessary flushing when layered file system vnodes are recycled.
this also prevents a deadlock with the dodgy LFS putpages routine.
fixes the non-LFS part of PR 36150.
 1.36.12.1 13-Nov-2007  bouyer Sync with HEAD
 1.36.8.1 06-Nov-2007  matt sync with HEAD
 1.36.6.1 28-Oct-2007  joerg Sync with HEAD.
 1.37.26.1 19-Oct-2008  haad Sync with HEAD.
 1.37.22.1 18-Sep-2008  wrstuden Sync with wrstuden-revivesa-base-2.
 1.37.20.2 09-Oct-2010  yamt sync with head
 1.37.20.1 04-May-2009  yamt sync with head.
 1.37.16.1 28-Sep-2008  mjf Sync with HEAD.
 1.38.16.1 05-Mar-2011  rmind sync with head
 1.38.14.3 21-Nov-2010  uebayasi Rename PGO_ZERO as PGO_HOLE, and s/uvm_page_zeropage/uvm_page_holepage/.
 1.38.14.2 21-Nov-2010  uebayasi Resurrect PGO_ZERO support.

When vnode pager encounters hole pages in XIP'ed vnodes, it fills
page slots with PGO_ZERO and returns them back to the caller (fault
handler). Fault handlers are responsible to check page slots and
redirect PGO_ZERO to the single "zero page" allocated by calling
uvm_page_zeropage_alloc(9).

The zero page is wired, read-only (PG_RDONLY) page. It's shared
by multiple vnodes, it has no single owner.

XIP'ed vnodes are supposed to be "stable" during I/O (unlocked).
Because XIP'ed mounts are always read-only. There's no chance to
change mappings of XIP'ed vnodes and their XIP'ed pages. Thus the
cached uobj is reused after pgo_get() for PGO_ZERO.

(Do we need a new concept of "read-only UVM object"?)
 1.38.14.1 22-Oct-2010  uebayasi Sync with HEAD (-D20101022).
 1.38.12.1 20-May-2011  matt bring matt-nb5-mips64 up to date with netbsd-5-1-RELEASE (except compat).
 1.38.10.1 07-Sep-2010  bouyer Pull up following revision(s) (requested by chs in ticket #1448):
sys/uvm/uvm_pager.h: revision 1.39 via patch
sys/miscfs/genfs/genfs_vnops.c: revision 1.183 via patch
sys/ufs/ufs/ufs_inode.c: revision 1.83 via patch
sys/miscfs/genfs/genfs_io.c: revision 1.40 via patch
sys/miscfs/genfs/genfs_node.h: revision 1.20 via patch
replace the earlier workaround for PR 40389 with a better fix.
the earlier change caused data corruption by freeing pages
without invalidating their mappings. instead of the trylock/retry,
just take the genfs-node lock before calling VOP_GETPAGES()
and pass a new flag to tell it that we're already holding this lock.
 1.38.4.1 07-Sep-2010  bouyer Pull up following revision(s) (requested by chs in ticket #1448):
sys/uvm/uvm_pager.h: revision 1.39 via patch
sys/miscfs/genfs/genfs_vnops.c: revision 1.183 via patch
sys/ufs/ufs/ufs_inode.c: revision 1.83 via patch
sys/miscfs/genfs/genfs_io.c: revision 1.40 via patch
sys/miscfs/genfs/genfs_node.h: revision 1.20 via patch
replace the earlier workaround for PR 40389 with a better fix.
the earlier change caused data corruption by freeing pages
without invalidating their mappings. instead of the trylock/retry,
just take the genfs-node lock before calling VOP_GETPAGES()
and pass a new flag to tell it that we're already holding this lock.
 1.39.4.1 08-Feb-2011  bouyer Sync with HEAD
 1.39.2.1 06-Jun-2011  jruoho Sync with HEAD.
 1.42.8.1 07-May-2012  riz Pull up following revision(s) (requested by chs in ticket #204):
sys/fs/sysvbfs/sysvbfs_vnops.c: revision 1.44
sys/ufs/ffs/ffs_vfsops.c: revision 1.277
sys/fs/v7fs/v7fs_vnops.c: revision 1.11
sys/ufs/chfs/chfs_vnops.c: revision 1.7
sys/ufs/ext2fs/ext2fs_readwrite.c: revision 1.61
sys/miscfs/genfs/genfs_io.c: revision 1.54
sys/kern/vfs_wapbl.c: revision 1.52
sys/uvm/uvm_pager.h: revision 1.43
sys/ufs/ffs/ffs_vnops.c: revision 1.121
sys/kern/vfs_subr.c: revision 1.434
sys/fs/msdosfs/msdosfs_vnops.c: revision 1.83
sys/fs/ntfs/ntfs_vnops.c: revision 1.51
sys/fs/udf/udf_subr.c: revision 1.119
sys/miscfs/specfs/spec_vnops.c: revision 1.135
sys/ufs/ext2fs/ext2fs_vnops.c: revision 1.103
sys/fs/udf/udf_vnops.c: revision 1.71
sys/ufs/ufs/ufs_readwrite.c: revision 1.104
change vflushbuf() to take the full FSYNC_* flags.
translate FSYNC_LAZY into PGO_LAZY for VOP_PUTPAGES() so that
genfs_do_io() can set the appropriate io priority for the I/O.
this is the first part of addressing PR 46325.
mark all wapbl I/O as BPRIO_TIMECRITICAL.
this is the second part of addressing PR 46325.
 1.42.6.1 02-Jun-2012  mrg sync to latest -current.
 1.42.2.1 23-May-2012  yamt sync with head.
 1.43.20.1 20-Mar-2017  pgoyette Sync with HEAD
 1.43.16.1 05-Feb-2017  skrll Sync with HEAD
 1.43.2.1 03-Dec-2017  jdolecek update from HEAD
 1.44.16.2 08-Apr-2020  martin Merge changes from current as of 20200406
 1.44.16.1 10-Jun-2019  christos Sync with HEAD
 1.44.14.1 26-Dec-2018  pgoyette Sync with HEAD, resolve a few conflicts
 1.16 11-Feb-2006  yamt remove the following options. no objections on tech-kern@.

UVM_PAGER_INLINE
UVM_AMAP_INLINE
UVM_PAGE_INLINE
UVM_MAP_INLINE
 1.15 11-Dec-2005  christos branches: 1.15.2; 1.15.4; 1.15.6;
merge ktrace-lwp.
 1.14 28-Jun-2005  thorpej branches: 1.14.2;
Clean up the cpp macro used to say "we're compiling this specific C file".
 1.13 27-Jun-2005  thorpej Use ANSI function decls.
 1.12 01-Dec-2002  matt branches: 1.12.6;
Reorder things so that, with multiple inclusion protection, optional
definitions are outside the protection checks.
 1.11 25-May-2001  chs remove trailing whitespace.
 1.10 25-Nov-2000  chs branches: 1.10.2;
lots of cleanup:
use queue.h macros and KASSERT().
address amap offsets in pages instead of bytes.
make amap_ref() and amap_unref() take an amap, offset and length
instead of a vm_map_entry_t.
improve whitespace and comments.
 1.9 08-May-2000  thorpej __predict_false() an error check.
 1.8 08-Jul-1999  thorpej branches: 1.8.2;
Change the pmap_extract() interface to:
boolean_t pmap_extract(pmap_t, vaddr_t, paddr_t *);
This makes it possible for the pmap to map physical address 0.
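
A minimal sketch of why the boolean-plus-out-parameter form matters. This is not the kernel's pmap code: the toy_* types, the array-backed "pmap" and both function names are hypothetical, and the old-style variant only assumes that a return value of 0 was used to mean "no mapping".

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define TOY_NPAGES 4

typedef uint64_t toy_paddr_t;	/* stand-in for paddr_t */
typedef size_t toy_vaddr_t;	/* stand-in for vaddr_t; used as a page index here */

/* Hypothetical array-backed pmap: one slot per virtual page. */
struct toy_pmap {
	bool valid[TOY_NPAGES];
	toy_paddr_t pa[TOY_NPAGES];
};

/*
 * Assumed old style: the physical address is the return value and 0
 * doubles as "no mapping", so a page mapped at PA 0 looks unmapped.
 */
toy_paddr_t
toy_extract_old(const struct toy_pmap *pm, toy_vaddr_t vpn)
{
	if (vpn >= TOY_NPAGES || !pm->valid[vpn])
		return 0;
	return pm->pa[vpn];
}

/*
 * New style: success is reported separately from the address, so
 * PA 0 is an ordinary result.
 */
bool
toy_extract_new(const struct toy_pmap *pm, toy_vaddr_t vpn, toy_paddr_t *pap)
{
	if (vpn >= TOY_NPAGES || !pm->valid[vpn])
		return false;
	if (pap != NULL)
		*pap = pm->pa[vpn];
	return true;
}
```

With page 0 mapped at physical address 0 and page 1 unmapped, the old-style call returns 0 for both, while the new-style call distinguishes them.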
 1.7 25-Mar-1999  mrg branches: 1.7.4;
remove now >1 year old pre-release message.
 1.6 13-Aug-1998  eeh Merge paddr_t changes into the main branch.
 1.5 09-Mar-1998  mrg branches: 1.5.2;
KNF.
 1.4 10-Feb-1998  perry add/cleanup multiple inclusion protection.
 1.3 07-Feb-1998  mrg restore rcsids
 1.2 06-Feb-1998  thorpej RCS ID police.
 1.1 05-Feb-1998  mrg branches: 1.1.1;
Initial revision
 1.1.1.1 05-Feb-1998  mrg initial import of the new virtual memory system, UVM, into -current.

UVM was written by chuck cranor <chuck@maria.wustl.edu>, with some
minor portions derived from the old Mach code. i provided some help
getting swap and paging working, and other bug fixes/ideas. chuck
silvers <chuq@chuq.com> also provided some other fixes.

this is the UVM kernel code portion.


this will be KNF'd shortly. :-)
 1.5.2.1 30-Jul-1998  eeh Split vm_offset_t and vm_size_t into paddr_t, psize_t, vaddr_t, and vsize_t.
 1.7.4.1 02-Aug-1999  thorpej Update from trunk.
 1.8.2.2 08-Dec-2000  bouyer Sync with HEAD.
 1.8.2.1 20-Nov-2000  bouyer Update thorpej_scsipi to -current as of a month ago
 1.10.2.2 11-Dec-2002  thorpej Sync with HEAD.
 1.10.2.1 21-Jun-2001  nathanw Catch up to -current.
 1.12.6.1 10-Nov-2005  skrll Sync with HEAD. Here we go again...
 1.14.2.1 21-Jun-2006  yamt sync with head.
 1.15.6.1 22-Apr-2006  simonb Sync with head.
 1.15.4.1 09-Sep-2006  rpaulo sync with head
 1.15.2.1 18-Feb-2006  yamt sync with head.
 1.42 11-Jul-2023  riastradh sys: Rip <sys/resourcevar.h> out of <uvm/uvm_param.h>.

And thus out of <sys/param.h>, which is exceedingly overused and
fragile and delenda est.

Should fix (some) issues with the recent inclusion of machine/lock.h
in various machine/mutex.h files.
 1.41 23-Jul-2020  skrll branches: 1.41.20;
unifdef -U_LKM
 1.40 25-Jun-2020  jdolecek uvm_emap_size was removed a while ago
 1.39 25-Jun-2020  mlelstv If ubc_winshift gets constified, the extern declaration must be too.
 1.38 22-Aug-2018  msaitoh - Cleanup for dynamic sysctl:
- Remove unused *_NAMES macros for sysctl.
- Remove unused *_MAXID for sysctls.
- Move CTL_MACHDEP sysctl definitions for m68k into m68k/include/cpu.h and
use them on all m68k machines.
 1.37 02-Jul-2017  joerg branches: 1.37.4; 1.37.6;
Export the guard size of the main thread via vm.guard_size. Add a
complementary writable sysctl for the initial guard size of threads
created via pthread_create. Let the existing attribute accessors do the
right thing. Raise the default guard size for threads to 64KB.
 1.36 23-Jun-2017  joerg Recommit exec_subr.c revision 1.79:
Always include a 1MB guard area beyond the end of stack. While ASLR will
normally create a guard area as well, this provides a deterministic area
for all binaries.

Mitigates the rest of CVE-2017-1000374 and CVE-2017-1000375 from
Qualys.

Additionally, change VM_DEFAULT_ADDRESS_TOPDOWN to include
user_stack_guard_size in the size reservation.
 1.35 26-Sep-2015  christos branches: 1.35.10;
move CTL_VM constants to uvm_param.h, leaving a comment behind.
 1.34 26-Feb-2014  matt branches: 1.34.6;
Add vm.min_address and vm.max_address which return VM_MIN_ADDRESS and
VM_MAXUSER_ADDRESS.
 1.33 25-Jan-2014  christos delete VM_DEFAULT_ADDRESS and commentary which is no longer used/true.
 1.32 25-Jan-2014  christos provide proper defaults for topdown and bottomup allocation.
XXX: Ports that provide their own VM_DEFAULT_ADDRESS() need to provide the
two new flavors, otherwise they get the default ones now.
 1.31 19-Mar-2012  uebayasi branches: 1.31.2; 1.31.4;
Expose vm_inherit/voff_t/pgoff_t to userland to fix build.
 1.30 18-Mar-2012  uebayasi Move base type definitions from uvm_extern.h to uvm_param.h so that
other sources can easily include part of UVM headers without the whole
uvm_extern.h (e.g. sys/vnode.h wants only uvm_object.h).
 1.29 27-Feb-2012  he __uvmexp_pagesize is needed also for non-modular builds, as
witnessed by the otherwise failing sparc build.
 1.28 23-Feb-2012  matt Add "opt_modular.h"
#define __uvmexp_pagesize
if MIN_PAGE_SIZE != MAX_PAGE_SIZE && modular is defined
 1.27 17-Feb-2012  matt Make sure to export uvmexp_* if MODULAR is defined.
Make each uvmexp_page* a pointer to a const int, with the pointer itself
const as well.
 1.26 29-Nov-2011  matt branches: 1.26.2;
Redefine ptoa() to be the inverse of atop. If you were using a 32-bit vaddr_t
with 64-bit paddr_t and using managed addresses > 4GB, uvm_page_init would
silently discard the upper 32-bits of the physical address possibly double
mapping pages.
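
The truncation described above can be sketched as follows. These are not the actual macro definitions from uvm_param.h: the toy_* names, the 4 KB page shift and the exact form of the narrow variant are assumptions made for illustration.

```c
#include <stdint.h>

#define TOY_PAGE_SHIFT 12	/* assumed 4 KB pages */

typedef uint32_t toy_vaddr_t;	/* 32-bit virtual address */
typedef uint64_t toy_paddr_t;	/* 64-bit physical address */

/* Address to page frame number. */
uint64_t
toy_atop(toy_paddr_t pa)
{
	return pa >> TOY_PAGE_SHIFT;
}

/*
 * Narrow variant (assumed shape of the problem): the shift happens in
 * the 32-bit type, so the upper 32 bits of the result are discarded.
 */
toy_paddr_t
toy_ptoa_narrow(uint64_t pfn)
{
	return (toy_vaddr_t)((toy_vaddr_t)pfn << TOY_PAGE_SHIFT);
}

/* Wide variant: shift in the 64-bit type, the true inverse of toy_atop(). */
toy_paddr_t
toy_ptoa_wide(uint64_t pfn)
{
	return (toy_paddr_t)pfn << TOY_PAGE_SHIFT;
}
```

For the page frame at exactly 4 GB (pfn 0x100000), the narrow variant yields 0, which aliases physical page 0, while the wide variant round-trips through toy_atop().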
 1.25 14-Nov-2010  uebayasi branches: 1.25.8;
Oops. Fix thinko.
 1.24 14-Nov-2010  uebayasi On platforms that dynamically set PAGE_{SIZE,MASK,SHIFT}, those values are
saved in struct uvmexp. Expose only the relevant part for symbol users,
so that they don't need to include the whole uvm(9) API.
 1.23 13-Nov-2010  uebayasi UVM constants should not rely on sys/lock.h.
 1.22 20-Jul-2009  kiyohara branches: 1.22.2; 1.22.4;
Globalize uvm_emap_size. It is used to calculate the size of the kernel page table.
http://mail-index.netbsd.org/current-users/2009/07/13/msg009983.html
 1.21 04-Aug-2006  he branches: 1.21.58; 1.21.74; 1.21.78;
Rearrange included headers and/or add include of <sys/types.h> and
<sys/lock.h>, so that the mipsco port can build again, ref.
http://mail-index.netbsd.org/port-mips/2006/08/04/0000.html
Reviewed by thorpej
 1.20 11-Dec-2005  christos branches: 1.20.4; 1.20.8;
merge ktrace-lwp.
 1.19 04-Apr-2004  pk branches: 1.19.12;
Use maxdmap and maxsmap instead of MAXDSIZ and MAXSSIZ.
 1.18 07-Aug-2003  agc Move UCB-licensed code from 4-clause to 3-clause licence.

Patches provided by Joel Baker in PR 22364, verified by myself.
 1.17 19-Apr-2003  christos branches: 1.17.2;
PR/2931: Eric Beltensen: Move boolean_t and TRUE/FALSE from uvm_param.h to
types.h
 1.16 09-Apr-2003  thorpej Tweak the way the pagesize-related variables are set:
* Remove DEFAULT_PAGE_SIZE. We don't use PAGE_SIZE the way Mach did.
* In uvm_setpagesize(), if we are called with uvmexp.pagesize == 0,
then assert that PAGE_SIZE != 0 (i.e. a constant), and set uvmexp.pagesize
accordingly.
* Provide defaults for MIN_PAGE_SIZE and MAX_PAGE_SIZE if not defined
by <machine/vmparam.h>. If PAGE_SIZE is not a constant, MIN_PAGE_SIZE
and MAX_PAGE_SIZE must be provided.
* If MIN_PAGE_SIZE and MAX_PAGE_SIZE are not equal (i.e. PAGE_SIZE may
not be a constant in all configurations), then ensure that PAGE_SIZE
and friends expand to variable references for LKMs.
 1.15 14-Mar-2003  matt Nuke mem_size global since nothing in the kernel actually refers to it.
(mmm lint).
 1.14 20-Feb-2003  atatat Introduce "top down" memory management for mmap()ed allocations. This
means that the dynamic linker gets mapped in at the top of available
user virtual memory (typically just below the stack), shared libraries
get mapped downwards from that point, and calls to mmap() that don't
specify a preferred address will get mapped in below those.

This means that the heap and the mmap()ed allocations will grow
towards each other, allowing one or the other to grow larger than
before. Previously, the heap was limited to MAXDSIZ by the placement
of the dynamic linker (and the process's rlimits) and the space
available to mmap was hobbled by this reservation.

This is currently only enabled via an *option* for the i386 platform
(though other platforms are expected to follow). Add "options
USE_TOPDOWN_VM" to your kernel config file, rerun config, and rebuild
your kernel to take advantage of this.

Note that the pmap_prefer() interface has not yet been modified to
play nicely with this, so those platforms require a bit more work
(most notably the sparc) before they can use this new memory
arrangement.

This change also introduces a VM_DEFAULT_ADDRESS() macro that picks
the appropriate default address based on the size of the allocation or
the size of the process's text segment accordingly. Several drivers
and the SYSV SHM address assignment were changed to use this instead
of each one picking their own "default".
 1.13 09-Dec-2001  chs add {anon,file,exec}max as an upper bound on the amount of memory that
will be allocated for the respective usage types when there is contention
for memory.

replace "vnode" and "vtext" with "file" and "exec" in uvmexp field names
and sysctl names.
 1.12 05-Aug-2001  matt Don't include <machine/pmap.h> and <machine/vmparam.h> if _KERNEL isn't
defined. Include them explicitly in the few kvm_arch.c that need them.
 1.11 14-Jul-2001  matt Add support for kern.maxphys, vm.maxslp, vm.uspace (the latter two for ps).
 1.10 25-May-2001  chs branches: 1.10.2;
remove trailing whitespace.
 1.9 02-May-2001  thorpej Support dynamic sizing of the page color bins. We also support
dynamically re-coloring pages; as machine-dependent code discovers
the size of the system's caches, it may call uvm_page_recolor() with
the new number of colors to use. If the new number of colors is
smaller than (or equal to) the current number of colors, then uvm_page_recolor()
is a no-op.

The system defaults to one bucket if machine-dependent code does not
initialize uvmexp.ncolors before uvm_page_init() is called.

Note that the number of color bins should be initialized to something
reasonable as early as possible -- for many early memory allocations,
we live with the consequences of the page choice for the lifetime of
the boot.
 1.8 29-Apr-2001  thorpej Implement page coloring, using a round-robin bucket selection
algorithm (Solaris calls this "Bin Hopping").

This implementation currently relies on MD code to define a
constant defining the number of buckets. This will change
reasonably soon (MD code will be able to dynamically size
the bucket array).
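
A minimal sketch of round-robin bucket selection, assuming a fixed number of color buckets and a simple per-bucket free count. The toy_* names and the data layout are hypothetical; the real allocator keeps page queues rather than counters and may choose the starting color differently.

```c
#define TOY_NCOLORS 4	/* assumed constant, as MD code would define it */

struct toy_freelist {
	unsigned nfree[TOY_NCOLORS];	/* free pages per color bucket */
	unsigned next;			/* color to try first on the next allocation */
};

/*
 * Take one page, starting at the "next" color and hopping to the
 * following bucket when one is empty.  Returns the color used, or -1
 * if every bucket is empty.
 */
int
toy_alloc_color(struct toy_freelist *fl)
{
	for (unsigned i = 0; i < TOY_NCOLORS; i++) {
		unsigned c = (fl->next + i) % TOY_NCOLORS;

		if (fl->nfree[c] > 0) {
			fl->nfree[c]--;
			fl->next = (c + 1) % TOY_NCOLORS;
			return (int)c;
		}
	}
	return -1;
}
```

Successive allocations therefore cycle through the colors, skipping empty buckets, which spreads consecutive pages across the cache.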
 1.7 21-Mar-2001  chs use ubc_winshift instead of ubc_winsize in pmaps to set up kernel
virtual space. the latter isn't initialized yet when the value is needed.
fixes PR 12440.
 1.6 15-Mar-2001  chs eliminate the KERN_* error codes in favor of the traditional E* codes.
the mapping is:

KERN_SUCCESS 0
KERN_INVALID_ADDRESS EFAULT
KERN_PROTECTION_FAILURE EACCES
KERN_NO_SPACE ENOMEM
KERN_INVALID_ARGUMENT EINVAL
KERN_FAILURE various, mostly turn into KASSERTs
KERN_RESOURCE_SHORTAGE ENOMEM
KERN_NOT_RECEIVER <unused>
KERN_NO_ACCESS <unused>
KERN_PAGES_LOCKED <unused>
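
The mapping above can be sketched as a lookup function. The enum and function names are hypothetical and the enum values are illustrative rather than the historic numeric codes; KERN_FAILURE and the unused codes are left out because the table gives them no single errno.

```c
#include <errno.h>

/* Illustrative stand-ins for the retired KERN_* codes. */
enum toy_kern_code {
	TOY_KERN_SUCCESS,
	TOY_KERN_INVALID_ADDRESS,
	TOY_KERN_PROTECTION_FAILURE,
	TOY_KERN_NO_SPACE,
	TOY_KERN_INVALID_ARGUMENT,
	TOY_KERN_RESOURCE_SHORTAGE
};

/* Translate an old-style code to the errno value listed in the table. */
int
toy_kern_to_errno(enum toy_kern_code code)
{
	switch (code) {
	case TOY_KERN_SUCCESS:
		return 0;
	case TOY_KERN_INVALID_ADDRESS:
		return EFAULT;
	case TOY_KERN_PROTECTION_FAILURE:
		return EACCES;
	case TOY_KERN_NO_SPACE:
	case TOY_KERN_RESOURCE_SHORTAGE:
		return ENOMEM;
	case TOY_KERN_INVALID_ARGUMENT:
		return EINVAL;
	}
	/* Fallback for out-of-range input; this choice is an assumption. */
	return EINVAL;
}
```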
 1.5 09-Mar-2001  chs add UBC memory-usage balancing. we track the number of pages in use for
each of the basic types (anonymous data, executable image, cached files)
and prevent the pagedaemon from reusing a given page if that would reduce
the count of that type of page below a sysctl-settable minimum threshold.
the thresholds are controlled via three new sysctl tunables:
vm.anonmin, vm.vnodemin, and vm.vtextmin. these tunables are the
percentages of pageable memory reserved for each usage, and we do not allow
the sum of the minimums to be more than 95% so that there's always some
memory that can be reused.
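
A rough sketch of the two rules described above: the three minimums may not sum to more than 95%, and a page of a given type is not reused if that would push its type below the minimum. The struct, the function names and the exact arithmetic are assumptions; only the tunable names and the 95% cap come from the commit message.

```c
#include <stdbool.h>

/* Percentages of pageable memory reserved per usage type. */
struct toy_mins {
	int anonmin;	/* vm.anonmin */
	int vnodemin;	/* vm.vnodemin */
	int vtextmin;	/* vm.vtextmin */
};

/* Accept a new anonmin only if the sum of all minimums stays <= 95. */
bool
toy_set_anonmin(struct toy_mins *m, int pct)
{
	if (pct < 0 || pct + m->vnodemin + m->vtextmin > 95)
		return false;
	m->anonmin = pct;
	return true;
}

/*
 * May one page of this type be reused?  Only if the type still holds
 * at least min_pct percent of pageable memory afterwards.
 */
bool
toy_can_reuse(unsigned long type_pages, unsigned long pageable, int min_pct)
{
	if (type_pages == 0)
		return false;
	return (type_pages - 1) * 100 >= (unsigned long)min_pct * pageable;
}
```

With 1000 pageable pages and a 10% minimum, the type can shrink from 101 pages to 100 but not from 100 to 99.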
 1.4 09-Jan-2001  pk branches: 1.4.2;
atop(): cast argument to `paddr_t' (instead of `u_long') to avoid
truncating the address.
 1.3 21-Dec-2000  chs expose the tunables ubc_nwins and ubc_winsize in uvm_param.h.
add the space used by UBC mappings to the initial PTE calculations
for pmaps that do that (mips and alpha).
 1.2 29-Nov-2000  simonb Add a vm.uvmexp2 sysctl that uses an ABI-safe 'struct uvmexp_sysctl'.
 1.1 26-Jun-2000  mrg branches: 1.1.2;
<vm/vm_param.h> -> <uvm/uvm_param.h>
 1.1.2.7 27-Mar-2001  bouyer Sync with HEAD.
 1.1.2.6 12-Mar-2001  bouyer Sync with HEAD.
 1.1.2.5 18-Jan-2001  bouyer Sync with head (for UBC+NFS fixes, mostly).
 1.1.2.4 05-Jan-2001  bouyer Sync with HEAD
 1.1.2.3 08-Dec-2000  bouyer Sync with HEAD.
 1.1.2.2 20-Nov-2000  bouyer Update thorpej_scsipi to -current as of a month ago
 1.1.2.1 26-Jun-2000  bouyer file uvm_param.h was added on branch thorpej_scsipi on 2000-11-20 18:12:05 +0000
 1.4.2.4 08-Jan-2002  nathanw Catch up to -current.
 1.4.2.3 24-Aug-2001  nathanw Catch up with -current.
 1.4.2.2 21-Jun-2001  nathanw Catch up to -current.
 1.4.2.1 09-Apr-2001  nathanw Catch up with -current.
 1.10.2.3 10-Jan-2002  thorpej Sync kqueue branch with -current.
 1.10.2.2 25-Aug-2001  thorpej Merge Aug 24 -current into the kqueue branch.
 1.10.2.1 03-Aug-2001  lukem update to -current
 1.17.2.3 21-Sep-2004  skrll Fix the sync with head I botched.
 1.17.2.2 18-Sep-2004  skrll Sync with HEAD.
 1.17.2.1 03-Aug-2004  skrll Sync with HEAD
 1.19.12.1 30-Dec-2006  yamt sync with head.
 1.20.8.1 11-Aug-2006  yamt sync with head
 1.20.4.1 09-Sep-2006  rpaulo sync with head
 1.21.78.1 29-Nov-2011  matt Redefine ptoa() to be the inverse of atop. If you were using a 32-bit vaddr_t
with 64-bit paddr_t and using managed addresses > 4GB, uvm_page_init would
silently discard the upper 32-bits of the physical address possibly double
mapping pages.
 1.21.74.1 23-Jul-2009  jym Sync with HEAD.
 1.21.58.1 19-Aug-2009  yamt sync with head.
 1.22.4.1 05-Mar-2011  rmind sync with head
 1.22.2.1 16-Nov-2010  uebayasi Sync with HEAD.
 1.25.8.2 22-May-2014  yamt sync with head.

for a reference, the tree before this commit was tagged
as yamt-pagecache-tag8.

this commit was split into small chunks to avoid
a limitation of cvs. ("Protocol error: too many arguments")
 1.25.8.1 17-Apr-2012  yamt sync with head
 1.26.2.4 05-Apr-2012  mrg sync to latest -current.
 1.26.2.3 04-Mar-2012  mrg sync to latest -current.
 1.26.2.2 24-Feb-2012  mrg sync to -current.
 1.26.2.1 18-Feb-2012  mrg merge to -current.
 1.31.4.1 18-May-2014  rmind sync with head
 1.31.2.2 03-Dec-2017  jdolecek update from HEAD
 1.31.2.1 20-Aug-2014  tls Rebase to HEAD as of a few days ago.
 1.34.6.2 28-Aug-2017  skrll Sync with HEAD
 1.34.6.1 27-Dec-2015  skrll Sync with HEAD (as of 26th Dec)
 1.35.10.1 31-Aug-2017  bouyer Pull up following revision(s) (requested by joerg in ticket #234):
sys/arch/amd64/include/vmparam.h: revision 1.43
sys/kern/exec_subr.c: revision 1.79
lib/libpthread/pthread_int.h: revision 1.94
sys/arch/mips/include/vmparam.h: revision 1.58
sys/arch/mips/include/vmparam.h: revision 1.59
lib/libpthread/TODO: revision 1.19
sys/arch/powerpc/include/vmparam.h: revision 1.20
sys/arch/riscv/include/vmparam.h: revision 1.2
sys/arch/riscv/include/vmparam.h: revision 1.3
sys/arch/i386/include/vmparam.h: revision 1.85
tests/lib/libpthread/t_join.c: revision 1.9
sys/uvm/uvm_meter.c: revision 1.66
sys/uvm/uvm_param.h: revision 1.36
sys/kern/exec_subr.c: revision 1.80
sys/uvm/uvm_param.h: revision 1.37
sys/kern/exec_subr.c: revision 1.81
sys/kern/exec_subr.c: revision 1.82
lib/libpthread/pthread_attr_getguardsize.3: revision 1.4
lib/libpthread/pthread.c: revision 1.148
lib/libpthread/pthread_attr.c: revision 1.17
sys/arch/amd64/include/vmparam.h: revision 1.42
Always include a 1MB guard area beyond the end of stack. While ASLR will
normally create a guard area as well, this provides a deterministic area
for all binaries.
Mitigates the rest of CVE-2017-1000374 and CVE-2017-1000375 from
Qualys.
Revert for the moment, creates problems on i386.
Recommit exec_subr.c revision 1.79:
Always include a 1MB guard area beyond the end of stack. While ASLR will
normally create a guard area as well, this provides a deterministic area
for all binaries.
Mitigates the rest of CVE-2017-1000374 and CVE-2017-1000375 from
Qualys.
Additionally, change VM_DEFAULT_ADDRESS_TOPDOWN to include
user_stack_guard_size in the size reservation.
Update VM_DEFAULT_ADDRESS32_TOPDOWN to include guard area.
Export the guard size of the main thread via vm.guard_size. Add a
complementary writable sysctl for the initial guard size of threads
created via pthread_create. Let the existing attribute accessors do the
right thing. Raise the default guard size for threads to 64KB.
 1.37.6.1 10-Jun-2019  christos Sync with HEAD
 1.37.4.1 06-Sep-2018  pgoyette Sync with HEAD

Resolve a couple of conflicts (result of the uimin/uimax changes)
 1.41.20.1 09-Aug-2023  martin Pull up following revision(s) (requested by maya in ticket #316):

sys/arch/m68k/include/mutex.h: revision 1.13
sys/arch/arm/include/cpu.h: revision 1.125
sys/arch/sun68k/include/intr.h: revision 1.21
sys/arch/arm/include/mutex.h: revision 1.28
sys/sys/rwlock.h: revision 1.18
sys/arch/powerpc/include/mutex.h: revision 1.7
sys/arch/arm/include/mutex.h: revision 1.29
sys/arch/powerpc/include/mutex.h: revision 1.8
sys/uvm/uvm_param.h: revision 1.42
sys/sys/ksem.h: revision 1.16
sys/arch/x86/include/mutex.h: revision 1.10
sys/sys/proc.h: revision 1.372
sys/sys/ksem.h: revision 1.17
sys/arch/ia64/include/mutex.h: revision 1.8
sys/arch/evbarm/include/intr.h: revision 1.29
sys/sys/lua.h: revision 1.9
sys/arch/next68k/include/intr.h: revision 1.23
sys/arch/ia64/include/mutex.h: revision 1.9
sys/arch/hp300/include/intr.h: revision 1.35
sys/arch/hp300/include/intr.h: revision 1.36
sys/arch/sparc/include/cpu.h: revision 1.111
sys/arch/hppa/include/mutex.h: revision 1.16
sys/arch/vax/include/intr.h: revision 1.31
sys/arch/hppa/include/mutex.h: revision 1.17
sys/arch/news68k/include/intr.h: revision 1.28
sys/arch/hppa/include/mutex.h: revision 1.18
sys/arch/hppa/include/intr.h: revision 1.3
sys/arch/hppa/include/mutex.h: revision 1.19
sys/arch/hppa/include/intr.h: revision 1.4
sys/sys/sched.h: revision 1.92
sys/opencrypto/cryptodev.h: revision 1.51
sys/arch/vax/include/mutex.h: revision 1.20
sys/arch/sparc64/include/mutex.h: revision 1.10
sys/arch/ia64/include/sapicvar.h: revision 1.2
sys/arch/riscv/include/mutex.h: revision 1.5
sys/arch/amiga/dev/grfabs_cc.c: revision 1.39
sys/external/bsd/drm2/include/linux/idr.h: revision 1.11
sys/arch/riscv/include/mutex.h: revision 1.6
sys/ddb/files.ddb: revision 1.16
sys/arch/mac68k/include/intr.h: revision 1.32
share/man/man4/ddb.4: revision 1.203
sys/ddb/db_command.c: revision 1.183
sys/arch/mips/include/mutex.h: revision 1.10
sys/ddb/db_command.c: revision 1.184
sys/arch/x68k/include/intr.h: revision 1.22
sys/arch/sparc/include/psl.h: revision 1.51
sys/arch/or1k/include/mutex.h: revision 1.4
sys/arch/mips/include/mutex.h: revision 1.11
sys/arch/arm/xscale/pxa2x0_intr.h: revision 1.16
sys/arch/sparc64/include/cpu.h: revision 1.134
sys/arch/sparc/include/psl.h: revision 1.52
sys/arch/or1k/include/mutex.h: revision 1.5
sys/arch/mvme68k/include/intr.h: revision 1.22
sys/arch/luna68k/include/intr.h: revision 1.16
external/cddl/osnet/sys/sys/kcondvar.h: revision 1.6
sys/arch/sparc/include/mutex.h: revision 1.12
sys/arch/sparc/include/mutex.h: revision 1.13
sys/arch/usermode/include/mutex.h: revision 1.5
sys/arch/usermode/include/mutex.h: revision 1.6
sys/kern/kern_core.c: revision 1.38
usr.sbin/crash/Makefile: revision 1.49
sys/arch/amiga/include/intr.h: revision 1.23
sys/arch/alpha/include/mutex.h: revision 1.12
sys/arch/alpha/include/mutex.h: revision 1.13
sys/arch/evbarm/lubbock/sacc_obio.c: revision 1.16
sys/ddb/ddb.h: revision 1.6
sys/arch/sparc64/include/mutex.h: revision 1.8
sys/arch/sh3/include/mutex.h: revision 1.12
sys/arch/evbarm/lubbock/sacc_obio.c: revision 1.17
sys/ddb/db_syncobj.c: revision 1.1
sys/arch/vax/include/mutex.h: revision 1.18
sys/arch/sparc64/include/psl.h: revision 1.63
sys/arch/sparc64/include/mutex.h: revision 1.9
sys/arch/sh3/include/mutex.h: revision 1.13
sys/arch/evbarm/lubbock/obio.c: revision 1.13
sys/arch/atari/include/intr.h: revision 1.23
sys/ddb/db_syncobj.c: revision 1.2
sys/arch/vax/include/mutex.h: revision 1.19
sys/arch/evbarm/g42xxeb/obio.c: revision 1.14
sys/arch/evbarm/g42xxeb/obio.c: revision 1.15
sys/arch/cesfic/include/intr.h: revision 1.14
sys/ddb/db_syncobj.h: revision 1.1
sys/arch/x86/include/cpu.h: revision 1.134
sys/arch/evbarm/g42xxeb/obio.c: revision 1.16
sys/arch/cesfic/include/intr.h: revision 1.15
sys/arch/arm/xscale/pxa2x0_intr.c: revision 1.26
sys/sys/cpu_data.h: revision 1.54
sys/arch/m68k/include/mutex.h: revision 1.12
sys/arch/ia64/acpi/madt.c: revision 1.6

sys/rwlock.h: Make this more self-contained for bool.

machine/mutex.h: Sprinkle includes so this can be used by crash(8).

ddb: New `show all tstiles' command.
Shows who's waiting for which locks and what the owner is up to.

Include psl.h for ipl_cookie_t if __MUTEX_PRIVATE

sys: Rip <sys/resourcevar.h> out of <uvm/uvm_param.h>.

And thus out of <sys/param.h>, which is exceedingly overused and
fragile and delenda est.

Should fix (some) issues with the recent inclusion of machine/lock.h
in various machine/mutex.h files.

arm/mutex.h: Need machine/intr.h, machine/lock.h.

For ipl_cookie_t and __cpu_simple_lock_t.
evbarm/intr.h: Define ipl_cookie_t before including ARM_INTR_IMPL.

Otherwise arm/mutex.h doesn't work, due to a cyclic dependency which
should really be fixed.
opencrypto/cryptodev.h: Fix includes.
- Move sys/condvar.h under #ifdef _KERNEL.
- Add some other necessary includes and forward declarations.
- Sort.

hp300/intr.h: Fix missing includes.
linux/idr.h: Need <sys/mutex.h> for kmutex_t.
amiga/intr.h: Don't define spl*() functions if !_KERNEL.

This is used by crash(8) now, and what's important is ipl_cookie_t.
cesfic/intr.h: Expose ipl_cookie_t to userland for crash(8).
cesfic/intr.h: Expose ipl_cookie_t to userland only with _KMEMUSER.

Probably not necessary but let's be a little more cautious about
this.

atari/intr.h: Expose ipl_cookie_t with _KMEMUSER for crash(8).

arm/cpu.h: Need sys/param.h for COHERENCY_UNIT.

Nix machine/param.h -- not meant to be used directly, pulled in by
sys/param.h.

Move the definition of ipl_cookie_t out of the kernel-only sections,
some _KMEMUSER applications need it.

ddb: Cast pointer to uintptr_t first before db_expr_t.

hppa/intr.h: Expose ipl_cookie_t to _KMEMUSER for crash(8).

luna68k/intr.h: Expose ipl_cookie_t to _KMEMUSER for crash(8).

mvme68k/intr.h: Expose ipl_cookie_t to _KMEMUSER for crash(8).

news68k/intr.h: Fix includes. Put some definitions under _KERNEL.

next68k/intr.h: Expose ipl_cookie_t to _KMEMUSER for crash(8).

sys/ksem.h: Hack around fstat(8) abuse of _KERNEL.

sun68k/intr.h: Expose ipl_cookie_t to _KMEMUSER for crash(8).

vax/intr.h: Expose ipl_cookie_t to _KMEMUSER for crash(8).

x68k/intr.h: Put functions under _KERNEL so crash(8) can use this.

Make ipl_cookie_t visible for _KMEMUSER userland applications.

fix editor mishap in previous

Explicitly include <sys/mutex.h> for kmutex_t.

Replace kmutex_t * (which may be undefined here) with struct kmutex *,
suggested by Taylor.

hp300/intr.h: Put most of this under #ifdef _KERNEL.
Only ipl_cookie_t really needs to be exposed now, for crash(8).

mac68k/intr.h: Expose ipl_cookie_t to _KMEMUSER for crash(8).
Make inclusion of sys/intr.h explicit for spl*.

fix hppa and vax builds.

machine/lock.h isn't necessary for __cpu_simple_lock_t, it's in
sys/types.h. avoids cpu_data.h vs sched.h include order issues.

move the hppa ipl_t typedef with the moved usage of it.
machine/mutex.h: Sprinkle sys/types.h, omit machine/lock.h.

Turns out machine/lock.h is not needed for __cpu_simple_lock_t, which
always comes from sys/types.h. And, really, sys/types.h (or at least
sys/stdint.h) is needed for uintN_t and uintptr_t.

ddb: Cast pointer to uintptr_t, then to db_expr_t.
Avoids warnings about conversion between pointer and integer of
different size on some architectures.

re-fix hppa builds.

this file uses __cpu_simple_lock(), not just the underlying type,
so it does need machine/lock.h.

Break cycle by using `struct kmutex *' instead of `kmutex_t *'.
sys/sched.h included sys/mutex.h
which includes sys/intr.h
which includes machine/intr.h
which on cats includes arm/footbridge/footbridge_intr.h
which includes arm/cpu.h
which includes sys/cpu_data.h
which includes sys/sched.h

But there was never any real need for sys/mutex.h in sys/sched.h,
because it only uses pointers to the opaque struct kmutex. Cycle
broken by using `struct kmutex *' instead of pulling in sys/mutex.h
for the definition of kmutex_t.

Side effect: This revealed that sys/cpu_data.h needed sys/intr.h
(which was pulled in accidentally by sys/mutex.h via sys/sched.h) for
SOFTINT_COUNT. Also revealed some other machine/cpu.h header files
were missing includes of sys/mutex.h for kmutex_t.
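
The cycle-breaking technique can be sketched in isolation: a translation unit that only stores or passes a pointer to a structure needs a forward declaration, not the defining header. The toy_* names below are hypothetical stand-ins and not the contents of sys/sched.h.

```c
/*
 * Forward declaration only.  The structure is never defined in this
 * file, which is what lets the defining header stay out of the
 * include chain.
 */
struct toy_kmutex;

/* A pointer to an incomplete type is a complete type, so this is fine. */
struct toy_sched_state {
	struct toy_kmutex *lock;
	int nrunning;
};

/* The pointer can be stored and passed along without dereferencing it. */
void
toy_sched_attach(struct toy_sched_state *s, struct toy_kmutex *m)
{
	s->lock = m;
	s->nrunning = 0;
}
```

Only code that dereferences the pointer or needs the structure's size has to include the real definition.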

ia64: Need sys/types.h for u_int, vaddr_t; sys/mutex.h for kmutex_t.

explicitly include no longer implicitly included sys/mutex.h.

arm/xscale: Use sys/bitops.h fls32 - 1 instead of 31 - __builtin_clz.
Sidesteps namespace collision with `#define bits ...' in net/zlib.c.

complete the previous - there were two calls to find_first_bit() to fix.

arm/xscale: Missed a spot with previous find_first_bit commit.

evbarm/g42xxeb: Fix off-by-one in previous.

The original find_first_bit(x) was 31 - __builtin_clz((uint32_t)x),
which is equivalent to fls32(x) - 1, not to fls32(x).

Note that fls32 is 1-based and returns 0 for x=0.
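
The off-by-one can be checked with a small portable sketch. The toy_* functions are hypothetical reimplementations written as plain loops; they are not the sys/bitops.h fls32() or the compiler's __builtin_clz(), though they follow the semantics stated above.

```c
#include <stdint.h>

/* 1-based index of the most significant set bit; 0 when x == 0. */
int
toy_fls32(uint32_t x)
{
	int n = 0;

	while (x != 0) {
		n++;
		x >>= 1;
	}
	return n;
}

/*
 * Number of leading zero bits.  Returns 32 for x == 0, unlike
 * __builtin_clz(), whose result is undefined for 0.
 */
int
toy_clz32(uint32_t x)
{
	int n = 0;

	for (uint32_t mask = UINT32_C(1) << 31;
	    mask != 0 && (x & mask) == 0; mask >>= 1)
		n++;
	return n;
}

/* The original definition: 0-based index of the most significant set bit. */
int
toy_find_first_bit(uint32_t x)
{
	return 31 - toy_clz32(x);
}
```

For every nonzero x, toy_find_first_bit(x) equals toy_fls32(x) - 1, so substituting a bare fls32(x) shifts every result up by one.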
 1.134 10-Sep-2023  ad uvmpd_trylockowner(): release pg->interlock before calling rw_obj_free()
since it can call back into the VM system.
 1.133 17-Apr-2021  mrg branches: 1.133.16;
fix error in previous: UVMHIST_PDHIST_SIZE needs to stay next to pdhistbuf[].
 1.132 17-Apr-2021  mrg remove KERNHIST_INIT_STATIC(). it straddles the line between usable
early in boot and broken early in boot by requiring a partly static
structure with another structure that must be present by the time
any uses are performed. theoretically platform code could allocate
a chunk while setting up memory and assign it here, giving a dynamic
sizing for the entry list, but the reality is that all users have
a statically allocated entry list as well.

the existing KERNHIST_LINK_STATIC() is used in conjunction with
KERNHIST_INITIALIZER() instead.

this stops a NULL pointer deref when the _LOG() macro is called
before the storage is linked in, which happens with GCC 10 on OCTEON
with UVMHIST enabled, crashing in very early kernel init.
 1.131 04-Nov-2020  chs branches: 1.131.2;
In uvmpd_tryownerlock(), if the initial try-lock of the owner lock fails
then rather than do more try-locks and eventually sleep for a tick,
take a hold on the current owner's lock, drop the page interlock,
and acquire the lock that we took the hold on in a blocking fashion.
After we get the lock, check if the lock that we acquired is still
the lock for the owner of the page that we're interested in.
If the owner hasn't changed then we can proceed with this page,
otherwise we will skip this page and move on to a different page.
This dramatically reduces the amount of time that the pagedaemon
sleeps trying to get locks, since even 1 tick is an eternity to sleep
in this context and it was easy to trigger that case in practice,
and with this new method the pagedaemon only very rarely actually blocks
to acquire the lock that it wants since the object locks are adaptive,
and when the pagedaemon does block then the amount of time it spends
sleeping will generally be much less than 1 tick.
 1.130 09-Jul-2020  skrll branches: 1.130.2;
Consistently use UVMHIST(__func__)

Convert UVMHIST_{CALLED,LOG} into UVMHIST_CALLARGS
 1.129 11-Jun-2020  ad Counter tweaks:

- Don't need to count anonpages+filepages any more; clean+unknown+dirty for
each kind of page can be summed to get the totals.

- Track the number of free pages with a counter so that it's one less thing
for the allocator to do, which opens up further options there.

- Remove cpu_count_sync_one(). It has no users and doesn't save a whole lot.
For the cheap option, give cpu_count_sync() a boolean parameter indicating
that a cached value is okay, and rate limit the updates for cached values
to hz.
 1.128 11-Jun-2020  ad uvm_availmem(): give it a boolean argument to specify whether a recent
cached value will do, or if the very latest total must be fetched. It can
be called thousands of times a second and fetching the totals impacts not
only the calling LWP but other CPUs doing unrelated activity in the VM
system.
 1.127 25-May-2020  ad uvm_pageout_done(): do nothing when npages is zero.
 1.126 13-Apr-2020  maxv hardclock_ticks -> getticks()
 1.125 23-Feb-2020  ad branches: 1.125.4;
UVM locking changes, proposed on tech-kern:

- Change the lock on uvm_object, vm_amap and vm_anon to be a RW lock.
- Break v_interlock and vmobjlock apart. v_interlock remains a mutex.
- Do partial PV list locking in the x86 pmap. Others to follow later.
 1.124 18-Feb-2020  chs remove the aiodoned thread. I originally added this to provide a thread context
for doing page cache iodone work, but since then biodone() has changed to
hand off all iodone work to a softint thread, so we no longer need the
special-purpose aiodoned thread.
 1.123 15-Jan-2020  ad Merge from yamt-pagecache (after much testing):

- Reduce unnecessary page scan in putpages esp. when an object has a ton of
pages cached but only a few of them are dirty.

- Reduce the number of pmap operations by tracking page dirtiness more
precisely in uvm layer.
 1.122 31-Dec-2019  ad branches: 1.122.2;
- Add and use wrapper functions that take and acquire page interlocks, and pairs
of page interlocks. Require that the page interlock be held over calls to
uvm_pageactivate(), uvm_pagewire() and similar.

- Solve the concurrency problem with page replacement state. Rather than
updating the global state synchronously, set an intended state on
individual pages (active, inactive, enqueued, dequeued) while holding the
page interlock. After the interlock is released put the pages on a 128
entry per-CPU queue for their state changes to be made real in batch.
This results in a ~400 fold decrease in contention on my test system.
Proposed on tech-kern but modified to use the page interlock rather than
atomics to synchronise as it's much easier to maintain that way, and
cheaper.
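The batching idea in 1.122 could look something like the sketch below. It is a simplified, single-threaded illustration under stated assumptions: the types and functions (sketch_page, sketch_cpu_batch, sketch_set_intent, sketch_flush) are hypothetical, locking is reduced to comments, and only the 128-entry queue size is taken from the commit message.

```c
#include <assert.h>
#include <stddef.h>

#define SKETCH_BATCH 128	/* per-CPU queue size named in the commit message */

enum sketch_intent { INTENT_NONE, INTENT_ACTIVATE, INTENT_DEACTIVATE, INTENT_DEQUEUE };
enum sketch_queue  { Q_NONE, Q_ACTIVE, Q_INACTIVE };

struct sketch_page {
	enum sketch_intent intent;	/* intended state, set under the page interlock */
	enum sketch_queue  queue;	/* state as seen by the global replacement policy */
};

struct sketch_cpu_batch {
	struct sketch_page *pages[SKETCH_BATCH];
	size_t   n;
	unsigned flushes;		/* how often the global lock would have been taken */
};

/* Make all queued intents real. Real code would hold the global lock here. */
void
sketch_flush(struct sketch_cpu_batch *b)
{
	for (size_t i = 0; i < b->n; i++) {
		struct sketch_page *pg = b->pages[i];

		switch (pg->intent) {
		case INTENT_ACTIVATE:	pg->queue = Q_ACTIVE;	break;
		case INTENT_DEACTIVATE:	pg->queue = Q_INACTIVE;	break;
		case INTENT_DEQUEUE:	pg->queue = Q_NONE;	break;
		case INTENT_NONE:	break;	/* already handled earlier in the batch */
		}
		pg->intent = INTENT_NONE;
	}
	b->n = 0;
	b->flushes++;
}

/* Record an intended state change; the global state is updated later, in batch. */
void
sketch_set_intent(struct sketch_cpu_batch *b, struct sketch_page *pg,
    enum sketch_intent intent)
{
	assert(b != NULL && pg != NULL);

	pg->intent = intent;		/* would be done holding pg's interlock */
	if (b->n == SKETCH_BATCH)
		sketch_flush(b);	/* queue full: one global update for 128 pages */
	b->pages[b->n++] = pg;
}
```

In this sketch the global state is touched once per 128 queued pages rather than once per page, which is the kind of reduction in lock traffic the entry describes.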
 1.121 31-Dec-2019  ad Rename uvm_free() -> uvm_availmem().
 1.120 31-Dec-2019  ad Rename uvm_page_locked_p() -> uvm_page_owner_locked_p()
 1.119 30-Dec-2019  ad pagedaemon:

- Use marker pages to keep place in the queue when scanning, rather than
relying on assumptions.

- In uvmpdpol_balancequeue(), lock the object once instead of twice.

- When draining pools, the situation is getting desperate, but try to avoid
saturating the system with xcall, lock and interrupt activity by sleeping
for 1 clock tick if being continually awoken and all pools have been
cycled through at least once.

- Pause & resume the freelist cache during pool draining.

PR kern/54209: NetBSD 8 large memory performance extremely low
PR kern/54210: NetBSD-8 processes presumably not exiting
PR kern/54727: writing a large file causes unreasonable system behaviour
 1.118 21-Dec-2019  ad Fix merge error - don't init uvmpd_lock twice.
 1.117 21-Dec-2019  ad Detangle the pagedaemon from uvm_fpageqlock:

- Have a single lock (uvmpd_lock) to protect pagedaemon state that was
previously covered by uvmpd_pool_drain_lock plus uvm_fpageqlock.
- Don't require any locks be held when calling uvm_kick_pdaemon().
- Use uvm_free().
 1.116 21-Dec-2019  ad uvm_reclaimable(): need to sum the per-CPU values for filepages/execpages.
 1.115 14-Dec-2019  ad The uvmexp.pdpending change was incorrect - revert for now.
 1.114 14-Dec-2019  ad Adjust pdpending in uvm_pageout_start() and uvm_pageout_done() to avoid
the value going temporarily negative.
 1.113 13-Dec-2019  ad Break the global uvm_pageqlock into a per-page identity lock and a private
lock for use of the pagedaemon policy code. Discussed on tech-kern.

PR kern/54209: NetBSD 8 large memory performance extremely low
PR kern/54210: NetBSD-8 processes presumably not exiting
PR kern/54727: writing a large file causes unreasonable system behaviour
 1.112 01-Dec-2019  ad - Adjust uvmexp.swpgonly with atomics, and make uvm_swap_data_lock static.
- A bit more __cacheline_aligned on mutexes.
 1.111 01-Oct-2019  chs in uvm_wait(), panic if the pagedaemon thread does not exist.
this avoids a hang if the system runs out of memory before
the mechanisms for reclaiming memory have been set up.
 1.110 21-Apr-2019  chs Draining pools from the pagedaemon thread can deadlock, because draining
a pool can involve taking a lock which can be held by a thread which is
blocked waiting for memory. Avoid this by moving the pool-draining work
to a separate worker thread.
 1.109 28-Oct-2017  pgoyette branches: 1.109.4;
Update the kernhist(9) kernel history code to address issues identified
in PR kern/52639, as well as some general cleaning-up...

(As proposed on tech-kern@ with additional changes and enhancements.)

Details of changes:

* All history arguments are now stored as uintmax_t values[1], both in
the kernel and in the structures used for exporting the history data
to userland via sysctl(9). This avoids problems on some architectures
where passing a 64-bit (or larger) value to printf(3) can cause it to
process the value as multiple arguments. (This can be particularly
problematic when printf()'s format string is not a literal, since in
that case the compiler cannot know how large each argument should be.)

* Update the data structures used for exporting kernel history data to
include a version number as well as the length of history arguments.

* All [2] existing users of kernhist(9) have had their format strings
updated. Each format specifier now includes an explicit length
modifier 'j' to refer to numeric values of the size of uintmax_t.

* All [2] existing users of kernhist(9) have had their format strings
updated to replace uses of "%p" with "%#jx", and the pointer
arguments are now cast to (uintptr_t) before being subsequently cast
to (uintmax_t). This is needed to avoid compiler warnings about
casting "pointer to integer of a different size."

* All [2] existing users of kernhist(9) have had instances of "%s" or
"%c" format strings replaced with numeric formats; several instances
of mis-match between format string and argument list have been fixed.

* vmstat(1) has been modified to handle the new size of arguments in the
history data as exported by sysctl(9).

* vmstat(1) now provides a warning message if the history requested with
the -u option does not exist (previously, this condition was silently
ignored, with only a single blank line being printed).

* vmstat(1) now checks the version and argument length included in the
data exported via sysctl(9) and exits if they do not match the values
with which vmstat was built.

* The kernhist(9) man-page has been updated to note the additional
requirements imposed on the format strings, along with several other
minor changes and enhancements.

[1] It would have been possible to use an explicit length (for example,
uint64_t) for the history arguments. But that would require another
"rototill" of all the users in the future when we add support for an
architecture that supports a larger size. Also, the printf(3) format
specifiers for explicitly-sized values, such as "%"PRIu64, are much
more verbose (and less aesthetically appealing, IMHO) than simply
using "%ju".

[2] I've tried very hard to find "all [the] existing users of kernhist(9)"
but it is possible that I've missed some of them. I would be glad to
update any stragglers that anyone identifies.
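The format-string convention described in 1.109 can be illustrated in plain userland C. This is a sketch only: sketch_format_entry() is a hypothetical helper and not part of kernhist(9); it merely shows the 'j' length modifier, the "%#jx" replacement for "%p", and the double cast of pointers.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <stdio.h>

/*
 * Hypothetical helper, not part of kernhist(9). Every argument is passed
 * as uintmax_t, so every conversion uses the 'j' length modifier.
 */
int
sketch_format_entry(char *buf, size_t len, const void *ptr, uint64_t count)
{
	assert(buf != NULL);

	/*
	 * Pointers go through uintptr_t first, which avoids the
	 * "pointer to integer of a different size" warning, and are
	 * printed with "%#jx" instead of "%p".
	 */
	return snprintf(buf, len, "ptr=%#jx count=%ju",
	    (uintmax_t)(uintptr_t)ptr, (uintmax_t)count);
}
```

Because every argument has the same known width, a consumer of the exported history data never has to guess how many bytes each argument occupies.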
 1.108 25-Oct-2013  martin branches: 1.108.22;
Mark a diagnostic-only variable
 1.107 30-Jul-2012  matt branches: 1.107.2; 1.107.4;
-fno-common broke kernhist since it used commons.
Add a KERNHIST_DEFINE which is used to define the kernel history.
Change UVM to deal with the new usage.
 1.106 05-Jun-2012  jym Now that pool_cache_invalidate() is synchronous and can handle per-CPU
caches, merge together pool_drain_start() and pool_drain_end() into

bool pool_drain(struct pool **ppp);

"bool" value indicates whether reclaiming was fully done (true) or not (false)
"ppp" will contain a pointer to the pool that was drained (optional).

See http://mail-index.netbsd.org/tech-kern/2012/06/04/msg013287.html
 1.105 01-Feb-2012  para allocate uareas and buffers from kernel_map again
add code to drain pools if kmem_arena runs out of space
 1.104 27-Jan-2012  para extending vmem(9) to be able to allocate resources for its own needs.
simplifying uvm_map handling (no special kernel entries anymore no relocking)
make malloc(9) a thin wrapper around kmem(9)
(with private interface for interrupt safety reasons)

releng@ acknowledged
 1.103 12-Jun-2011  rmind branches: 1.103.2; 1.103.6;
Welcome to 5.99.53! Merge rmind-uvmplock branch:

- Reorganize locking in UVM and provide extra serialisation for pmap(9).
New lock order: [vmpage-owner-lock] -> pmap-lock.

- Simplify locking in some pmap(9) modules by removing P->V locking.

- Use lock object on vmobjlock (and thus vnode_t::v_interlock) to share
the locks amongst UVM objects where necessary (tmpfs, layerfs, unionfs).

- Rewrite and optimise x86 TLB shootdown code, make it simpler and cleaner.
Add TLBSTATS option for x86 to collect statistics about TLB shootdowns.

- Unify /dev/mem et al in MI code and provide required locking (removes
kernel-lock on some ports). Also, avoid cache-aliasing issues.

Thanks to Andrew Doran and Joerg Sonnenberger, as their initial patches
formed the core changes of this branch.
 1.102 02-Feb-2011  chuck branches: 1.102.2;
update license clauses on my code to match the new-style BSD licenses.
based on second diff that rmind@ sent me.

no functional change with this commit.
 1.101 02-Jun-2010  pooka branches: 1.101.2; 1.101.4;
it's a wonderful static
 1.100 21-Oct-2009  rmind branches: 1.100.2; 1.100.4;
Remove uarea swap-out functionality:

- Addresses the issue described in PR/38828.
- Some simplification in threading and sleepq subsystems.
- Eliminates pmap_collect() and, as a side note, allows pmap optimisations.
- Eliminates XS_CTL_DATA_ONSTACK in scsipi code.
- Avoids a few scans of the LWP list and thus potentially long holds of proc_lock.
- Cuts ~1.5k lines of code. Reduces amd64 kernel size by ~4k.
- Removes __SWAP_BROKEN cases.

Tested on x86, mips, acorn32 (thanks <mpumford>) and partly tested on
acorn26 (thanks to <bjh21>).

Discussed on <tech-kern>, reviewed by <ad>.
 1.99 18-Aug-2009  yamt whitespace fixes. no functional changes.
 1.98 10-Aug-2009  haad Add uvm_reclaim_hooks support for reclaiming kernel KVA space and memory.
This is used only by zfs where uvm_reclaim hook is added from arc cache.

Oked ad@.
 1.97 13-Dec-2008  ad PR 40027/pagedaemon loops on memory shortage

uvmpd_scan_queue:

- Fix a bug that prevented the pagedaemon from making forward progress
if (a) swap was full (b) the first 16 pages on the inactive list were
unbusy anons not already backed by swap.

- Remove redundant uvm_swapisfull() check and just try to allocate a slot.
If it fails we know swap is full.
 1.96 03-Dec-2008  ad Make adjustment of uvm_extrapages atomic since it's done without a lock.
XXX This is still a hack.
 1.95 02-Dec-2008  ad uvmpd_tune: make the adjustments to individual variables atomic.
 1.94 14-Nov-2008  ad - If the system encounters a severe memory shortage, start unloading
unused kernel modules.
- Try to unload any autoloaded kernel modules 10 seconds after their
load was successful.
- Keep a counter to track module load/unload events.
 1.93 23-Sep-2008  ad branches: 1.93.2; 1.93.4;
- Make free target 0.5%, but limit to between 128k and 1024k.
- Scale free target by number of CPUs.
- Prefer paging to swapping.

Proposed on tech-kern.
 1.92 29-Feb-2008  yamt branches: 1.92.4; 1.92.6; 1.92.10;
uvm_swap_io: if pagedaemon, don't wait for iobuf.
 1.91 07-Feb-2008  yamt branches: 1.91.2; 1.91.6;
swapcluster_flush: handle nused==0, which can happen if swapcluster_add failed.
PR/37669 from Andrew Doran.
 1.90 28-Jan-2008  yamt remove a special allocator for uareas, which is no longer necessary.
use pool_cache instead.
 1.89 02-Jan-2008  ad Merge vmlocking2 to head.
 1.88 07-Nov-2007  ad branches: 1.88.2; 1.88.6;
Merge from vmlocking:

- pool_cache changes.
- Debugger/procfs locking fixes.
- Other minor changes.
 1.87 21-Jul-2007  ad branches: 1.87.4; 1.87.6; 1.87.10; 1.87.12; 1.87.14;
Merge unobtrusive locking changes from the vmlocking branch.
 1.86 09-Jul-2007  ad branches: 1.86.2;
Merge some of the less invasive changes from the vmlocking branch:

- kthread, callout, devsw API changes
- select()/poll() improvements
- miscellaneous MT safety improvements
 1.85 15-Jun-2007  ad Add a sysctl to disable swapout of kernel stacks. Discussed on tech-kern@.
 1.84 22-Feb-2007  thorpej branches: 1.84.4; 1.84.6;
TRUE -> true, FALSE -> false
 1.83 21-Feb-2007  thorpej Replace the Mach-derived boolean_t type with the C99 bool type. A
future commit will replace use of TRUE and FALSE with true and false.
 1.82 27-Dec-2006  alc branches: 1.82.2;
CID-4192: ensure we have 'uobj != NULL' here

ok christos@ and yamt@
 1.81 21-Dec-2006  yamt merge yamt-splraiseipl branch.

- finish implementing splraiseipl (and makeiplcookie).
http://mail-index.NetBSD.org/tech-kern/2006/07/01/0000.html
- complete workqueue(9) and fix its ipl problem, which is reported
to cause audio skipping.
- fix netbt (at least compilation problems) for some ports.
- fix PR/33218.
 1.80 01-Nov-2006  yamt remove some __unused from function parameters.
 1.79 12-Oct-2006  yamt remove unnecessary #include of vnode.h.
 1.78 12-Oct-2006  christos - sprinkle __unused on function decls.
- fix a couple of unused bugs
- no more -Wno-unused for i386
 1.77 15-Sep-2006  yamt branches: 1.77.2;
merge yamt-pdpolicy branch.
- separate page replacement policy from the rest of kernel
- implement an alternative replacement policy
 1.76 14-Feb-2006  yamt branches: 1.76.2; 1.76.14;
share some code between uvmpd_scan_inactive and uvmpd_scan.
 1.75 14-Feb-2006  yamt fix a compilation problem where PAGE_SHIFT is not a constant.
pointed by Chuck Silvers.
 1.74 13-Feb-2006  yamt remove an outdated comment.
 1.73 12-Feb-2006  yamt factor out swap clustering code.
 1.72 05-Jan-2006  yamt branches: 1.72.2; 1.72.4;
uvmpd_scan_inactive: when reactivating a page,
use pmap_is_referenced rather than pmap_clear_reference.
we don't need to clear the bit here as we'll do so when
moving pages back to inactive queue again. pointed by Chuck Silvers.
 1.71 21-Dec-2005  yamt branches: 1.71.2;
uvmpd_scan: when deactivating a page, clear its reference bit.
discussed on tech-kern@.
 1.70 21-Dec-2005  yamt make length of inactive queue tunable by sysctl. (vm.inactivepct)
 1.69 29-Nov-2005  yamt read-ahead statistics.
 1.68 13-Sep-2005  yamt branches: 1.68.6;
wrap swap related code by #ifdef VMSWAP. always #define VMSWAP for now.
 1.67 31-Jul-2005  yamt revert "defflag VMSWAP" changes for now.
there seems to be far more people who don't want to edit
their kernel config files than i thought.
 1.66 30-Jul-2005  yamt defflag VMSWAP.
 1.65 27-Jun-2005  thorpej branches: 1.65.2;
Use ANSI function decls.
 1.64 11-May-2005  yamt allocate anons on-demand, rather than reserving static amount of
them on boot/swapon.
 1.63 04-May-2005  yamt uvm_reclaimable: add an XXX comment.
 1.62 12-Apr-2005  yamt fix unreasonably frequent "killed: out of swap" on systems which have
little or no swap.
- even on a severe swap shortage, if we have some amount of file-backed pages,
don't bother to kill processes.
- if all pages in queue will be likely reactivated, just give up
page type balancing rather than spinning unnecessarily.
 1.61 30-Jan-2005  chs hack around a UVM problem that causes hangs when large processes fork.
see PR 26908 for details.
 1.60 03-Oct-2004  enami branches: 1.60.4; 1.60.6;
- Don't let pagedaemon sleep while draining buf.
- Estimate amount of memory to free at a time.
Address PR#27057 (and similar hangs I saw several months ago).
 1.59 24-Mar-2004  junyoung branches: 1.59.2;
Nuke __P().
 1.58 30-Jan-2004  tls Buffer cache fixes to avoid thrashing between high and low water marks
and uncontrolled growth.

The key fix is from Dan Carosone, who noticed that buf_canfree() was
counting in _bytes_ but freeing in _buffers_, which caused the instant
drop to lowater observed by some users.

We now control the rate of growth; the probability of getting a new
allocation is inversely proportional to the current size of the
cache. This idea is from a long-ago conversation with Kirk McKusick
and, if memory serves, was used for the file-system cache in some
other BSD variant at some point in history.

With growth and shrinkage more or less dealt with, we return the
default maximum cache size to 15%. The default _minimum_ cache size
is raised from 1/16 of the maximum cache size to 1/8, since 1/16 was
chosen when the maximum size was 30% of memory.

Finally, after observing the behaviour of the pagedaemon and the
buffer cache drainer under pathological workloads (e.g. a benchmark
that steps through 75% of available memory backwards) I have moved
the call to buf_drain() to the beginning of the pagedaemon from the
end; if the pagedaemon bogs down, it still won't get run as often
as it should, but at least this way it will see the state of the
free count and free target _before_ the scan step does its thing.
 1.57 04-Jan-2004  jdolecek Rearrange process exit path to avoid need to free resources from different
process context ('reaper').

From within the exiting process context:
* deactivate pmap and free vmspace while we can still block
* introduce MD cpu_lwp_free() - this cleans all MD-specific context (such
as FPU state), and is the last potentially blocking operation;
all of cpu_wait(), and most of cpu_exit(), is now folded into cpu_lwp_free()
* process is now immediately marked as zombie and made available for pickup
by parent; the remaining last lwp continues the exit as fully detached
* MI (rather than MD) code bumps uvmexp.swtch, cpu_exit() is now same
for both 'process' and 'lwp' exit

uvm_lwp_exit() is modified to never block; the u-area memory is now
always just linked to the list of available u-areas. Introduce (blocking)
uvm_uarea_drain(), which is called to release the excessive u-area memory;
this is called by parent within wait4(), or by pagedaemon on memory shortage.
uvm_uarea_free() is now private function within uvm_glue.c.

MD process/lwp exit code now always calls lwp_exit2() immediately after
switching away from the exiting lwp.

g/c now unneeded routines and variables, including the reaper kernel thread
 1.56 30-Dec-2003  pk Replace the traditional buffer memory management -- based on fixed per buffer
virtual memory reservation and a private pool of memory pages -- by a scheme
based on memory pools.

This allows better utilization of memory because buffers can now be allocated
with a granularity finer than the system's native page size (useful for
filesystems with e.g. 1k or 2k fragment sizes). It also avoids fragmentation
of virtual to physical memory mappings (due to the former fixed virtual
address reservation) resulting in better utilization of MMU resources on some
platforms. Finally, the scheme is more flexible by allowing run-time decisions
on the amount of memory to be used for buffers.

On the other hand, the effectiveness of the LRU queue for buffer recycling
may be somewhat reduced compared to the traditional method since, due to the
nature of the pool based memory allocation, the actual least recently used
buffer may release its memory to a pool different from the one needed by a
newly allocated buffer. However, this effect will kick in only if the
system is under memory pressure.
 1.55 26-Sep-2003  chs don't dereference a vm_page pointer after we free the page.
 1.54 01-Sep-2003  yamt remove an obsolete comment.
(we now have only one inactive list.)
 1.53 28-Aug-2003  pk When retiring a swap device with marked bad blocks on it we should update
the `# swap page in use' and `# swap page only' counters. However, at the
time of swap device removal we can no longer figure out how many of the
bad swap pages are actually also `swap only' pages.

So, on swap I/O errors arrange things to not include the bad swap pages in
the `swpgonly' counter as follows: uvm_swap_markbad() decrements `swpgonly'
by the number of bad pages, and the various VM object deallocation routines
do not decrement `swpgonly' for swap slots marked as SWSLOT_BAD.
 1.52 11-Aug-2003  pk Introduce uvm_swapisfull(), which computes the available swap space by
taking into account swap devices that are in the process of being removed.
 1.51 23-Apr-2003  tls branches: 1.51.2;
Correct use of MAXBSIZE where MAXPHYS was intended. This is a necessary
first step towards per-device MAXPHYS, and has the beneficial side effect
of allowing clustering to MAXPHYS even on systems that need to run with
a reduced MAXBSIZE to get more metadata buffers.
 1.50 25-Feb-2003  simonb Cast result of pgo_put() to (void) as is the style with other calls to
pgo_put() in UVM.

Pointed out by Andrew Brown.
 1.49 23-Feb-2003  simonb Remove assigned-to but not used variable.
 1.48 24-Nov-2002  scw Quell uninitialised variable warnings.
 1.47 20-Jun-2002  chs count aobj pages (most notably kernel stack pages) as anon pages
for memory usage-balancing purposes.
 1.46 05-May-2002  chs branches: 1.46.2; 1.46.4;
look in the right flags field for PQ_INACTIVE.
make uvmpd_scan_inactive() return void since its return value is ignored.
 1.45 21-Jan-2002  wiz branches: 1.45.4;
deamon -> daemon
 1.44 31-Dec-2001  chs fix locking for loaning. in general we should be looking at the page's
uobject and uanon pointers rather than at the PQ_ANON flag to determine
which lock to hold, since PQ_ANON can be clear even when the anon's lock
is the one which we should hold (if the page was loaned from an object
and then freed by the object).
 1.43 09-Dec-2001  chs add {anon,file,exec}max as an upper bound on the amount of memory that
will be allocated for the respective usage types when there is contention
for memory.

replace "vnode" and "vtext" with "file" and "exec" in uvmexp field names
and sysctl names.
 1.42 10-Nov-2001  lukem add RCSIDs, and in some cases, slightly cleanup #include order
 1.41 06-Nov-2001  chs several changes prompted by loaning problems:
- fix the loaned case in uvm_pagefree().
- redo uvmexp.swpgonly accounting to work with page loaning.
add an assertion before each place we adjust uvmexp.swpgonly.
- fix uvm_km_pgremove() to always free any swap space associated with
the range being removed.
- get rid of UVM_LOAN_WIRED flag. instead, we just make sure that
pages loaned to the kernel are never on the page queues.
this allows us to assert that pages are not loaned and wired
at the same time.
- add yet more assertions.
 1.40 06-Nov-2001  simonb Remove some variables that are set but never used.
 1.39 30-Sep-2001  chs branches: 1.39.2;
skip the swap-out code if there's no swap space configured.
avoid some hangs in low-memory situations.
 1.38 26-Sep-2001  chs move call to pool_drain() outside the pageq lock.
 1.37 15-Sep-2001  chs a whole bunch of changes to improve performance and robustness under load:

- remove special treatment of pager_map mappings in pmaps. this is
required now, since I've removed the globals that expose the address range.
pager_map now uses pmap_kenter_pa() instead of pmap_enter(), so there's
no longer any need to special-case it.
- eliminate struct uvm_vnode by moving its fields into struct vnode.
- rewrite the pageout path. the pager is now responsible for handling the
high-level requests instead of only getting control after a bunch of work
has already been done on its behalf. this will allow us to UBCify LFS,
which needs tighter control over its pages than other filesystems do.
writing a page to disk no longer requires making it read-only, which
allows us to write wired pages without causing all kinds of havoc.
- use a new PG_PAGEOUT flag to indicate that a page should be freed
on behalf of the pagedaemon when it's unlocked. this flag is very similar
to PG_RELEASED, but unlike PG_RELEASED, PG_PAGEOUT can be cleared if the
pageout fails due to eg. an indirect-block buffer being locked.
this allows us to remove the "version" field from struct vm_page,
and together with shrinking "loan_count" from 32 bits to 16,
struct vm_page is now 4 bytes smaller.
- no longer use PG_RELEASED for swap-backed pages. if the page is busy
because it's being paged out, we can't release the swap slot to be
reallocated until that write is complete, but unlike with vnodes we
don't keep a count of in-progress writes so there's no good way to
know when the write is done. instead, when we need to free a busy
swap-backed page, just sleep until we can get it busy ourselves.
- implement a fast-path for extending writes which allows us to avoid
zeroing new pages. this substantially reduces cpu usage.
- encapsulate the data used by the genfs code in a struct genfs_node,
which must be the first element of the filesystem-specific vnode data
for filesystems which use genfs_{get,put}pages().
- eliminate many of the UVM pagerops, since they aren't needed anymore
now that the pager "put" operation is a higher-level operation.
- enhance the genfs code to allow NFS to use the genfs_{get,put}pages
instead of a modified copy.
- clean up struct vnode by removing all the fields that used to be used by
the vfs_cluster.c code (which we don't use anymore with UBC).
- remove kmem_object and mb_object since they were useless.
instead of allocating pages to these objects, we now just allocate
pages with no object. such pages are mapped in the kernel until they
are freed, so we can use the mapping to find the page to free it.
this allows us to remove splvm() protection in several places.

The sum of all these changes improves write throughput on my
decstation 5000/200 to within 1% of the rate of NetBSD 1.5
and reduces the elapsed time for "make release" of a NetBSD 1.5
source tree on my 128MB pc to 10% less than a 1.5 kernel took.
 1.36 27-Jun-2001  thorpej branches: 1.36.2; 1.36.4;
G/c a comment that no longer applies.
 1.35 23-Jun-2001  chs don't sleep for memory in uao_set_swslot() since we're holding spinlocks,
instead return -1. adjust callers to handle this new error return.
fixes PR 13194.
 1.34 25-May-2001  chs remove trailing whitespace.
 1.33 22-May-2001  ross Merge the swap-backed and object-backed inactive lists.
 1.32 07-May-2001  thorpej Fix a silly mistake I made when reworking the uvm inactive list
some time ago. The mistake was to check that the page was not
referenced since the last active scan before moving it to inactive.
Now we just clear reference and move it to inactive (which is where
the second clock hand sweep occurs).
 1.31 10-Mar-2001  chs eliminate the VM_PAGER_* error codes in favor of the traditional E* codes.
the mapping is:

VM_PAGER_OK 0
VM_PAGER_BAD <unused>
VM_PAGER_FAIL <unused>
VM_PAGER_PEND 0 (see below)
VM_PAGER_ERROR EIO
VM_PAGER_AGAIN EAGAIN
VM_PAGER_UNLOCK EBUSY
VM_PAGER_REFAULT ERESTART

for async i/o requests, it used to be possible for the request to
be converted to sync, and the pager would return VM_PAGER_OK or VM_PAGER_PEND
to indicate whether the caller should perform post-i/o cleanup.
this is no longer allowed; pagers must now return 0 to indicate that
the async i/o was successfully started, and the caller never needs to
worry about doing the post-i/o cleanup.
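The mapping table in 1.31 can be written out as a small translation function. This is a sketch for illustration, not code from the tree: the enum, the function name sketch_pager_to_errno(), and the -1 returned for the two unused codes are hypothetical, and the fallback ERESTART value is only a placeholder for hosts whose userland <errno.h> does not define it.

```c
#include <assert.h>
#include <errno.h>

#ifndef ERESTART
#define ERESTART (-3)	/* placeholder for hosts that do not expose ERESTART */
#endif

/* Hypothetical enum standing in for the old VM_PAGER_* codes. */
enum sketch_vm_pager {
	SK_VM_PAGER_OK,
	SK_VM_PAGER_BAD,
	SK_VM_PAGER_FAIL,
	SK_VM_PAGER_PEND,
	SK_VM_PAGER_ERROR,
	SK_VM_PAGER_AGAIN,
	SK_VM_PAGER_UNLOCK,
	SK_VM_PAGER_REFAULT
};

/* Translate an old pager code to the E* code listed in the commit message. */
int
sketch_pager_to_errno(enum sketch_vm_pager code)
{
	switch (code) {
	case SK_VM_PAGER_OK:		return 0;
	case SK_VM_PAGER_PEND:		return 0;	/* async i/o started */
	case SK_VM_PAGER_ERROR:		return EIO;
	case SK_VM_PAGER_AGAIN:		return EAGAIN;
	case SK_VM_PAGER_UNLOCK:	return EBUSY;
	case SK_VM_PAGER_REFAULT:	return ERESTART;
	case SK_VM_PAGER_BAD:
	case SK_VM_PAGER_FAIL:
		break;			/* listed as <unused>: no E* equivalent */
	}
	return -1;			/* hypothetical "no mapping" marker */
}
```

Note that OK and PEND both collapse to 0, which matches the statement that callers no longer distinguish between them for async requests.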
 1.30 09-Mar-2001  chs add UBC memory-usage balancing. we track the number of pages in use for
each of the basic types (anonymous data, executable image, cached files)
and prevent the pagedaemon from reusing a given page if that would reduce
the count of that type of page below a sysctl-settable minimum threshold.
the thresholds are controlled via three new sysctl tunables:
vm.anonmin, vm.vnodemin, and vm.vtextmin. these tunables are the
percentages of pageable memory reserved for each usage, and we do not allow
the sum of the minimums to be more than 95% so that there's always some
memory that can be reused.
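The 95% cap on the sum of the three minimums in 1.30 amounts to a validation step when a tunable is changed. A minimal sketch, assuming percentages are stored as integers and a rejected update returns EINVAL; the structure and function names are hypothetical and do not come from the UVM sysctl code.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>

/* One slot per tunable named in the commit message. */
enum sketch_usage { SK_ANON, SK_VNODE, SK_VTEXT, SK_NUSAGE };

#define SKETCH_MAX_MIN_SUM 95	/* the minimums may not sum to more than 95% */

struct sketch_mins {
	int pct[SK_NUSAGE];	/* anonmin, vnodemin, vtextmin, in percent */
};

/*
 * Try to set one minimum. The update is rejected, leaving the old value
 * in place, if the new sum of all minimums would exceed the cap.
 */
int
sketch_set_min(struct sketch_mins *m, enum sketch_usage which, int pct)
{
	int sum = pct;

	assert(m != NULL && which < SK_NUSAGE);

	if (pct < 0 || pct > 100)
		return EINVAL;
	for (int i = 0; i < SK_NUSAGE; i++) {
		if (i != (int)which)
			sum += m->pct[i];
	}
	if (sum > SKETCH_MAX_MIN_SUM)
		return EINVAL;

	m->pct[which] = pct;
	return 0;
}
```

Keeping the sum at or below 95% leaves at least 5% of pageable memory that the pagedaemon is always free to reuse.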
 1.29 28-Jan-2001  thorpej branches: 1.29.2;
Page scanner improvements, behavior is actually a bit more like
Mach VM's now. Specific changes:
- Pages now need not have all of their mappings removed before being
put on the inactive list. They only need to have the "referenced"
attribute cleared. This makes putting pages onto the inactive list
much more efficient. In order to eliminate redundant clearings of
"referenced", callers of uvm_pagedeactivate() must now do this
themselves.
- When checking the "modified" attribute for a page (for clearing
PG_CLEAN), make sure to only do it if PG_CLEAN is currently set on
the page (saves a potentially expensive pmap operation).
- When scanning the inactive list, if a page is referenced, reactivate
it (this part was actually added in uvm_pdaemon.c,v 1.27). This
works properly now that pages on the inactive list are allowed to
have mappings.
- When scanning the inactive list and considering a page for freeing,
remove all mappings, and then check the "modified" attribute if the
page is marked PG_CLEAN.
- When scanning the active list, if the page was referenced since its
last sweep by the scanner, don't deactivate it. (This part was
actually added in uvm_pdaemon.c,v 1.28.)

These changes greatly improve interactive performance during
moderate to high memory and I/O load.
 1.28 25-Jan-2001  thorpej When considering a page for deactivation, check to see if the
page has been referenced since the last time it was considered.
If it was, don't deactivate the page.
 1.27 25-Jan-2001  mycroft Put back the pmap_is_referenced() check from the original UVM code in the
inactive list scans. Without this, the referenced bit was essentially ignored.
 1.26 13-Dec-2000  chs continue processing the inactive queue past the free target when
we're enforcing the limit on the number of vnode pages.
 1.25 30-Nov-2000  simonb Move uvm_pgcnt_vnode and uvm_pgcnt_anon into uvmexp (as vnodepages and
anonpages), and add vtextpages which is currently unused but will be
used to trace the number of pages used by vtext vnodes.
 1.24 27-Nov-2000  chs Initial integration of the Unified Buffer Cache project.
 1.23 20-Aug-2000  bjh21 Ensure that uvmexp.freemin is above the kernel reserved-page count.

When it wasn't (which could happen on a 4Mb machine with 32kb pages),
uvm_pagealloc_strat could refuse to allocate user memory, while the pagedaemon
didn't think it was worth freeing any more, resulting in the system seizing up.
 1.22 12-Aug-2000  thorpej Don't bother with a trampoline to start the pagedaemon and
reaper threads.
 1.21 27-Jun-2000  mrg remove include of <vm/vm.h>
 1.20 26-Jun-2000  mrg remove/move more mach vm header files:

<vm/pglist.h> -> <uvm/uvm_pglist.h>
<vm/vm_inherit.h> -> <uvm/uvm_inherit.h>
<vm/vm_kern.h> -> into <uvm/uvm_extern.h>
<vm/vm_object.h> -> nothing
<vm/vm_pager.h> -> into <uvm/uvm_pager.h>

also includes a bunch of <vm/vm_page.h> include removals (due to redundancy
with <vm/vm.h>), and a scattering of other similar headers.
 1.19 04-Nov-1999  thorpej Const poison uvm_wait().
 1.18 12-Sep-1999  chs branches: 1.18.2; 1.18.4; 1.18.8;
eliminate the PMAP_NEW option by making it required for all ports.
ports which previously had no support for PMAP_NEW now implement
the pmap_k* interfaces as wrappers around the non-k versions.
 1.17 22-Jul-1999  thorpej Garbage collect thread_sleep()/thread_wakeup() left over from the old
Mach VM code. Also nuke iprintf(), which was no longer used anywhere.

Add proclist locking where appropriate.
 1.16 24-May-1999  thorpej - Change uvm_{lock,unlock}_fpageq() to return/take the previous interrupt
level directly, instead of making the caller wrap the calls in
splimp()/splx().
- Add a comment documenting that interrupts that cause memory allocation
must be blocked while the free page queue is locked.

Since interrupts must be blocked while this lock is asserted, tying them
together like this helps to prevent mistakes.
 1.15 30-Mar-1999  mycroft branches: 1.15.4;
Adjust a comparison so that the pagedaemon doesn't get stuck ping-ponging with
a process trying to allocate memory.
 1.14 26-Mar-1999  chs add uvmexp.swpgonly and use it to detect out-of-swap conditions.

numerous pagedaemon improvements were needed to make this useful:
- don't bother waking up procs waiting for memory if there's none to be had.
- start 4 times as many pageouts as we need free pages.
this should reduce latency in low-memory situations.
- in inactive scanning, if we find dirty swap-backed pages when swap space
is full of non-resident pages, reactivate some number of these to flush
less active pages to the inactive queue so we can consider paging them out.
this replaces the previous scheme of inactivating pages beyond the
inactive target when we failed to free anything during inactive scanning.
- during both active and inactive scanning, free any swap resources from
dirty swap-backed pages if swap space is full. this allows other pages
to be paged out into that swap space.
 1.13 25-Mar-1999  mrg remove now >1 year old pre-release message.
 1.12 04-Nov-1998  chs branches: 1.12.2;
remove outdated comment.
 1.11 18-Oct-1998  chs shift by PAGE_SHIFT instead of multiplying or dividing by PAGE_SIZE.
 1.10 13-Aug-1998  eeh Merge paddr_t changes into the main branch.
 1.9 23-Jul-1998  pk branches: 1.9.2;
Include pool_drain() in page scans.
 1.8 09-Mar-1998  mrg KNF.
 1.7 10-Feb-1998  mrg - add defopt's for UVM, UVMHIST and PMAP_NEW.
- remove unnecessary UVMHIST_DECL's.
 1.6 09-Feb-1998  mrg keep statistics on pageout/pagein, total pages, and total operations.
 1.5 07-Feb-1998  mrg implement counters for pages paged in/out
 1.4 07-Feb-1998  mrg restore rcsids
 1.3 07-Feb-1998  chs keep track of how many pages are currently being paged out,
stop initiating new pageouts when "(free + paging) > freetarg".
fix pageq locking.
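The stopping rule quoted in revision 1.3 above can be written as a one-line predicate. This is a minimal sketch; the function name `ex_enough_pageouts_started()` and its plain `int` parameters are hypothetical and do not reflect the actual uvmexp fields.

```c
#include <assert.h>
#include <stdbool.h>

/*
 * Return true when no new pageouts should be started: pages already
 * free plus pages currently being paged out exceed the free target.
 */
bool ex_enough_pageouts_started(int free, int paging, int freetarg)
{
    assert(free >= 0 && paging >= 0 && freetarg >= 0);
    return (free + paging) > freetarg;
}
```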
 1.2 06-Feb-1998  thorpej RCS ID police.
 1.1 05-Feb-1998  mrg branches: 1.1.1;
Initial revision
 1.1.1.1 05-Feb-1998  mrg initial import of the new virtual memory system, UVM, into -current.

UVM was written by chuck cranor <chuck@maria.wustl.edu>, with some
minor portions derived from the old Mach code. i provided some help
getting swap and paging working, and other bug fixes/ideas. chuck
silvers <chuq@chuq.com> also provided some other fixes.

this is the UVM kernel code portion.


this will be KNF'd shortly. :-)
 1.9.2.1 30-Jul-1998  eeh Split vm_offset_t and vm_size_t into paddr_t, psize_t, vaddr_t, and vsize_t.
 1.12.2.5 30-May-1999  chs when processing aiodones, only adjust uvmexp.paging if the aio
was flagged as being started by the pagedaemon.
 1.12.2.4 29-Apr-1999  chs remove a mistaken simple_unlock().
 1.12.2.3 09-Apr-1999  chs split aiodone handling out from the pagedaemon into its own thread,
the "aiodone daemon". the aiodone daemon never allocates memory,
so the pagedaemon will be able to safely block waiting for memory
as long as there are some pageouts in progress. the paging queue
scheme needs to change before this is done tho.
 1.12.2.2 25-Feb-1999  chs treat pages being paged out as "free" when determining whether to
scan the page queues.
in uvmpd_scan_inactive(), keep initiating pageouts until we'll have
4 times the number of pages clean as we want free.
(this fudge factor may need adjustment).
move adjustment of uvmexp.paging to uvm_pager_put(), which is a mistake.
I think the issue was that uvm_pager_put() might fail the pageout
and retry internally with just one page, so the pagedaemon has no way
to tell how many pages are actually being cleaned. this needs more thought.
in uvmpd_scan(), put back the business where we deactivate pages
beyond uvmexp.inactarg (there's a big comment explaining this).
rename some variables for clarity.
use TAILQ_* macros instead of poking the structs directly.
 1.12.2.1 09-Nov-1998  chs initial snapshot. lots left to do.
 1.15.4.4 31-Jul-1999  chs have the aiodone daemon wakeup the pagedaemon if there are still not
enough free pages after processing everything.
 1.15.4.3 04-Jul-1999  chs update for uvm.aio_done being struct buf instead of struct uvm_aiodesc.
pull in a fix from -current.
 1.15.4.2 21-Jun-1999  thorpej Sync w/ -current.
 1.15.4.1 07-Jun-1999  chs merge everything from chs-ubc branch.
 1.18.8.1 27-Dec-1999  wrstuden Pull up to last week's -current.
 1.18.4.1 15-Nov-1999  fvdl Sync with -current
 1.18.2.5 12-Mar-2001  bouyer Sync with HEAD.
 1.18.2.4 11-Feb-2001  bouyer Sync with HEAD.
 1.18.2.3 05-Jan-2001  bouyer Sync with HEAD
 1.18.2.2 08-Dec-2000  bouyer Sync with HEAD.
 1.18.2.1 20-Nov-2000  bouyer Update thorpej_scsipi to -current as of a month ago
 1.29.2.14 11-Dec-2002  thorpej Sync with HEAD.
 1.29.2.13 01-Aug-2002  nathanw Catch up to -current.
 1.29.2.12 16-Jul-2002  nathanw pagedaemon_proc really should be a proc, not a LWP.
 1.29.2.11 24-Jun-2002  nathanw Curproc->curlwp renaming.

Change uses of "curproc->l_proc" back to "curproc", which is more like the
original use. Bare uses of "curproc" are now "curlwp".

"curproc" is now #defined in proc.h as ((curlwp) ? (curlwp)->l_proc : NULL)
so that it is always safe to reference curproc (*de*referencing curproc
is another story, but that's always been true).
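The curproc definition described in 1.29.2.11 above can be tried out in isolation. The sketch below uses cut-down, hypothetical versions of `struct proc` and `struct lwp` (only the fields needed here) and an ordinary global for `curlwp`; the real kernel definitions differ.

```c
#include <assert.h>
#include <stddef.h>

/* Cut-down stand-ins for the kernel structures (illustration only). */
struct proc { int p_pid; };
struct lwp  { struct proc *l_proc; };

/* In this sketch curlwp is a plain global; NULL means "no current LWP". */
struct lwp *curlwp = NULL;

/* Safe to reference even when curlwp is NULL; the result may be NULL. */
#define curproc ((curlwp) ? (curlwp)->l_proc : NULL)
```

Referencing `curproc` is always safe with this definition, but dereferencing it still requires checking for NULL first, as the commit message notes.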
 1.29.2.10 20-Jun-2002  nathanw Catch up to -current.
 1.29.2.9 28-Feb-2002  nathanw Catch up to -current.
 1.29.2.8 08-Jan-2002  nathanw Catch up to -current.
 1.29.2.7 14-Nov-2001  nathanw Catch up to -current.
 1.29.2.6 08-Oct-2001  nathanw Catch up to -current.
 1.29.2.5 26-Sep-2001  nathanw Catch up to -current.
Again.
 1.29.2.4 21-Sep-2001  nathanw Catch up to -current.
 1.29.2.3 24-Aug-2001  nathanw Catch up with -current.
 1.29.2.2 21-Jun-2001  nathanw Catch up to -current.
 1.29.2.1 09-Apr-2001  nathanw Catch up with -current.
 1.36.4.1 01-Oct-2001  fvdl Catch up with -current.
 1.36.2.4 06-Sep-2002  jdolecek sync kqueue branch with HEAD
 1.36.2.3 23-Jun-2002  jdolecek catch up with -current on kqueue branch
 1.36.2.2 11-Feb-2002  jdolecek Sync w/ -current.
 1.36.2.1 10-Jan-2002  thorpej Sync kqueue branch with -current.
 1.39.2.1 12-Nov-2001  thorpej Sync the thorpej-mips-cache branch with -current.
 1.45.4.2 12-Mar-2002  thorpej Convert the fpageqlock to a spin mutex at IPL_VM and rename it
to fpageq_mutex.
 1.45.4.1 11-Mar-2002  thorpej Convert swap_syscall_lock and uvm.swap_data_lock to adaptive mutexes,
and rename them appropriately.
 1.46.4.2 26-Aug-2003  tron Pull up revision 1.51 (requested by tls in ticket #1434):
Correct use of MAXBSIZE where MAXPHYS was intended. This is a necessary
first step towards per-device MAXPHYS, and has the beneficial side effect
of allowing clustering to MAXPHYS even on systems that need to run with
a reduced MAXBSIZE to get more metadata buffers.
 1.46.4.1 21-Jun-2002  lukem Pull up revision 1.47 (requested by chs in ticket #329):
count aobj pages (most notably kernel stack pages) as anon pages
for memory usage-balancing purposes.
 1.46.2.1 15-Jul-2002  gehenna catch up with -current.
 1.51.2.7 11-Dec-2005  christos Sync with head.
 1.51.2.6 10-Nov-2005  skrll Sync with HEAD. Here we go again...
 1.51.2.5 04-Feb-2005  skrll Sync with HEAD.
 1.51.2.4 19-Oct-2004  skrll Sync with HEAD
 1.51.2.3 21-Sep-2004  skrll Fix the sync with head I botched.
 1.51.2.2 18-Sep-2004  skrll Sync with HEAD.
 1.51.2.1 03-Aug-2004  skrll Sync with HEAD
 1.59.2.2 16-Mar-2005  tron Pull up revision 1.61 (requested by chs in ticket #1137):
hack around a UVM problem that causes hangs when large processes fork.
see PR 26908 for details.
 1.59.2.1 08-Oct-2004  jmc branches: 1.59.2.1.2;
Pullup rev 1.60 (requested by simonb in ticket #908)

- Don't let pagedaemon sleep while draining buf.
- Estimate amount of memory to free at a time.
- Factor out code to set watermark and ensure high > low.
- Make the step of allocation possibility a bit seamless by moving the origin
of curve from 0 to lowater mark.
Improves interactive performance when there is heavy disk activity.
PR#27057
 1.59.2.1.2.1 16-Mar-2005  tron Pull up revision 1.61 (requested by chs in ticket #1137):
hack around a UVM problem that causes hangs when large processes fork.
see PR 26908 for details.
 1.60.6.1 12-Feb-2005  yamt sync with head.
 1.60.4.1 29-Apr-2005  kent sync with -current
 1.65.2.9 17-Mar-2008  yamt sync with head.
 1.65.2.8 11-Feb-2008  yamt sync with head.
 1.65.2.7 04-Feb-2008  yamt sync with head.
 1.65.2.6 21-Jan-2008  yamt sync with head
 1.65.2.5 15-Nov-2007  yamt sync with head.
 1.65.2.4 03-Sep-2007  yamt sync with head.
 1.65.2.3 26-Feb-2007  yamt sync with head.
 1.65.2.2 30-Dec-2006  yamt sync with head.
 1.65.2.1 21-Jun-2006  yamt sync with head.
 1.68.6.1 29-Nov-2005  yamt sync with head.
 1.71.2.2 18-Feb-2006  yamt sync with head.
 1.71.2.1 15-Jan-2006  yamt sync with head.
 1.72.4.1 22-Apr-2006  simonb Sync with head.
 1.72.2.1 09-Sep-2006  rpaulo sync with head
 1.76.14.2 12-Jan-2007  ad Sync with head.
 1.76.14.1 18-Nov-2006  ad Sync with head.
 1.76.2.3 15-Sep-2006  yamt make UVM_KICK_PDAEMON() a real function and stop including
uvm_pdpolicy.h from uvm.h. this also fixes build of pmap(1).
 1.76.2.2 12-Mar-2006  yamt - change the way to account read-ahead stats.
- fix UVM_PQFLAGBITS.
 1.76.2.1 05-Mar-2006  yamt separate page replacement policy from the rest of kernel.
 1.77.2.3 10-Dec-2006  yamt sync with head.
 1.77.2.2 22-Oct-2006  yamt use workqueue for aiodoned.
 1.77.2.1 22-Oct-2006  yamt sync with head
 1.82.2.1 27-Feb-2007  yamt - sync with head.
- move sched_changepri back to kern_synch.c as it doesn't know PPQ anymore.
 1.84.6.1 11-Jul-2007  mjf Sync with head.
 1.84.4.11 01-Nov-2007  ad Yielding to avoid livelock doesn't work well, so just sleep for 1 tick.
This too is inadequate and a better solution must be found. Discussed
with yamt@.
 1.84.4.10 27-Oct-2007  yamt uvmpd_scan_queue: avoid too long busy-loops.
 1.84.4.9 26-Oct-2007  ad - Use a cross call to drain the per-CPU component of pool caches.
- When draining, skip over pools that are completely inactive.
 1.84.4.8 27-Aug-2007  yamt fix an uninitialized variable.
 1.84.4.7 24-Aug-2007  ad Sync with buffer cache locking changes. See buf.h/vfs_bio.c for details.
Some minor portions are incomplete and need to be verified as a whole.
 1.84.4.6 22-Aug-2007  yamt update a comment.
 1.84.4.5 21-Aug-2007  yamt fix some races around pagedaemon and uvm_wait. ok'ed by Andrew Doran.
 1.84.4.4 20-Aug-2007  ad Sync with HEAD.
 1.84.4.3 15-Jul-2007  ad Sync with head.
 1.84.4.2 09-Apr-2007  ad - Add two new arguments to kthread_create1: pri_t pri, bool mpsafe.
- Fork kthreads off proc0 as new LWPs, not new processes.
 1.84.4.1 13-Mar-2007  ad Pull in the initial set of changes for the vmlocking branch.
 1.86.2.1 15-Aug-2007  skrll Sync with HEAD.
 1.87.14.2 21-Jul-2007  ad Merge unobtrusive locking changes from the vmlocking branch.
 1.87.14.1 21-Jul-2007  ad file uvm_pdaemon.c was added on branch matt-mips64 on 2007-07-21 19:21:56 +0000
 1.87.12.2 18-Feb-2008  mjf Sync with HEAD.
 1.87.12.1 19-Nov-2007  mjf Sync with HEAD.
 1.87.10.1 13-Nov-2007  bouyer Sync with HEAD
 1.87.6.3 23-Mar-2008  matt sync with HEAD
 1.87.6.2 09-Jan-2008  matt sync with HEAD
 1.87.6.1 08-Nov-2007  matt sync with -HEAD
 1.87.4.1 11-Nov-2007  joerg Sync with HEAD.
 1.88.6.1 02-Jan-2008  bouyer Sync with HEAD
 1.88.2.2 04-Dec-2007  ad Fix merge botch.
 1.88.2.1 04-Dec-2007  ad Pull the vmlocking changes into a new branch.
 1.91.6.3 17-Jan-2009  mjf Sync with HEAD.
 1.91.6.2 28-Sep-2008  mjf Sync with HEAD.
 1.91.6.1 03-Apr-2008  mjf Sync with HEAD.
 1.91.2.1 24-Mar-2008  keiichi sync with head.
 1.92.10.2 13-Dec-2008  haad Update haad-dm branch to haad-dm-base2.
 1.92.10.1 19-Oct-2008  haad Sync with HEAD.
 1.92.6.1 10-Oct-2008  skrll Sync with HEAD.
 1.92.4.4 11-Aug-2010  yamt sync with head.
 1.92.4.3 11-Mar-2010  yamt sync with head
 1.92.4.2 19-Aug-2009  yamt sync with head.
 1.92.4.1 04-May-2009  yamt sync with head.
 1.93.4.2 02-Feb-2009  snj branches: 1.93.4.2.4;
Apply patch (requested by ad in ticket #357):
Make adjustment of some critical variables atomic.
 1.93.4.1 27-Dec-2008  snj Pull up following revision(s) (requested by bouyer in ticket #211):
sys/uvm/uvm_pdaemon.c: revision 1.97
PR 40027/pagedaemon loops on memory shortage
uvmpd_scan_queue:
- Fix a bug that prevented the pagedaemon from making forward progress
if (a) swap was full (b) the first 16 pages on the inactive list were
unbusy anons not already backed by swap.
- Remove redundant uvm_swapisfull() check and just try to allocate a slot.
If it fails we know swap is full.
 1.93.4.2.4.15 07-May-2012  matt Fix free wakeup
 1.93.4.2.4.14 27-Apr-2012  matt Don't decrement pgrp_active in radioactive page dequeue since we don't
increment it when activated a radioactive page.
 1.93.4.2.4.13 17-Apr-2012  matt Don't kick off the page daemon if it's not going to be able to do anything.
 1.93.4.2.4.12 14-Apr-2012  matt If the pagedaemon is stalling, don't wake it. Unless pages were freed for
a group, don't wake things up if paging is 0 (stop spurious wakeups).
 1.93.4.2.4.11 13-Apr-2012  matt Make sure color passed to uvm_reclaimable is valid.
 1.93.4.2.4.10 12-Apr-2012  matt If the pagedaemon is woken, processes the queues, and makes no
progress (frees no pages), wait 2 seconds instead of immediately trying again.
 1.93.4.2.4.9 12-Apr-2012  matt Separate object-less anon pages out of the active list if there is no swap
device. Make uvm_reclaimable and uvm.*estimatable understand colors and
kmem allocations.
 1.93.4.2.4.8 29-Feb-2012  matt Improve UVM_PAGE_TRKOWN.
Add more asserts to uvm_page.
 1.93.4.2.4.7 17-Feb-2012  matt Change way waiters are handled.
 1.93.4.2.4.6 16-Feb-2012  matt Track the victims selected by the pagedaemon and what happens to them.
Keep a hint for what page group has the most free pages for a given color.
 1.93.4.2.4.5 14-Feb-2012  matt Add more KASSERTs (more! more! more!).
When returning a page to the free pool, make sure to dequeue the page
beforehand or free page queue corruption will happen.
 1.93.4.2.4.4 13-Feb-2012  matt Use separate pending and paging tailq entries.
Add a queue check routine to validate the queues aren't corrupt.
 1.93.4.2.4.3 09-Feb-2012  matt Major changes to uvm.
Support multiple collections (groups) of free pages and run the page
reclamation algorithm on each group independently.
 1.93.4.2.4.2 03-Jun-2011  matt Restore $NetBSD$
 1.93.4.2.4.1 03-Jun-2011  matt Rework page free lists to be sorted by color first rather than free_list.
Kept per color PGFL_* counter in each page free list.
Minor cleanups.
 1.93.2.1 19-Jan-2009  skrll Sync with HEAD.
 1.100.4.4 05-Mar-2011  rmind sync with head
 1.100.4.3 03-Jul-2010  rmind sync with head
 1.100.4.2 17-Mar-2010  rmind Reorganise UVM locking to protect P->V state and serialise pmap(9)
operations on the same page(s) by always locking their owner. Hence
lock order: "vmpage"-lock -> pmap-lock.

Patch, proposed on tech-kern@, from Andrew Doran.
 1.100.4.1 16-Mar-2010  rmind Change struct uvm_object::vmobjlock to be dynamically allocated with
mutex_obj_alloc(). It allows us to share the locks among UVM objects.
 1.100.2.1 17-Aug-2010  uebayasi Sync with HEAD.
 1.101.4.1 08-Feb-2011  bouyer Sync with HEAD
 1.101.2.1 06-Jun-2011  jruoho Sync with HEAD.
 1.102.2.1 23-Jun-2011  cherry Catchup with rmind-uvmplock merge.
 1.103.6.1 18-Feb-2012  mrg merge to -current.
 1.103.2.6 22-May-2014  yamt sync with head.

for a reference, the tree before this commit was tagged
as yamt-pagecache-tag8.

this commit was split into small chunks to avoid
a limitation of cvs. ("Protocol error: too many arguments")
 1.103.2.5 30-Oct-2012  yamt sync with head
 1.103.2.4 17-Apr-2012  yamt sync with head
 1.103.2.3 26-Dec-2011  yamt - use O->A loan to serve read(2). based on a patch from Chuck Silvers
- associated O->A loan fixes.
 1.103.2.2 18-Nov-2011  yamt - use mutex obj for pageable object
- add a function to wait for a mutex obj being available
- replace some "livelock" kpauses with it
 1.103.2.1 02-Nov-2011  yamt page cache related changes

- maintain object pages in radix tree rather than rb tree.
- reduce unnecessary page scan in putpages. esp. when an object has a ton of
pages cached but only a few of them are dirty.
- reduce the number of pmap operations by tracking page dirtiness more
precisely in uvm layer.
- fix nfs commit range tracking.
- fix nfs write clustering. XXX hack
 1.107.4.1 18-May-2014  rmind sync with head
 1.107.2.2 03-Dec-2017  jdolecek update from HEAD
 1.107.2.1 20-Aug-2014  tls Rebase to HEAD as of a few days ago.
 1.108.22.2 22-Apr-2019  martin Pull up following revision(s) (requested by chs in ticket #1238):

sys/uvm/uvm_pdaemon.c: revision 1.110

Draining pools from the pagedaemon thread can deadlock, because draining
a pool can involve taking a lock which can be held by a thread which is
blocked waiting for memory. Avoid this by moving the pool-draining work
to a separate worker thread.
 1.108.22.1 02-Nov-2017  snj Pull up following revision(s) (requested by pgoyette in ticket #335):
share/man/man9/kernhist.9: 1.5-1.8
sys/arch/acorn26/acorn26/pmap.c: 1.39
sys/arch/arm/arm32/fault.c: 1.105 via patch
sys/arch/arm/arm32/pmap.c: 1.350, 1.359
sys/arch/arm/broadcom/bcm2835_bsc.c: 1.7
sys/arch/arm/omap/if_cpsw.c: 1.20
sys/arch/arm/omap/tiotg.c: 1.7
sys/arch/evbarm/conf/RPI2_INSTALL: 1.3
sys/dev/ic/sl811hs.c: 1.98
sys/dev/usb/ehci.c: 1.256
sys/dev/usb/if_axe.c: 1.83
sys/dev/usb/motg.c: 1.18
sys/dev/usb/ohci.c: 1.274
sys/dev/usb/ucom.c: 1.119
sys/dev/usb/uhci.c: 1.277
sys/dev/usb/uhub.c: 1.137
sys/dev/usb/umass.c: 1.160-1.162
sys/dev/usb/umass_quirks.c: 1.100
sys/dev/usb/umass_scsipi.c: 1.55
sys/dev/usb/usb.c: 1.168
sys/dev/usb/usb_mem.c: 1.70
sys/dev/usb/usb_subr.c: 1.221
sys/dev/usb/usbdi.c: 1.175
sys/dev/usb/usbdi_util.c: 1.67-1.70
sys/dev/usb/usbroothub.c: 1.3
sys/dev/usb/xhci.c: 1.75
sys/external/bsd/drm2/dist/drm/i915/i915_gem.c: 1.34
sys/kern/kern_history.c: 1.15
sys/kern/kern_xxx.c: 1.74
sys/kern/vfs_bio.c: 1.275-1.276
sys/miscfs/genfs/genfs_io.c: 1.71
sys/sys/kernhist.h: 1.21
sys/ufs/ffs/ffs_balloc.c: 1.63
sys/ufs/lfs/lfs_vfsops.c: 1.361
sys/ufs/lfs/ulfs_inode.c: 1.21
sys/ufs/lfs/ulfs_vnops.c: 1.52
sys/ufs/ufs/ufs_inode.c: 1.102
sys/ufs/ufs/ufs_vnops.c: 1.239
sys/uvm/pmap/pmap.c: 1.37-1.39
sys/uvm/pmap/pmap_tlb.c: 1.22
sys/uvm/uvm_amap.c: 1.108
sys/uvm/uvm_anon.c: 1.64
sys/uvm/uvm_aobj.c: 1.126
sys/uvm/uvm_bio.c: 1.91
sys/uvm/uvm_device.c: 1.66
sys/uvm/uvm_fault.c: 1.201
sys/uvm/uvm_km.c: 1.144
sys/uvm/uvm_loan.c: 1.85
sys/uvm/uvm_map.c: 1.353
sys/uvm/uvm_page.c: 1.194
sys/uvm/uvm_pager.c: 1.111
sys/uvm/uvm_pdaemon.c: 1.109
sys/uvm/uvm_swap.c: 1.175
sys/uvm/uvm_vnode.c: 1.103
usr.bin/vmstat/vmstat.c: 1.219
Reorder to test for null before null deref in debug code
--
Reorder to test for null before null deref in debug code
--
KNF
--
No need for '\n' in UVMHIST_LOG
--
normalise a BIOHIST log message
--
Update the kernhist(9) kernel history code to address issues identified
in PR kern/52639, as well as some general cleaning-up...
(As proposed on tech-kern@ with additional changes and enhancements.)
Details of changes:
* All history arguments are now stored as uintmax_t values[1], both in
the kernel and in the structures used for exporting the history data
to userland via sysctl(9). This avoids problems on some architectures
where passing a 64-bit (or larger) value to printf(3) can cause it to
process the value as multiple arguments. (This can be particularly
problematic when printf()'s format string is not a literal, since in
that case the compiler cannot know how large each argument should be.)
* Update the data structures used for exporting kernel history data to
include a version number as well as the length of history arguments.
* All [2] existing users of kernhist(9) have had their format strings
updated. Each format specifier now includes an explicit length
modifier 'j' to refer to numeric values of the size of uintmax_t.
* All [2] existing users of kernhist(9) have had their format strings
updated to replace uses of "%p" with "%#jx", and the pointer
arguments are now cast to (uintptr_t) before being subsequently cast
to (uintmax_t). This is needed to avoid compiler warnings about
casting "pointer to integer of a different size."
* All [2] existing users of kernhist(9) have had instances of "%s" or
"%c" format strings replaced with numeric formats; several instances
of mis-match between format string and argument list have been fixed.
* vmstat(1) has been modified to handle the new size of arguments in the
history data as exported by sysctl(9).
* vmstat(1) now provides a warning message if the history requested with
the -u option does not exist (previously, this condition was silently
ignored, with only a single blank line being printed).
* vmstat(1) now checks the version and argument length included in the
data exported via sysctl(9) and exits if they do not match the values
with which vmstat was built.
* The kernhist(9) man-page has been updated to note the additional
requirements imposed on the format strings, along with several other
minor changes and enhancements.
[1] It would have been possible to use an explicit length (for example,
uint64_t) for the history arguments. But that would require another
"rototill" of all the users in the future when we add support for an
architecture that supports a larger size. Also, the printf(3) format
specifiers for explicitly-sized values, such as "%"PRIu64, are much
more verbose (and less aesthetically appealing, IMHO) than simply
using "%ju".
[2] I've tried very hard to find "all [the] existing users of kernhist(9)"
but it is possible that I've missed some of them. I would be glad to
update any stragglers that anyone identifies.
--
For some reason this single kernel seems to have outgrown its declared
size as a result of the kernhist(9) changes. Bump the size.
XXX The amount of increase may be excessive - anyone with more detailed
XXX knowledge please feel free to further adjust the value appropriately.
--
Missed one cast of pointer --> uintptr_t in previous kernhist(9) commit
--
And yet another one. :(
--
Use correct mark-up for NetBSD version.
--
More improvements in grammar and readability.
--
Remove a stray '"' (obvious typo) and add a couple of casts that are
probably needed.
--
And replace an instance of "%p" conversion with "%#jx"
--
Whitespace fix. Give Bl tag table a width. Fix Xr.
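The format-string rules from the kernhist(9) pullup above (the 'j' length modifier, "%#jx" in place of "%p", and pointers cast to uintptr_t and then to uintmax_t) can be demonstrated with plain snprintf(3). This is only a user-space sketch: `ex_format_hist()` is a hypothetical helper, and it does not use the real UVMHIST_LOG or KERNHIST_LOG macros.

```c
#include <assert.h>
#include <stdint.h>
#include <stdio.h>
#include <string.h>

/*
 * Format one history-style record. Every argument is widened to
 * uintmax_t so it matches the 'j' length modifier; the pointer goes
 * through uintptr_t first to avoid a "pointer to integer of a
 * different size" warning.
 */
int ex_format_hist(char *buf, size_t len, const void *pg, uint32_t count)
{
    assert(buf != NULL && len > 0);
    return snprintf(buf, len, "pg=%#jx count=%ju",
        (uintmax_t)(uintptr_t)pg, (uintmax_t)count);
}
```

Widening every argument to one type means the consumer of the record never has to guess how large each argument is, which is the problem the commit message describes for non-literal format strings.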
 1.109.4.3 21-Apr-2020  martin Sync with HEAD
 1.109.4.2 13-Apr-2020  martin Mostly merge changes from HEAD up to 20200411
 1.109.4.1 10-Jun-2019  christos Sync with HEAD
 1.122.2.2 29-Feb-2020  ad Sync with head.
 1.122.2.1 17-Jan-2020  ad Sync with head.
 1.125.4.1 20-Apr-2020  bouyer Sync with HEAD
 1.130.2.1 14-Dec-2020  thorpej Sync w/ HEAD.
 1.131.2.1 17-Apr-2021  thorpej Sync with HEAD.
 1.133.16.1 02-Oct-2023  martin Pull up following revision(s) (requested by ad in ticket #379):

sys/uvm/uvm_pdaemon.c: revision 1.134

uvmpd_trylockowner(): release pg->interlock before calling rw_obj_free()
since it can call back into the VM system.
 1.20 05-Sep-2020  riastradh Round of uvm.h cleanup.

The poorly named uvm.h is generally supposed to be for uvm-internal
users only.

- Narrow it to files that actually need it -- mostly files that need
to query whether curlwp is the pagedaemon, which should maybe be
exposed by an external header.

- Use uvm_extern.h where feasible and uvm_*.h for things not exposed
by it. We should split up uvm_extern.h but this will serve for now
to reduce the uvm.h dependencies.

- Use uvm_stat.h and #ifdef UVMHIST uvm.h for files that use
UVMHIST(ubchist), since ubchist is declared in uvm.h but the
reference evaporates if UVMHIST is not defined, so we reduce header
file dependencies.

- Make uvm_device.h and uvm_swap.h independently includable while
here.

ok chs@
 1.19 23-Feb-2020  ad UVM locking changes, proposed on tech-kern:

- Change the lock on uvm_object, vm_amap and vm_anon to be a RW lock.
- Break v_interlock and vmobjlock apart. v_interlock remains a mutex.
- Do partial PV list locking in the x86 pmap. Others to follow later.
 1.18 30-Dec-2019  ad branches: 1.18.2;
pagedaemon:

- Use marker pages to keep place in the queue when scanning, rather than
relying on assumptions.

- In uvmpdpol_balancequeue(), lock the object once instead of twice.

- When draining pools, the situation is getting desperate, but try to avoid
saturating the system with xcall, lock and interrupt activity by sleeping
for 1 clock tick if being continually awoken and all pools have been
cycled through at least once.

- Pause & resume the freelist cache during pool draining.

PR kern/54209: NetBSD 8 large memory performance extremely low
PR kern/54210: NetBSD-8 processes presumably not exiting
PR kern/54727: writing a large file causes unreasonable system behaviour
 1.17 02-Feb-2011  chuck branches: 1.17.56;
update license clauses on my code to match the new-style BSD licenses.
based on second diff that rmind@ sent me.

no functional change with this commit.
 1.16 07-Sep-2010  pooka branches: 1.16.2; 1.16.4;
Make "no options VMSWAP" kernels compile again.
 1.15 02-Jan-2008  ad branches: 1.15.10; 1.15.28; 1.15.30; 1.15.32;
Merge vmlocking2 to head.
 1.14 21-Feb-2007  thorpej branches: 1.14.4; 1.14.18; 1.14.24; 1.14.26; 1.14.30;
Replace the Mach-derived boolean_t type with the C99 bool type. A
future commit will replace use of TRUE and FALSE with true and false.
 1.13 15-Sep-2006  yamt branches: 1.13.6;
merge yamt-pdpolicy branch.
- separate page replacement policy from the rest of kernel
- implement an alternative replacement policy
 1.12 11-Dec-2005  christos branches: 1.12.8; 1.12.20;
merge ktrace-lwp.
 1.11 12-Apr-2005  yamt branches: 1.11.2;
fix unreasonably frequent "killed: out of swap" on systems which have
little or no swap.
- even on a severe swap shortage, if we have some amount of file-backed pages,
don't bother to kill processes.
- if all pages in queue will be likely reactivated, just give up
page type balancing rather than spinning unnecessarily.
 1.10 24-Mar-2004  junyoung branches: 1.10.8;
Nuke __P().
 1.9 25-May-2001  chs branches: 1.9.22;
remove trailing whitespace.
 1.8 04-Nov-1999  thorpej branches: 1.8.6;
Const poison uvm_wait().
 1.7 21-Jun-1999  thorpej branches: 1.7.2; 1.7.4; 1.7.8;
Protect prototypes, certain macros, and inlines from userland.
 1.6 25-Mar-1999  mrg branches: 1.6.4;
remove now >1 year old pre-release message.
 1.5 10-Feb-1998  mrg - add defopt's for UVM, UVMHIST and PMAP_NEW.
- remove unnecessary UVMHIST_DECL's.
 1.4 10-Feb-1998  perry add/cleanup multiple inclusion protection.
 1.3 07-Feb-1998  mrg restore rcsids
 1.2 06-Feb-1998  thorpej RCS ID police.
 1.1 05-Feb-1998  mrg branches: 1.1.1;
Initial revision
 1.1.1.1 05-Feb-1998  mrg initial import of the new virtual memory system, UVM, into -current.

UVM was written by chuck cranor <chuck@maria.wustl.edu>, with some
minor portions derived from the old Mach code. i provided some help
getting swap and paging working, and other bug fixes/ideas. chuck
silvers <chuq@chuq.com> also provided some other fixes.

this is the UVM kernel code portion.


this will be KNF'd shortly. :-)
 1.6.4.1 01-Jul-1999  thorpej Sync w/ -current.
 1.7.8.1 27-Dec-1999  wrstuden Pull up to last week's -current.
 1.7.4.1 15-Nov-1999  fvdl Sync with -current
 1.7.2.1 20-Nov-2000  bouyer Update thorpej_scsipi to -current as of a month ago
 1.8.6.1 21-Jun-2001  nathanw Catch up to -current.
 1.9.22.4 10-Nov-2005  skrll Sync with HEAD. Here we go again...
 1.9.22.3 21-Sep-2004  skrll Fix the sync with head I botched.
 1.9.22.2 18-Sep-2004  skrll Sync with HEAD.
 1.9.22.1 03-Aug-2004  skrll Sync with HEAD
 1.10.8.1 29-Apr-2005  kent sync with -current
 1.11.2.3 21-Jan-2008  yamt sync with head
 1.11.2.2 26-Feb-2007  yamt sync with head.
 1.11.2.1 30-Dec-2006  yamt sync with head.
 1.12.20.1 18-Nov-2006  ad Sync with head.
 1.12.8.1 05-Mar-2006  yamt separate page replacement policy from the rest of kernel.
 1.13.6.1 27-Feb-2007  yamt - sync with head.
- move sched_changepri back to kern_synch.c as it doesn't know PPQ anymore.
 1.14.30.1 02-Jan-2008  bouyer Sync with HEAD
 1.14.26.1 04-Dec-2007  ad Pull the vmlocking changes into a new branch.
 1.14.24.1 18-Feb-2008  mjf Sync with HEAD.
 1.14.18.1 09-Jan-2008  matt sync with HEAD
 1.14.4.1 13-Mar-2007  ad Pull in the initial set of changes for the vmlocking branch.
 1.15.32.1 05-Mar-2011  rmind sync with head
 1.15.30.1 22-Oct-2010  uebayasi Sync with HEAD (-D20101022).
 1.15.28.1 12-Apr-2012  matt Separate object-less anon pages out of the active list if there is no swap
device. Make uvm_reclaimable and uvm.*estimatable understand colors and
kmem allocations.
 1.15.10.1 09-Oct-2010  yamt sync with head
 1.16.4.1 08-Feb-2011  bouyer Sync with HEAD
 1.16.2.1 06-Jun-2011  jruoho Sync with HEAD.
 1.17.56.1 08-Apr-2020  martin Merge changes from current as of 20200406
 1.18.2.1 29-Feb-2020  ad Sync with head.
 1.9 20-Aug-2022  riastradh uvm_pdpolicy.h: Fix missing forward declarations and includes.
 1.8 17-May-2020  ad Start trying to reduce cache misses on vm_page during fault processing.

- Make PGO_LOCKED getpages imply PGO_NOBUSY and remove the latter. Mark
pages busy only when there's actually I/O to do.

- When doing COW on a uvm_object, don't mess with neighbouring pages. In
all likelihood they're already entered.

- Don't mess with neighbouring VAs that have existing mappings as replacing
those mappings with same can be quite costly.

- Don't enqueue pages for neighbour faults unless not enqueued already, and
don't activate centre pages unless uvmpdpol says it's useful.

Also:

- Make PGO_LOCKED getpages on UAOs work more like vnodes: do gang lookup in
the radix tree, and don't allocate new pages.

- Fix many assertion failures around faults/loans with tmpfs.
 1.7 23-Feb-2020  ad UVM locking changes, proposed on tech-kern:

- Change the lock on uvm_object, vm_amap and vm_anon to be a RW lock.
- Break v_interlock and vmobjlock apart. v_interlock remains a mutex.
- Do partial PV list locking in the x86 pmap. Others to follow later.
 1.6 31-Dec-2019  ad branches: 1.6.2;
- Add and use wrapper functions that take and acquire page interlocks, and pairs
of page interlocks. Require that the page interlock be held over calls to
uvm_pageactivate(), uvm_pagewire() and similar.

- Solve the concurrency problem with page replacement state. Rather than
updating the global state synchronously, set an intended state on
individual pages (active, inactive, enqueued, dequeued) while holding the
page interlock. After the interlock is released put the pages on a 128
entry per-CPU queue for their state changes to be made real in batch.
This results in a ~400 fold decrease in contention on my test system.
Proposed on tech-kern but modified to use the page interlock rather than
atomics to synchronise as it's much easier to maintain that way, and
cheaper.
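The batching scheme described in revision 1.6 above (record an intended state on the page, queue the page on a 128-entry per-CPU queue, and apply the changes later in one pass) might look roughly like this. It is a single-threaded sketch with no real locking; every name in it (`ex_page`, `ex_cpu_queue`, `ex_set_intent()`, `ex_flush()`) is hypothetical and the actual uvm_pdpolicy code differs.

```c
#include <assert.h>
#include <stddef.h>

#define EX_PDQ_SIZE 128  /* per-CPU queue length named in the commit */

enum ex_state { EX_NONE, EX_ACTIVE, EX_INACTIVE, EX_ENQUEUED, EX_DEQUEUED };

struct ex_page {
    enum ex_state intent;    /* set while "holding the page interlock" */
    enum ex_state realized;  /* state as seen by the global queues */
};

struct ex_cpu_queue {
    struct ex_page *pages[EX_PDQ_SIZE];
    size_t count;
};

/* Apply all pending intents in one batch; returns entries processed. */
size_t ex_flush(struct ex_cpu_queue *q)
{
    size_t n = q->count;
    /* In a kernel, the global policy lock would be taken once here. */
    for (size_t i = 0; i < n; i++) {
        struct ex_page *pg = q->pages[i];
        if (pg->intent == EX_NONE)
            continue;  /* already realized by an earlier entry */
        pg->realized = pg->intent;
        pg->intent = EX_NONE;
    }
    q->count = 0;
    return n;
}

/* Record the intended state and defer the global update. */
void ex_set_intent(struct ex_cpu_queue *q, struct ex_page *pg,
    enum ex_state intent)
{
    assert(intent != EX_NONE);
    pg->intent = intent;
    if (q->count == EX_PDQ_SIZE)
        ex_flush(q);  /* queue full: make pending changes real */
    q->pages[q->count++] = pg;
}
```

In this sketch the shared state is touched once per batch rather than once per page, which is the kind of saving the commit message attributes the drop in contention to.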
 1.5 30-Dec-2019  ad pagedaemon:

- Use marker pages to keep place in the queue when scanning, rather than
relying on assumptions.

- In uvmpdpol_balancequeue(), lock the object once instead of twice.

- When draining pools, the situation is getting desperate, but try to avoid
saturating the system with xcall, lock and interrupt activity by sleeping
for 1 clock tick if being continually awoken and all pools have been
cycled through at least once.

- Pause & resume the freelist cache during pool draining.

PR kern/54209: NetBSD 8 large memory performance extremely low
PR kern/54210: NetBSD-8 processes presumably not exiting
PR kern/54727: writing a large file causes unreasonable system behaviour
 1.4 13-Dec-2019  ad Break the global uvm_pageqlock into a per-page identity lock and a private
lock for use of the pagedaemon policy code. Discussed on tech-kern.

PR kern/54209: NetBSD 8 large memory performance extremely low
PR kern/54210: NetBSD-8 processes presumably not exiting
PR kern/54727: writing a large file causes unreasonable system behaviour
 1.3 21-Feb-2007  thorpej branches: 1.3.62; 1.3.132;
Replace the Mach-derived boolean_t type with the C99 bool type. A
future commit will replace use of TRUE and FALSE with true and false.
 1.2 15-Sep-2006  yamt branches: 1.2.6; 1.2.8;
merge yamt-pdpolicy branch.
- separate page replacement policy from the rest of kernel
- implement an alternative replacement policy
 1.1 05-Mar-2006  yamt branches: 1.1.2; 1.1.6;
file uvm_pdpolicy.h was initially added on branch yamt-pdpolicy.
 1.1.6.1 18-Nov-2006  ad Sync with head.
 1.1.2.1 05-Mar-2006  yamt separate page replacement policy from the rest of kernel.
 1.2.8.1 27-Feb-2007  yamt - sync with head.
- move sched_changepri back to kern_synch.c as it doesn't know PPQ anymore.
 1.2.6.3 26-Feb-2007  yamt sync with head.
 1.2.6.2 30-Dec-2006  yamt sync with head.
 1.2.6.1 15-Sep-2006  yamt file uvm_pdpolicy.h was added on branch yamt-lazymbuf on 2006-12-30 20:51:05 +0000
 1.3.132.1 08-Apr-2020  martin Merge changes from current as of 20200406
 1.3.62.2 12-Apr-2012  matt Separate object-less anon pages out of the active list if there is no swap
device. Make uvm_reclaimable and uvm.*estimatable understand colors and
kmem allocations.
 1.3.62.1 09-Feb-2012  matt Major changes to uvm.
Support multiple collections (groups) of free pages and run the page
reclamation algorithm on each group independently.
 1.6.2.1 29-Feb-2020  ad Sync with head.
 1.42 20-May-2025  bouyer Remove the redundant kpreempt_disable/kpreempt_enable now that we're
running at splsoftbio. Pointed out by thorpej@
 1.41 19-May-2025  bouyer uvmpdpol_pagerealize(): ucpu->pdqhead is used by a single CPU; but
kpreempt_disable() isn't enough to guard against concurrent access;
interrupts also need to be disabled.
If my analysis is correct, the only place using ucpu->pdqhead which
can be called from interrupt context is uvmpdpol_pagerealize(), and only
from softbio().
So:
- introduce splsoftbio() in sys/spl.h
- protect all accesses to ucpu->pdqhead with splsoftbio()

fixes pr kern/59412: uvmpdpol_pagerealize() queue index out of bound
 1.40 12-Apr-2022  andvar branches: 1.40.4; 1.40.10;
s/stablize/stabilize/
 1.39 11-Jun-2020  ad Counter tweaks:

- Don't need to count anonpages+filepages any more; clean+unknown+dirty for
each kind of page can be summed to get the totals.

- Track the number of free pages with a counter so that it's one less thing
for the allocator to do, which opens up further options there.

- Remove cpu_count_sync_one(). It has no users and doesn't save a whole lot.
For the cheap option, give cpu_count_sync() a boolean parameter indicating
that a cached value is okay, and rate limit the updates for cached values
to hz.
 1.38 11-Jun-2020  ad uvm_availmem(): give it a boolean argument to specify whether a recent
cached value will do, or if the very latest total must be fetched. It can
be called thousands of times a second and fetching the totals impacts not
only the calling LWP but other CPUs doing unrelated activity in the VM
system.
 1.37 17-May-2020  ad Start trying to reduce cache misses on vm_page during fault processing.

- Make PGO_LOCKED getpages imply PGO_NOBUSY and remove the latter. Mark
pages busy only when there's actually I/O to do.

- When doing COW on a uvm_object, don't mess with neighbouring pages. In
all likelihood they're already entered.

- Don't mess with neighbouring VAs that have existing mappings as replacing
those mappings with same can be quite costly.

- Don't enqueue pages for neighbour faults unless not enqueued already, and
don't activate centre pages unless uvmpdpol says it's useful.

Also:

- Make PGO_LOCKED getpages on UAOs work more like vnodes: do gang lookup in
the radix tree, and don't allocate new pages.

- Fix many assertion failures around faults/loans with tmpfs.
 1.36 02-Apr-2020  maxv Hide 'hardclock_ticks' behind a new getticks() function, and use relaxed
atomics internally. Only one caller is converted for now.

Discussed with riastradh@ and ad@.
 1.35 14-Mar-2020  ad uvm_pdpolicy: Require a write lock on the object only for dequeue.
No sense in requiring that for enqueue/activate/deactivate.
 1.34 08-Mar-2020  ad Don't zap the non-pdpolicy bits in pg->pqflags.
 1.33 23-Feb-2020  ad UVM locking changes, proposed on tech-kern:

- Change the lock on uvm_object, vm_amap and vm_anon to be a RW lock.
- Break v_interlock and vmobjlock apart. v_interlock remains a mutex.
- Do partial PV list locking in the x86 pmap. Others to follow later.
 1.32 30-Jan-2020  ad uvmpdpol_estimatepageable(): Don't take any locks here. This can be called
from DDB, and in any case the numbers are stale the instant the lock is
dropped, so it just doesn't matter.
 1.31 21-Jan-2020  ad uvmpdpol_pageactive(): the change to not re-activate recently activated
pages worked great with uvm_pageqlock, but it doesn't buy anything any more,
because now the busy pages are likely in a per-CPU queue somewhere waiting
to be processed, and changing the intent on those queued pages costs next
to nothing. Remove this and get back all the bits in pg->pqflags.
 1.30 01-Jan-2020  ad branches: 1.30.2;
Fix a comment.
 1.29 01-Jan-2020  mlelstv explicitly include sys/atomic.h for atomic operations.
 1.28 31-Dec-2019  ad - Add and use wrapper functions that take and acquire page interlocks, and pairs
of page interlocks. Require that the page interlock be held over calls to
uvm_pageactivate(), uvm_pagewire() and similar.

- Solve the concurrency problem with page replacement state. Rather than
updating the global state synchronously, set an intended state on
individual pages (active, inactive, enqueued, dequeued) while holding the
page interlock. After the interlock is released put the pages on a 128
entry per-CPU queue for their state changes to be made real in batch.
This results in a ~400 fold decrease in contention on my test system.
Proposed on tech-kern but modified to use the page interlock rather than
atomics to synchronise as it's much easier to maintain that way, and
cheaper.
 1.27 31-Dec-2019  ad Rename uvm_free() -> uvm_availmem().
 1.26 31-Dec-2019  ad Rename uvm_page_locked_p() -> uvm_page_owner_locked_p()
 1.25 30-Dec-2019  ad Whitespace.
 1.24 30-Dec-2019  ad pagedaemon:

- Use marker pages to keep place in the queue when scanning, rather than
relying on assumptions.

- In uvmpdpol_balancequeue(), lock the object once instead of twice.

- When draining pools, the situation is getting desperate, but try to avoid
saturating the system with xcall, lock and interrupt activity by sleeping
for 1 clock tick if being continually awoken and all pools have been
cycled through at least once.

- Pause & resume the freelist cache during pool draining.

PR kern/54209: NetBSD 8 large memory performance extremely low
PR kern/54210: NetBSD-8 processes presumably not exiting
PR kern/54727: writing a large file causes unreasonable system behaviour
 1.23 27-Dec-2019  ad vm_page: Now that listq is gone, give the pagedaemon its own private
TAILQ_ENTRY, so that update of page replacement state can be made
asynchronous/lazy. No functional change.
 1.22 23-Dec-2019  ad uvmpdpol_selectvictim: don't assert wire_count == 0, as we can (safely)
race with object owner and wired pages can very briefly appear on the queue.
 1.21 21-Dec-2019  ad uvmexp.free -> uvm_free()
 1.20 16-Dec-2019  ad - Extend the per-CPU counters matt@ did to include all of the hot counters
in UVM, excluding uvmexp.free, which needs special treatment and will be
done with a separate commit. Cuts system time for a build by 20-25% on
a 48 CPU machine w/DIAGNOSTIC.

- Avoid 64-bit integer divide on every fault (for rnd_add_uint32).
 1.19 16-Dec-2019  ad Use the high bits of pqflags for PQ_TIME, not low.
 1.18 13-Dec-2019  ad Break the global uvm_pageqlock into a per-page identity lock and a private
lock for use of the pagedaemon policy code. Discussed on tech-kern.

PR kern/54209: NetBSD 8 large memory performance extremely low
PR kern/54210: NetBSD-8 processes presumably not exiting
PR kern/54727: writing a large file causes unreasonable system behaviour
 1.17 30-Jan-2012  para branches: 1.17.48;
removed code from uvmpdpol_needsscan_p that got there by mistake
pointed out by yamt@
 1.16 28-Jan-2012  rmind pool_page_alloc, pool_page_alloc_meta: avoid extra compare, use const.
ffs_mountfs,sys_swapctl: replace memset with kmem_zalloc.
sys_swapctl: move kmem_free outside the lock path.
uvm_init: fix comment, remove pointless numeration of steps.
uvm_map_enter: remove meflagval variable.
Fix some indentation.
 1.15 27-Jan-2012  para extending vmem(9) to be able to allocate resources for its own needs.
simplifying uvm_map handling (no special kernel entries anymore no relocking)
make malloc(9) a thin wrapper around kmem(9)
(with private interface for interrupt safety reasons)

releng@ acknowledged
 1.14 12-Jun-2011  rmind branches: 1.14.6;
Welcome to 5.99.53! Merge rmind-uvmplock branch:

- Reorganize locking in UVM and provide extra serialisation for pmap(9).
New lock order: [vmpage-owner-lock] -> pmap-lock.

- Simplify locking in some pmap(9) modules by removing P->V locking.

- Use lock object on vmobjlock (and thus vnode_t::v_interlock) to share
the locks amongst UVM objects where necessary (tmpfs, layerfs, unionfs).

- Rewrite and optimise x86 TLB shootdown code, make it simpler and cleaner.
Add TLBSTATS option for x86 to collect statistics about TLB shootdowns.

- Unify /dev/mem et al in MI code and provide required locking (removes
kernel-lock on some ports). Also, avoid cache-aliasing issues.

Thanks to Andrew Doran and Joerg Sonnenberger, as their initial patches
formed the core changes of this branch.
 1.13 02-Feb-2011  chuck branches: 1.13.2;
update license clauses on my code to match the new-style BSD licenses.
based on second diff that rmind@ sent me.

no functional change with this commit.
 1.12 04-Jun-2008  ad branches: 1.12.16; 1.12.20; 1.12.26; 1.12.28;
vm_page: put TAILQ_ENTRY into a union with LIST_ENTRY, so we can use both.
 1.11 07-Mar-2008  martin branches: 1.11.2; 1.11.4; 1.11.6;
Swap sysctl -d description of vm.filemin and vm.execmin. Noted by
Raymond Meyer on current-users.
 1.10 18-Jan-2008  yamt branches: 1.10.2; 1.10.6;
push pmap_clear_reference calls into pdpolicy code, where reference bits
actually matter.
 1.9 02-Jan-2008  ad Merge vmlocking2 to head.
 1.8 22-Feb-2007  thorpej branches: 1.8.4; 1.8.18; 1.8.24; 1.8.26; 1.8.30;
TRUE -> true, FALSE -> false
 1.7 21-Feb-2007  thorpej Replace the Mach-derived boolean_t type with the C99 bool type. A
future commit will replace use of TRUE and FALSE with true and false.
 1.6 19-Jan-2007  skrll branches: 1.6.2;
Remove useless double assignment.

PR 35442
 1.5 01-Nov-2006  yamt branches: 1.5.4;
remove some __unused from function parameters.
 1.4 12-Oct-2006  yamt move some knowledge about vnode into uvm_vnode.c.
 1.3 12-Oct-2006  christos - sprinkle __unused on function decls.
- fix a couple of unused bugs
- no more -Wno-unused for i386
 1.2 15-Sep-2006  yamt branches: 1.2.2;
merge yamt-pdpolicy branch.
- separate page replacement policy from the rest of kernel
- implement an alternative replacement policy
 1.1 05-Mar-2006  yamt branches: 1.1.2; 1.1.6;
file uvm_pdpolicy_clock.c was initially added on branch yamt-pdpolicy.
 1.1.6.2 01-Feb-2007  ad Sync with head.
 1.1.6.1 18-Nov-2006  ad Sync with head.
 1.1.2.2 15-Sep-2006  yamt make UVM_KICK_PDAEMON() a real function and stop including
uvm_pdpolicy.h from uvm.h. this also fixes build of pmap(1).
 1.1.2.1 05-Mar-2006  yamt separate page replacement policy from the rest of kernel.
 1.2.2.2 10-Dec-2006  yamt sync with head.
 1.2.2.1 22-Oct-2006  yamt sync with head
 1.5.4.5 17-Mar-2008  yamt sync with head.
 1.5.4.4 21-Jan-2008  yamt sync with head
 1.5.4.3 26-Feb-2007  yamt sync with head.
 1.5.4.2 30-Dec-2006  yamt sync with head.
 1.5.4.1 01-Nov-2006  yamt file uvm_pdpolicy_clock.c was added on branch yamt-lazymbuf on 2006-12-30 20:51:05 +0000
 1.6.2.1 27-Feb-2007  yamt - sync with head.
- move sched_changepri back to kern_synch.c as it doesn't know PPQ anymore.
 1.8.30.2 19-Jan-2008  bouyer Sync with HEAD
 1.8.30.1 02-Jan-2008  bouyer Sync with HEAD
 1.8.26.1 04-Dec-2007  ad Pull the vmlocking changes into a new branch.
 1.8.24.1 18-Feb-2008  mjf Sync with HEAD.
 1.8.18.2 23-Mar-2008  matt sync with HEAD
 1.8.18.1 09-Jan-2008  matt sync with HEAD
 1.8.4.1 13-Mar-2007  ad Pull in the initial set of changes for the vmlocking branch.
 1.10.6.2 05-Jun-2008  mjf Sync with HEAD.

Also fix build.
 1.10.6.1 03-Apr-2008  mjf Sync with HEAD.
 1.10.2.1 24-Mar-2008  keiichi sync with head.
 1.11.6.1 23-Jun-2008  wrstuden Sync w/ -current. 34 merge conflicts to follow.
 1.11.4.1 04-May-2009  yamt sync with head.
 1.11.2.1 17-Jun-2008  yamt sync with head.
 1.12.28.1 08-Feb-2011  bouyer Sync with HEAD
 1.12.26.1 06-Jun-2011  jruoho Sync with HEAD.
 1.12.20.2 05-Mar-2011  rmind sync with head
 1.12.20.1 17-Mar-2010  rmind Reorganise UVM locking to protect P->V state and serialise pmap(9)
operations on the same page(s) by always locking their owner. Hence
lock order: "vmpage"-lock -> pmap-lock.

Patch, proposed on tech-kern@, from Andrew Doran.
 1.12.16.9 07-May-2012  matt Fix typo.
 1.12.16.8 27-Apr-2012  matt Don't decrement pgrp_active in radioactive page dequeue since we don't
increment it when activating a radioactive page.
 1.12.16.7 17-Apr-2012  matt If freemin is 0, don't say a scan is needed.
 1.12.16.6 12-Apr-2012  matt Use PQ_SWAPBACKED to determine radioactiveness of page.
Make sure to add the number of radioactive pages to the active pages.
 1.12.16.5 12-Apr-2012  matt Separate object-less anon pages out of the active list if there is no swap
device. Make uvm_reclaimable and uvm.*estimatable understand colors and
kmem allocations.
 1.12.16.4 17-Feb-2012  matt Assert the page isn't free before munging with its pageq.
 1.12.16.3 12-Feb-2012  matt Disable some of the more aggressive debug checks since with lots of pages, they
cause O(n^2) increases in time.
 1.12.16.2 09-Feb-2012  matt Major changes to uvm.
Support multiple collections (groups) of free pages and run the page
reclamation algorithm on each group independently.
 1.12.16.1 21-Jan-2012  matt Use pg instead of p as a pointer to struct uvm_page.
 1.13.2.1 23-Jun-2011  cherry Catchup with rmind-uvmplock merge.
 1.14.6.1 18-Feb-2012  mrg merge to -current.
 1.17.48.1 08-Apr-2020  martin Merge changes from current as of 20200406
 1.30.2.2 29-Feb-2020  ad Sync with head.
 1.30.2.1 25-Jan-2020  ad Sync with head.
 1.40.10.1 02-Aug-2025  perseant Sync with HEAD
 1.40.4.1 28-May-2025  martin Pull up following revision(s) (requested by bouyer in ticket #1121):

sys/arch/ia64/include/intr.h: revision 1.9
sys/uvm/uvm_pdpolicy_clock.c: revision 1.41
sys/sys/spl.h: revision 1.11
sys/uvm/uvm_pdpolicy_clock.c: revision 1.42
sys/arch/sparc64/include/psl.h: revision 1.66

uvmpdpol_pagerealize(): ucpu->pdqhead is used by a single CPU; but

kpreempt_disable() isn't enough to guard against concurrent access;
interrupts also need to be disabled.

If my analysis is correct, the only place using ucpu->pdqhead which
can be called from interrupt context is uvmpdpol_pagerealize(), and only
from softbio().

So:
- introduce splsoftbio() in sys/spl.h
- protect all accesses to ucpu->pdqhead with splsoftbio()
fixes pr kern/59412: uvmpdpol_pagerealize() queue index out of bound

Provide splsoftbio()

Remove the redundant kpreempt_disable/kpreempt_enable now that we're
running at splsoftbio. Pointed out by thorpej@
 1.27 12-Apr-2022  andvar s/stablize/stabilize/
 1.26 17-May-2020  ad Start trying to reduce cache misses on vm_page during fault processing.

- Make PGO_LOCKED getpages imply PGO_NOBUSY and remove the latter. Mark
pages busy only when there's actually I/O to do.

- When doing COW on a uvm_object, don't mess with neighbouring pages. In
all likelihood they're already entered.

- Don't mess with neighbouring VAs that have existing mappings as replacing
those mappings with same can be quite costly.

- Don't enqueue pages for neighbour faults unless not enqueued already, and
don't activate centre pages unless uvmpdpol says it's useful.

Also:

- Make PGO_LOCKED getpages on UAOs work more like vnodes: do gang lookup in
the radix tree, and don't allocate new pages.

- Fix many assertion failures around faults/loans with tmpfs.
 1.25 10-Apr-2020  tsutsui Update a link to "CLOCK-Pro" paper.
 1.24 14-Mar-2020  ad branches: 1.24.2;
uvm_pdpolicy: Require a write lock on the object only for dequeue.
No sense in requiring that for enqueue/activate/deactivate.
 1.23 30-Jan-2020  ad uvmpdpol_estimatepageable(): Don't take any locks here. This can be called
from DDB, and in any case the numbers are stale the instant the lock is
dropped, so it just doesn't matter.
 1.22 31-Dec-2019  ad branches: 1.22.2;
- Add and use wrapper functions that take and acquire page interlocks, and pairs
of page interlocks. Require that the page interlock be held over calls to
uvm_pageactivate(), uvm_pagewire() and similar.

- Solve the concurrency problem with page replacement state. Rather than
updating the global state synchronously, set an intended state on
individual pages (active, inactive, enqueued, dequeued) while holding the
page interlock. After the interlock is released put the pages on a 128
entry per-CPU queue for their state changes to be made real in batch.
This results in a ~400 fold decrease in contention on my test system.
Proposed on tech-kern but modified to use the page interlock rather than
atomics to synchronise as it's much easier to maintain that way, and
cheaper.
 1.21 31-Dec-2019  ad Rename uvm_page_locked_p() -> uvm_page_owner_locked_p()
 1.20 30-Dec-2019  ad pagedaemon:

- Use marker pages to keep place in the queue when scanning, rather than
relying on assumptions.

- In uvmpdpol_balancequeue(), lock the object once instead of twice.

- When draining pools, the situation is getting desperate, but try to avoid
saturating the system with xcall, lock and interrupt activity by sleeping
for 1 clock tick if being continually awoken and all pools have been
cycled through at least once.

- Pause & resume the freelist cache during pool draining.

PR kern/54209: NetBSD 8 large memory performance extremely low
PR kern/54210: NetBSD-8 processes presumably not exiting
PR kern/54727: writing a large file causes unreasonable system behaviour
 1.19 27-Dec-2019  ad vm_page: Now that listq is gone, give the pagedaemon its own private
TAILQ_ENTRY, so that update of page replacement state can be made
asynchronous/lazy. No functional change.
 1.18 13-Dec-2019  ad Break the global uvm_pageqlock into a per-page identity lock and a private
lock for use of the pagedaemon policy code. Discussed on tech-kern.

PR kern/54209: NetBSD 8 large memory performance extremely low
PR kern/54210: NetBSD-8 processes presumably not exiting
PR kern/54727: writing a large file causes unreasonable system behaviour
 1.17 20-Jun-2011  yamt branches: 1.17.54;
band-aid fix after the merge of rmind-uvmplock branch.
 1.16 05-Feb-2011  yamt branches: 1.16.2;
pageobj: remove a wrong assertion.
 1.15 04-Jun-2008  ad branches: 1.15.16; 1.15.20; 1.15.26; 1.15.28;
vm_page: put TAILQ_ENTRY into a union with LIST_ENTRY, so we can use both.
 1.14 22-Mar-2008  bjs branches: 1.14.2; 1.14.4; 1.14.6;
Allow this to compile if LISTQ is undefined:

- Put '#ifdef LISTQ' ... '#endif' pairs around pageq_insert_head()
and clockpro_insert_head().

- Add missing argument to printf statement.
 1.13 07-Feb-2008  yamt branches: 1.13.6;
nonresident_rotate: avoid too long loops which can happen on some workloads.
 1.12 18-Jan-2008  yamt push pmap_clear_reference calls into pdpolicy code, where reference bits
actually matter.
 1.11 13-Jan-2008  yamt nonresident_rotate: micro optimization
 1.10 02-Jan-2008  ad Merge vmlocking2 to head.
 1.9 01-Aug-2007  yamt branches: 1.9.4; 1.9.10; 1.9.12; 1.9.16; 1.9.20;
use separate nreslookup evcnt for obj and anon pages.
 1.8 22-Feb-2007  thorpej branches: 1.8.4; 1.8.12;
TRUE -> true, FALSE -> false
 1.7 21-Feb-2007  thorpej Replace the Mach-derived boolean_t type with the C99 bool type. A
future commit will replace use of TRUE and FALSE with true and false.
 1.6 28-Nov-2006  yamt branches: 1.6.4; 1.6.6;
uvmpdpol_pagedequeue: clear PQ_INITIALREF.
otherwise, dequeue/enqueue cycles (eg. page loaning) can cause
an assertion failure in clockpro_pageenqueue.
 1.5 16-Nov-2006  christos __unused removal on arguments; approved by core.
 1.4 12-Oct-2006  yamt remove unnecessary #include of vnode.h.
 1.3 09-Oct-2006  yamt fix some warnings in the case of PDSIM.
 1.2 15-Sep-2006  yamt branches: 1.2.2;
merge yamt-pdpolicy branch.
- separate page replacement policy from the rest of kernel
- implement an alternative replacement policy
 1.1 06-Mar-2006  yamt branches: 1.1.2; 1.1.6;
file uvm_pdpolicy_clockpro.c was initially added on branch yamt-pdpolicy.
 1.1.6.2 12-Jan-2007  ad Sync with head.
 1.1.6.1 18-Nov-2006  ad Sync with head.
 1.1.2.11 15-Sep-2006  yamt make UVM_KICK_PDAEMON() a real function and stop including
uvm_pdpolicy.h from uvm.h. this also fixes build of pmap(1).
 1.1.2.10 24-Mar-2006  yamt get rid of bootstrap code from frequently called path.
 1.1.2.9 24-Mar-2006  yamt separate "nresrecord" statistics for obj and anon.
 1.1.2.8 21-Mar-2006  yamt add a sysctl knob to adjust cold target.
 1.1.2.7 18-Mar-2006  yamt reduce BUCKETSIZE to make sizeof(struct bucket) a power of two.
 1.1.2.6 10-Mar-2006  yamt reduce unnecessary c99'ism.
 1.1.2.5 08-Mar-2006  yamt some comments.
 1.1.2.4 08-Mar-2006  yamt add a statistic.
 1.1.2.3 08-Mar-2006  yamt remove unnecessary ";".
 1.1.2.2 07-Mar-2006  yamt simplify #ifdef a little.
 1.1.2.1 06-Mar-2006  yamt an experimental implementation of CLOCK-Pro.
 1.2.2.2 10-Dec-2006  yamt sync with head.
 1.2.2.1 22-Oct-2006  yamt sync with head
 1.6.6.1 27-Feb-2007  yamt - sync with head.
- move sched_changepri back to kern_synch.c as it doesn't know PPQ anymore.
 1.6.4.7 24-Mar-2008  yamt sync with head.
 1.6.4.6 11-Feb-2008  yamt sync with head.
 1.6.4.5 21-Jan-2008  yamt sync with head
 1.6.4.4 03-Sep-2007  yamt sync with head.
 1.6.4.3 26-Feb-2007  yamt sync with head.
 1.6.4.2 30-Dec-2006  yamt sync with head.
 1.6.4.1 28-Nov-2006  yamt file uvm_pdpolicy_clockpro.c was added on branch yamt-lazymbuf on 2006-12-30 20:51:05 +0000
 1.8.12.1 15-Aug-2007  skrll Sync with HEAD.
 1.8.4.2 20-Aug-2007  ad Sync with HEAD.
 1.8.4.1 13-Mar-2007  ad Pull in the initial set of changes for the vmlocking branch.
 1.9.20.2 01-Aug-2007  yamt use separate nreslookup evcnt for obj and anon pages.
 1.9.20.1 01-Aug-2007  yamt file uvm_pdpolicy_clockpro.c was added on branch matt-mips64 on 2007-08-01 14:49:56 +0000
 1.9.16.2 19-Jan-2008  bouyer Sync with HEAD
 1.9.16.1 02-Jan-2008  bouyer Sync with HEAD
 1.9.12.1 04-Dec-2007  ad Pull the vmlocking changes into a new branch.
 1.9.10.1 18-Feb-2008  mjf Sync with HEAD.
 1.9.4.2 23-Mar-2008  matt sync with HEAD
 1.9.4.1 09-Jan-2008  matt sync with HEAD
 1.13.6.2 05-Jun-2008  mjf Sync with HEAD.

Also fix build.
 1.13.6.1 03-Apr-2008  mjf Sync with HEAD.
 1.14.6.1 23-Jun-2008  wrstuden Sync w/ -current. 34 merge conflicts to follow.
 1.14.4.1 04-May-2009  yamt sync with head.
 1.14.2.1 17-Jun-2008  yamt sync with head.
 1.15.28.1 08-Feb-2011  bouyer Sync with HEAD
 1.15.26.1 06-Jun-2011  jruoho Sync with HEAD.
 1.15.20.1 05-Mar-2011  rmind sync with head
 1.15.16.1 09-Feb-2012  matt Major changes to uvm.
Support multiple collections (groups) of free pages and run the page
reclamation algorithm on each group independently.
 1.16.2.1 23-Jun-2011  cherry Catchup with rmind-uvmplock merge.
 1.17.54.2 13-Apr-2020  martin Mostly merge changes from HEAD up to 20200411
 1.17.54.1 08-Apr-2020  martin Merge changes from current as of 20200406
 1.22.2.1 29-Feb-2020  ad Sync with head.
 1.24.2.1 20-Apr-2020  bouyer Sync with HEAD
 1.2 15-Sep-2006  yamt branches: 1.2.6;
merge yamt-pdpolicy branch.
- separate page replacement policy from the rest of kernel
- implement an alternative replacement policy
 1.1 05-Mar-2006  yamt branches: 1.1.2; 1.1.6;
file uvm_pdpolicy_impl.h was initially added on branch yamt-pdpolicy.
 1.1.6.1 18-Nov-2006  ad Sync with head.
 1.1.2.2 10-Mar-2006  yamt a comment.
 1.1.2.1 05-Mar-2006  yamt separate page replacement policy from the rest of kernel.
 1.2.6.2 30-Dec-2006  yamt sync with head.
 1.2.6.1 15-Sep-2006  yamt file uvm_pdpolicy_impl.h was added on branch yamt-lazymbuf on 2006-12-30 20:51:06 +0000
 1.6 18-Oct-2020  chs In the current code, CPU_COUNT_FREEPAGES counts pages in the global
freelists AND the per-CPU pgflcache free page caches, and that is the
number of pages that the pagedaemon considers to be available.
However, most pages in the pgflcache per-CPU free page caches are NOT
actually available for any particular allocation, and thus allocating
a page can fail even though the pagedaemon thinks enough pages are
available. This change makes CPU_COUNT_FREEPAGES only count pages in
the global freelists and not pages in the pgflcache per-CPU free page
caches, thus better aligning the pagedaemon's view of how many pages
are available with the number of pages that can actually be allocated
by any particular request. This fixes a hang that Christos was hitting.
 1.5 14-Jun-2020  ad Remove PG_ZERO. It worked brilliantly on x86 machines from the mid-90s but
having spent an age experimenting with it over the last 6 months on various
machines and with different use cases it's always either break-even or a
slight net loss for me.
 1.4 30-Dec-2019  ad branches: 1.4.6;
Freelist cache: drain using a high-priority xcall and re-enable now that
the pagedaemon starvation problem should be fixed.
 1.3 29-Dec-2019  ad It looks like the freelist cache can starve the pagedaemon under certain
conditions, so temporarily disable it. Will revisit soon.
 1.2 27-Dec-2019  ad Fix a comment.
 1.1 27-Dec-2019  ad Redo the page allocator to perform better, especially on multi-core and
multi-socket systems. Proposed on tech-kern. While here:

- add rudimentary NUMA support - needs more work.
- remove now unused "listq" from vm_page.
 1.4.6.2 08-Apr-2020  martin Merge changes from current as of 20200406
 1.4.6.1 30-Dec-2019  martin file uvm_pgflcache.c was added on branch phil-wifi on 2020-04-08 14:09:04 +0000
 1.1 27-Dec-2019  ad branches: 1.1.6;
Redo the page allocator to perform better, especially on multi-core and
multi-socket systems. Proposed on tech-kern. While here:

- add rudimentary NUMA support - needs more work.
- remove now unused "listq" from vm_page.
 1.1.6.2 08-Apr-2020  martin Merge changes from current as of 20200406
 1.1.6.1 27-Dec-2019  martin file uvm_pgflcache.h was added on branch phil-wifi on 2020-04-08 14:09:04 +0000
 1.92 14-Jan-2024  tnn fix DEBUG build
 1.91 13-Jan-2024  tnn uvm: change type of uvm_physseg.start_hint from u_int to u_long

Avoids assertion failure in uvm_pglistalloc_s_ps() with large paddrs.
PR kern/57683.
 1.90 21-Dec-2021  skrll branches: 1.90.4;
Update uvm_pglistalloc_[cs]_ps to return EINVAL if [low, high] doesn't
match any memory.

Useful for bus_dmamem_alloc where a tag might not cover any memory.
This will be used in an update to ehci.

"looks good" from chuq@
 1.89 20-Dec-2021  skrll Slight code re-structure and wrap a long line. Interestingly this gives
the same binary before and after.
 1.88 26-Mar-2021  chs in uvm_pglistalloc_contig_aggressive(), avoid looking forward past
the end of the target range of the physseg.
fixes PR 56074.
 1.87 24-Mar-2021  skrll Trailing whitespace
 1.86 07-Oct-2020  chs branches: 1.86.2; 1.86.4;
Add a new, more aggressive allocator for uvm_pglistalloc() to allocate
contiguous physical pages, and try this new allocator if the existing
one fails. The existing contig allocator only tries to allocate pages
that are already free, which works fine shortly after boot but rarely
works after the system has been up for a while. The new allocator uses
the pagedaemon to evict pages from memory in the hope that this will
free up a range of pages that satisfies the constraints of the request.
This should help with things like plugging in a USB device, which often
fails for some USB controllers because they can't get contiguous memory.
 1.85 14-Jun-2020  ad Remove PG_ZERO. It worked brilliantly on x86 machines from the mid-90s but
having spent an age experimenting with it over the last 6 months on various
machines and with different use cases it's always either break-even or a
slight net loss for me.
 1.84 11-Jun-2020  ad Counter tweaks:

- Don't need to count anonpages+filepages any more; clean+unknown+dirty for
each kind of page can be summed to get the totals.

- Track the number of free pages with a counter so that it's one less thing
for the allocator to do, which opens up further options there.

- Remove cpu_count_sync_one(). It has no users and doesn't save a whole lot.
For the cheap option, give cpu_count_sync() a boolean parameter indicating
that a cached value is okay, and rate limit the updates for cached values
to hz.
 1.83 11-Jun-2020  ad uvm_availmem(): give it a boolean argument to specify whether a recent
cached value will do, or if the very latest total must be fetched. It can
be called thousands of times a second and fetching the totals impacts not
only the calling LWP but other CPUs doing unrelated activity in the VM
system.
 1.82 23-May-2020  ad uvm_pglistfree(): just use uvm_pagefree().
 1.81 01-Mar-2020  ad uvm_pglistalloc() / uvm_pglistfree() mustn't be called from interrupt
context. Assert it.
 1.80 20-Feb-2020  rin Make this compile again with PGALLOC_VERBOSE.
 1.79 31-Dec-2019  ad branches: 1.79.2;
Rename uvm_free() -> uvm_availmem().
 1.78 27-Dec-2019  ad Redo the page allocator to perform better, especially on multi-core and
multi-socket systems. Proposed on tech-kern. While here:

- add rudimentary NUMA support - needs more work.
- remove now unused "listq" from vm_page.
 1.77 21-Dec-2019  ad Detangle the pagedaemon from uvm_fpageqlock:

- Have a single lock (uvmpd_lock) to protect pagedaemon state that was
previously covered by uvmpd_pool_drain_lock plus uvm_fpageqlock.
- Don't require any locks be held when calling uvm_kick_pdaemon().
- Use uvm_free().
 1.76 21-Dec-2019  ad - Rename VM_PGCOLOR_BUCKET() to VM_PGCOLOR(). I want to reuse "bucket" for
something else soon and TBH it matches what this macro does better.

- Add inlines to set/get locator values in the unused lower bits of
pg->phys_addr. Begin by using it to cache the freelist index, because
computing it is expensive and that shows up during profiling. Discussed
on tech-kern.
 1.75 21-Dec-2019  ad uvmexp.free -> uvm_free()
 1.74 16-Dec-2019  ad - Extend the per-CPU counters matt@ did to include all of the hot counters
in UVM, excluding uvmexp.free, which needs special treatment and will be
done with a separate commit. Cuts system time for a build by 20-25% on
a 48 CPU machine w/DIAGNOSTIC.

- Avoid 64-bit integer divide on every fault (for rnd_add_uint32).
 1.73 13-Dec-2019  ad Break the global uvm_pageqlock into a per-page identity lock and a private
lock for use of the pagedaemon policy code. Discussed on tech-kern.

PR kern/54209: NetBSD 8 large memory performance extremely low
PR kern/54210: NetBSD-8 processes presumably not exiting
PR kern/54727: writing a large file causes unreasonable system behaviour
 1.72 13-Nov-2018  mrg branches: 1.72.4;
only warn once per call to uvm_pglistalloc_simple() if waiting.
 1.71 03-Sep-2018  riastradh Rename min/max -> uimin/uimax for better honesty.

These functions are defined on unsigned int. The generic name
min/max should not silently truncate to 32 bits on 64-bit systems.
This is purely a name change -- no functional change intended.

HOWEVER! Some subsystems have

#define min(a, b) ((a) < (b) ? (a) : (b))
#define max(a, b) ((a) > (b) ? (a) : (b))

even though our standard name for that is MIN/MAX. Although these
may invite multiple evaluation bugs, these do _not_ cause integer
truncation.

To avoid `fixing' these cases, I first changed the name in libkern,
and then compile-tested every file where min/max occurred in order to
confirm that it failed -- and thus confirm that nothing shadowed
min/max -- before changing it.

I have left a handful of bootloaders that are too annoying to
compile-test, and some dead code:

cobalt ews4800mips hp300 hppa ia64 luna68k vax
acorn32/if_ie.c (not included in any kernels)
macppc/if_gm.c (superseded by gem(4))

It should be easy to fix the fallout once identified -- this way of
doing things fails safe, and the goal here, after all, is to _avoid_
silent integer truncations, not introduce them.

Maybe one day we can reintroduce min/max as type-generic things that
never silently truncate. But we should avoid doing that for a while,
so that existing code has a chance to be detected by the compiler for
conversion to uimin/uimax without changing the semantics until we can
properly audit it all. (Who knows, maybe in some cases integer
truncation is actually intended!)
 1.70 23-Dec-2016  skrll branches: 1.70.14; 1.70.16;
PRIxPHYSMEM -> PRIxPHYSSEG to fix the build
 1.69 23-Dec-2016  skrll Whitespace
 1.68 23-Dec-2016  cherry "Make NetBSD great again!"

Introduce uvm_hotplug(9) to the kernel.

Many thanks, in no particular order to:

TNF, for funding the project.

Chuck Silvers - for multiple API reviews and feedback.
Nick Hudson - for testing on multiple architectures and bugfix patches.
Everyone who helped with boot testing.

KeK (http://www.kek.org.in) for hosting the primary developers.
 1.67 26-Oct-2014  christos branches: 1.67.2; 1.67.4;
Define UVMDEBUG for expensive debugging operations. Idea from chuq.
 1.66 05-Sep-2014  matt Don't use C++ try keyword as a variable name.
 1.65 19-May-2014  riastradh Back out previous silliness -- on failure no pages are allocated.
 1.64 19-May-2014  riastradh Don't leak memory on failure in uvm_pglistalloc_contig.

Free pages like uvm_pglistalloc_simple does.

Discovered by code inspection.
 1.63 15-Sep-2013  martin branches: 1.63.2;
Mark potentially unused variables
 1.62 27-Sep-2011  jym branches: 1.62.2; 1.62.12; 1.62.16;
Modify *ASSERTMSG() so they are now used as variadic macros. The main goal
is to provide routines that do as KASSERT(9) says: append a message
to the panic format string when the assertion triggers, with optional
arguments.

Fix call sites to reflect the new definition.

Discussed on tech-kern@. See
http://mail-index.netbsd.org/tech-kern/2011/09/07/msg011427.html
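
A rough userland sketch of the idea, not the kernel's definition: a variadic assertion macro that appends a formatted message to the failure text. EX_ASSERTMSG, ex_assert_fail() and ex_panicbuf are hypothetical names, and the failure is written to a buffer instead of calling panic().

```c
#include <stdarg.h>
#include <stddef.h>
#include <stdio.h>

/* Receives the text a real implementation would pass to panic(). */
static char ex_panicbuf[128];

/* Format "assertion failed" followed by the caller's message. */
static void
ex_assert_fail(const char *expr, const char *fmt, ...)
{
	va_list ap;
	int n;

	n = snprintf(ex_panicbuf, sizeof(ex_panicbuf),
	    "assertion \"%s\" failed: ", expr);
	if (n < 0 || (size_t)n >= sizeof(ex_panicbuf))
		return;
	va_start(ap, fmt);
	vsnprintf(ex_panicbuf + n, sizeof(ex_panicbuf) - (size_t)n, fmt, ap);
	va_end(ap);
}

/* The format string and its optional arguments travel in __VA_ARGS__. */
#define EX_ASSERTMSG(e, ...) \
	((e) ? (void)0 : ex_assert_fail(#e, __VA_ARGS__))
```

A call site then reads EX_ASSERTMSG(start < end, "start=%d end=%d", start, end), with no extra parentheses around the message arguments.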
 1.61 23-Apr-2011  rmind Replace "malloc" in comments, remove unnecessary header inclusions.
 1.60 26-Jan-2011  enami Introducing an inner loop prevents us from exiting the original loop.
 1.59 25-Jan-2011  matt When starting the second pass, don't continue the for loop but instead
just test try exceeding limit.
 1.58 24-Jan-2011  matt Use the (new) KDASSERTMSG
 1.57 24-Jan-2011  matt Fix start_hint in "simple" alloc (fencepost error).
When restarting the loop, make sure end is not above current limit.
Do a quick test to see if the physseg is within the range of desired addresses.
 1.56 23-Jan-2011  he DEBUG does not imply DIAGNOSTIC; make sure we have a non-null
KASSERTMSG implementation (DIAGNOSTIC) so that the variable inside
the DEBUG section gets used.
 1.55 22-Jan-2011  matt Fix the corruption of ps->start_hint.
 1.54 21-Jan-2011  matt Cleanup/add some asserts. no functional change.
 1.53 21-Jan-2011  cegger buildfix: use PRIxPADDR for type paddr_t
 1.52 18-Jan-2011  matt branches: 1.52.2;
Improve the efficiency of searching for a contiguous set of free pages.
 1.51 25-Nov-2010  uebayasi branches: 1.51.2;
Revert vm_physseg allocation changes. A report says that it causes
panics when used with mplayer in heavy load.
 1.50 18-Nov-2010  cegger build fix: vm_physmem_index is only used with DEBUG.
Fix build when DIAGNOSTIC is enabled but not DEBUG
 1.49 18-Nov-2010  uebayasi Optimize DIAGNOSTIC check code.
 1.48 18-Nov-2010  uebayasi Fix DIAGNOSTIC physseg find check.
 1.47 14-Nov-2010  uebayasi Be a little more friendly to dynamic physical segment registration.

Maintain an array of pointers to struct vm_physseg, instead of an array of
structs, so that the VM subsystem can take a pointer to a segment safely. A
pointer to this struct will replace raw paddr_t usage in the future.

Dynamic removal is not supported yet.

Only MD data structure changes, no kernel bump needed.

Tested on i386, amd64, powerpc/ibm40x, arm11.
 1.46 17-Jun-2010  mrg disable some DEBUG code uvm_pglist_add() that has severe performance
problems with large mappings. i've seen my system hang for a total
of 45 seconds when radeondrm is opened by X11, and it is the checks
in this function that take so long.
 1.45 10-Mar-2009  nonaka branches: 1.45.2; 1.45.4;
remove "#define PGALLOC_VERBOSE".
 1.44 09-Mar-2009  reinoud For this physical address printing use uintmax_t since on Xen PAE this length
(64) is not the same as the base architecture (32).
 1.43 09-Mar-2009  nonaka fix compile failure when PGALLOC_VERBOSE is defined.
 1.42 04-Jun-2008  ad branches: 1.42.6; 1.42.12; 1.42.16;
- vm_page: put listq, pageq into a union alongside a LIST_ENTRY, so we can
use both types of list.

- Make page coloring and idle zero state per-CPU.

- Maintain per-CPU page freelists. When freeing, put pages onto the local
CPU's lists and the global lists. When allocating, prefer to take pages
from the local CPU. If none are available take from the global list as
done now. Proposed on tech-kern@.
 1.41 02-Jun-2008  ad UVM_PAGEZERO_TARGET -> UVM_PAGEZERO_LOWAT
 1.40 28-Apr-2008  martin branches: 1.40.2;
Remove clause 3 and 4 from TNF licenses
 1.39 27-Feb-2008  ad branches: 1.39.2; 1.39.4;
Assert uvm_fpageqlock is held in a few more places.
 1.38 21-Jul-2007  ad branches: 1.38.6; 1.38.22; 1.38.26; 1.38.28;
Merge unobtrusive locking changes from the vmlocking branch.
 1.37 21-Feb-2007  thorpej branches: 1.37.4; 1.37.12;
Replace the Mach-derived boolean_t type with the C99 bool type. A
future commit will replace use of TRUE and FALSE with true and false.
 1.36 15-Sep-2006  yamt branches: 1.36.6;
merge yamt-pdpolicy branch.
- separate page replacement policy from the rest of kernel
- implement an alternative replacement policy
 1.35 14-May-2006  christos branches: 1.35.8;
XXX: GCC uninitialized.
 1.34 11-Dec-2005  christos branches: 1.34.4; 1.34.6; 1.34.8; 1.34.12;
merge ktrace-lwp.
 1.33 27-Jun-2005  thorpej branches: 1.33.2;
Use ANSI function decls.
 1.32 17-Sep-2004  yamt make free page queue filo rather than fifo.
data in pages freed more recently is more likely to be in the cpu cache.
 1.31 24-Mar-2004  junyoung Drop trailing spaces.
 1.30 03-Nov-2003  yamt add a DEBUG check if freed PG_ZERO pages are really zero-filled.
 1.29 01-Nov-2003  yamt in uvm_pagefree and friends, if freed pages have been marked by
PG_ZERO flag, put them to PGFL_ZEROS queue rather than default one
so that we can re-use zero-filled pages efficiently.
 1.28 26-Aug-2003  yamt use VM_PAGE_TO_PHYS macro instead of using phys_addr directly.
 1.27 02-Aug-2003  drochner sync comments with reality
 1.26 10-Mar-2003  thorpej branches: 1.26.2;
Make PGALLOC_VERBOSE compile where size_t != int.
 1.25 02-Nov-2002  perry /*CONTCOND*/ while (0)'ed macros
 1.24 27-Jun-2002  drochner Big cleanup and speed improvements to pglist_alloc code:
-pass vm_physseg* instead of physseg index, and PFN (int) instead
of physical address (could be done even more)
-simplify detection of boundary crossing and behave more intelligently
in this case
-take stuff out of the inner loops, or put into "#ifdef DEBUG"
(because we move along physsegs we don't need to check that the
pages are physically contiguous)
-make the "simple" and "contiguous" branches look more uniform; at
least the outer loops might coalesce one day
 1.23 20-Jun-2002  enami Shift by PAGE_SHIFT instead of dividing by PAGE_SIZE.
 1.22 18-Jun-2002  drochner Make the DMA memory allocators (uvm_pglistalloc())
obey the preferences expressed by freelist assignment,
to avoid wasting valuable "low memory" to devices which
don't really need it.
comments:
-I'm not sure searching the physsegs within a freelist
beginning with the biggest is the right thing. This is
what the "memory steal" code in uvm_page.c does, so
keep it consistent.
-There seems to be some confusion whether the upper
address limit passed is inclusive or not. Stays on
the safe side, possibly leaving one page out.
-The boundary/pagemask check can be simplified, also some
arguments passed are only used for diagnostic checks.
-Integration with UVM_PAGE_TRKOWN???
 1.21 02-Jun-2002  drochner move initialization of the "struct pglist" returned by uvm_pglistalloc()
from the calling code into uvm_pglistalloc() itself for consistency
and easier error handling
 1.20 29-May-2002  drochner Add another allocator to uvm_pglistalloc() which is used in the case where
no alignment / boundary / nsegs restrictions apply.
This one doesn't insist on a contiguous range, and it honours the "waitok"
flag, thus succeeds in situations which were hopeless with the existing one.

(A solution which searches for a minimum number of contiguous ranges using
some best-fit or so algorithm would be expensive to implement; I believe the
"either-or" done here does reflect the current use by bus_dma quite well.)

Now agp memory allocation is robust for me. (tested on i810)
 1.19 10-Nov-2001  lukem branches: 1.19.4; 1.19.8;
add RCSIDs, and in some cases, slightly cleanup #include order
 1.18 15-Sep-2001  chs branches: 1.18.2;
a whole bunch of changes to improve performance and robustness under load:

- remove special treatment of pager_map mappings in pmaps. this is
required now, since I've removed the globals that expose the address range.
pager_map now uses pmap_kenter_pa() instead of pmap_enter(), so there's
no longer any need to special-case it.
- eliminate struct uvm_vnode by moving its fields into struct vnode.
- rewrite the pageout path. the pager is now responsible for handling the
high-level requests instead of only getting control after a bunch of work
has already been done on its behalf. this will allow us to UBCify LFS,
which needs tighter control over its pages than other filesystems do.
writing a page to disk no longer requires making it read-only, which
allows us to write wired pages without causing all kinds of havoc.
- use a new PG_PAGEOUT flag to indicate that a page should be freed
on behalf of the pagedaemon when it's unlocked. this flag is very similar
to PG_RELEASED, but unlike PG_RELEASED, PG_PAGEOUT can be cleared if the
pageout fails due to eg. an indirect-block buffer being locked.
this allows us to remove the "version" field from struct vm_page,
and together with shrinking "loan_count" from 32 bits to 16,
struct vm_page is now 4 bytes smaller.
- no longer use PG_RELEASED for swap-backed pages. if the page is busy
because it's being paged out, we can't release the swap slot to be
reallocated until that write is complete, but unlike with vnodes we
don't keep a count of in-progress writes so there's no good way to
know when the write is done. instead, when we need to free a busy
swap-backed page, just sleep until we can get it busy ourselves.
- implement a fast-path for extending writes which allows us to avoid
zeroing new pages. this substantially reduces cpu usage.
- encapsulate the data used by the genfs code in a struct genfs_node,
which must be the first element of the filesystem-specific vnode data
for filesystems which use genfs_{get,put}pages().
- eliminate many of the UVM pagerops, since they aren't needed anymore
now that the pager "put" operation is a higher-level operation.
- enhance the genfs code to allow NFS to use the genfs_{get,put}pages
instead of a modified copy.
- clean up struct vnode by removing all the fields that used to be used by
the vfs_cluster.c code (which we don't use anymore with UBC).
- remove kmem_object and mb_object since they were useless.
instead of allocating pages to these objects, we now just allocate
pages with no object. such pages are mapped in the kernel until they
are freed, so we can use the mapping to find the page to free it.
this allows us to remove splvm() protection in several places.

The sum of all these changes improves write throughput on my
decstation 5000/200 to within 1% of the rate of NetBSD 1.5
and reduces the elapsed time for "make release" of a NetBSD 1.5
source tree on my 128MB pc to 10% less than a 1.5 kernel took.
 1.17 27-Jun-2001  thorpej branches: 1.17.2; 1.17.4;
Macro'ize the code that checks the free and inactive thresholds and
wakes the pagedaemon.
 1.16 26-May-2001  chs replace vm_page_t with struct vm_page *.
 1.15 25-May-2001  chs remove trailing whitespace.
 1.14 29-Apr-2001  thorpej Implement page coloring, using a round-robin bucket selection
algorithm (Solaris calls this "Bin Hopping").

This implementation currently relies on MD code to define a
constant defining the number of buckets. This will change
reasonably soon (MD code will be able to dynamically size
the bucket array).
 1.13 18-Feb-2001  chs branches: 1.13.2;
clean up DIAGNOSTIC checks, use KASSERT().
 1.12 25-Nov-2000  chs lots of cleanup:
use queue.h macros and KASSERT().
address amap offsets in pages instead of bytes.
make amap_ref() and amap_unref() take an amap, offset and length
instead of a vm_map_entry_t.
improve whitespace and comments.
 1.11 27-Jun-2000  mrg remove include of <vm/vm.h>
 1.10 20-May-2000  thorpej Clean up a comment.
 1.9 24-Apr-2000  thorpej Changes necessary to implement pre-zero'ing of pages in the idle loop:
- Make page free lists have two actual queues: known-zero pages and
pages with unknown contents.
- Implement uvm_pageidlezero(). This function attempts to zero up to
the target number of pages until the target has been reached (currently
target is `all free pages') or until whichqs becomes non-zero (indicating
that a process is ready to run).
- Define a new hook for the pmap module for pre-zero'ing pages. This is
used to zero the pages using uncached access. This allows us to zero
as many pages as we want without polluting the cache.

In order to use this feature, each platform must add the appropriate
glue in their idle loop.
 1.8 22-Jul-1999  thorpej branches: 1.8.2;
Garbage collect thread_sleep()/thread_wakeup() left over from the old
Mach VM code. Also nuke iprintf(), which was no longer used anywhere.

Add proclist locking where appropriate.
 1.7 24-May-1999  thorpej - Change uvm_{lock,unlock}_fpageq() to return/take the previous interrupt
level directly, instead of making the caller wrap the calls in
splimp()/splx().
- Add a comment documenting that interrupts that cause memory allocation
must be blocked while the free page queue is locked.

Since interrupts must be blocked while this lock is asserted, tying them
together like this helps to prevent mistakes.
 1.6 13-Aug-1998  eeh branches: 1.6.2; 1.6.8;
Merge paddr_t changes into the main branch.
 1.5 08-Jul-1998  thorpej branches: 1.5.2;
Add support for multiple memory free lists. There is at least one
default free list, and 0 - N additional free list, in order of descending
priority.

A new page allocation function, uvm_pagealloc_strat(), has been added,
providing three page allocation strategies:

- normal: high -> low priority free list walk, taking the
page off the first free list that has one.

- only: attempt to allocate a page only from the specified free
list, failing if that free list has none available.

- fallback: if `only' fails, fall back on `normal'.

uvm_pagealloc(...) is provided for normal use (and is a synonym for
uvm_pagealloc_strat(..., UVM_PGA_STRAT_NORMAL, 0); the free list argument
is ignored for the `normal' case).

uvm_page_physload() now specifies which free list the pages will be
loaded onto. This means that some platforms which have multiple physical
memory segments may define additional vm_physsegs if they wish to break
individual physical segments into differing priorities.

Machine-dependent code must define _at least_ the following constants
in <machine/vmparam.h>:

VM_NFREELIST: the number of free lists the system will have

VM_FREELIST_DEFAULT: the default freelist (should always be 0,
but is defined in machdep code so that it's with all of the
other free list-related constants).

Additional free list names may be defined by machine-dependent code, but
they will only be used by machine-dependent code (e.g. for loading the
vm_physsegs).
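
The three strategies can be modelled with a toy allocator. This is a sketch under stated assumptions, not the UVM code: each free list is reduced to a page counter, a lower index is assumed to mean higher priority, and every ex_/EX_ name is hypothetical.

```c
#define EX_NFREELIST 3

enum ex_strat { EX_STRAT_NORMAL, EX_STRAT_ONLY, EX_STRAT_FALLBACK };

/* normal: walk from high to low priority, take from the first non-empty list. */
static int
ex_take_normal(int nfree[EX_NFREELIST])
{
	for (int i = 0; i < EX_NFREELIST; i++) {
		if (nfree[i] > 0) {
			nfree[i]--;
			return i;
		}
	}
	return -1;
}

/*
 * Returns the index of the free list the page came from, or -1 on failure.
 * free_list is ignored for the normal strategy.
 */
static int
ex_pagealloc_strat(int nfree[EX_NFREELIST], enum ex_strat strat, int free_list)
{
	if (strat != EX_STRAT_NORMAL) {
		/* only / fallback: try the requested list first. */
		if (nfree[free_list] > 0) {
			nfree[free_list]--;
			return free_list;
		}
		if (strat == EX_STRAT_ONLY)
			return -1;
	}
	/* normal, or fallback after `only' failed. */
	return ex_take_normal(nfree);
}
```

With counters {0, 2, 1}, a normal request is served from list 1, an `only' request for list 0 fails, and a `fallback' request for list 0 is also served from list 1.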
 1.4 05-May-1998  kleink Remove inclusions of syscall (and syscall argument) related header files;
we don't need them here.
 1.3 09-Mar-1998  mrg KNF.
 1.2 06-Feb-1998  thorpej RCS ID police.
 1.1 05-Feb-1998  mrg branches: 1.1.1;
Initial revision
 1.1.1.1 05-Feb-1998  mrg initial import of the new virtual memory system, UVM, into -current.

UVM was written by chuck cranor <chuck@maria.wustl.edu>, with some
minor portions derived from the old Mach code. i provided some help
getting swap and paging working, and other bug fixes/ideas. chuck
silvers <chuq@chuq.com> also provided some other fixes.

this is the UVM kernel code portion.


this will be KNF'd shortly. :-)
 1.5.2.1 30-Jul-1998  eeh Split vm_offset_t and vm_size_t into paddr_t, psize_t, vaddr_t, and vsize_t.
 1.6.8.2 21-Jun-1999  thorpej Sync w/ -current.
 1.6.8.1 07-Jun-1999  chs merge everything from chs-ubc branch.
 1.6.2.1 25-Feb-1999  chs in uvm_pglistalloc(), treat pages being paged out as "free"
when deciding whether to wakeup the pagedaemon.
also, don't unlock the free page queue until we've done the wakeup.
 1.8.2.3 12-Mar-2001  bouyer Sync with HEAD.
 1.8.2.2 08-Dec-2000  bouyer Sync with HEAD.
 1.8.2.1 20-Nov-2000  bouyer Update thorpej_scsipi to -current as of a month ago
 1.13.2.7 11-Nov-2002  nathanw Catch up to -current
 1.13.2.6 01-Aug-2002  nathanw Catch up to -current.
 1.13.2.5 20-Jun-2002  nathanw Catch up to -current.
 1.13.2.4 14-Nov-2001  nathanw Catch up to -current.
 1.13.2.3 21-Sep-2001  nathanw Catch up to -current.
 1.13.2.2 24-Aug-2001  nathanw Catch up with -current.
 1.13.2.1 21-Jun-2001  nathanw Catch up to -current.
 1.17.4.1 01-Oct-2001  fvdl Catch up with -current.
 1.17.2.3 06-Sep-2002  jdolecek sync kqueue branch with HEAD
 1.17.2.2 23-Jun-2002  jdolecek catch up with -current on kqueue branch
 1.17.2.1 10-Jan-2002  thorpej Sync kqueue branch with -current.
 1.18.2.1 12-Nov-2001  thorpej Sync the thorpej-mips-cache branch with -current.
 1.19.8.3 15-Jul-2002  gehenna catch up with -current.
 1.19.8.2 20-Jun-2002  gehenna catch up with -current.
 1.19.8.1 30-May-2002  gehenna Catch up with -current.
 1.19.4.1 12-Mar-2002  thorpej Convert the fpageqlock to a spin mutex at IPL_VM and rename it
to fpageq_mutex.
 1.26.2.4 10-Nov-2005  skrll Sync with HEAD. Here we go again...
 1.26.2.3 21-Sep-2004  skrll Fix the sync with head I botched.
 1.26.2.2 18-Sep-2004  skrll Sync with HEAD.
 1.26.2.1 03-Aug-2004  skrll Sync with HEAD
 1.33.2.5 17-Mar-2008  yamt sync with head.
 1.33.2.4 03-Sep-2007  yamt sync with head.
 1.33.2.3 26-Feb-2007  yamt sync with head.
 1.33.2.2 30-Dec-2006  yamt sync with head.
 1.33.2.1 21-Jun-2006  yamt sync with head.
 1.34.12.1 24-May-2006  tron Merge 2006-05-24 NetBSD-current into the "peter-altq" branch.
 1.34.8.3 15-Sep-2006  yamt make UVM_KICK_PDAEMON() a real function and stop including
uvm_pdpolicy.h from uvm.h. this also fixes build of pmap(1).
 1.34.8.2 24-May-2006  yamt sync with head.
 1.34.8.1 05-Mar-2006  yamt separate page replacement policy from the rest of kernel.
 1.34.6.1 01-Jun-2006  kardel Sync with head.
 1.34.4.1 09-Sep-2006  rpaulo sync with head
 1.35.8.1 18-Nov-2006  ad Sync with head.
 1.36.6.1 27-Feb-2007  yamt - sync with head.
- move sched_changepri back to kern_synch.c as it doesn't know PPQ anymore.
 1.37.12.1 15-Aug-2007  skrll Sync with HEAD.
 1.37.4.2 20-Aug-2007  ad Sync with HEAD.
 1.37.4.1 13-Mar-2007  ad Pull in the initial set of changes for the vmlocking branch.
 1.38.28.2 21-Jul-2007  ad Merge unobtrusive locking changes from the vmlocking branch.
 1.38.28.1 21-Jul-2007  ad file uvm_pglist.c was added on branch matt-mips64 on 2007-07-21 19:21:56 +0000
 1.38.26.3 05-Jun-2008  mjf Sync with HEAD.

Also fix build.
 1.38.26.2 02-Jun-2008  mjf Sync with HEAD.
 1.38.26.1 03-Apr-2008  mjf Sync with HEAD.
 1.38.22.1 24-Mar-2008  keiichi sync with head.
 1.38.6.1 23-Mar-2008  matt sync with HEAD
 1.39.4.3 11-Aug-2010  yamt sync with head.
 1.39.4.2 04-May-2009  yamt sync with head.
 1.39.4.1 16-May-2008  yamt sync with head.
 1.39.2.3 17-Jun-2008  yamt sync with head.
 1.39.2.2 04-Jun-2008  yamt sync with head
 1.39.2.1 18-May-2008  yamt sync with head.
 1.40.2.1 23-Jun-2008  wrstuden Sync w/ -current. 34 merge conflicts to follow.
 1.42.16.14 15-Feb-2014  matt Adapt to K{,D}ASSERTMSG changes
 1.42.16.13 29-Feb-2012  matt Improve UVM_PAGE_TRKOWN.
Add more asserts to uvm_page.
 1.42.16.12 14-Feb-2012  matt Add more KASSERTs (more! more! more!).
When returning page to the free pool, make sure to dequeue the pages before
hand or free page queue corruption will happen.
 1.42.16.11 09-Feb-2012  matt Major changes to uvm.
Support multiple collections (groups) of free pages and run the page
reclamation algorithm on each group independently.
 1.42.16.10 03-Jun-2011  matt Restore $NetBSD$
 1.42.16.9 03-Jun-2011  matt Rework page free lists to be sorted by color first rather than free_list.
Kept per color PGFL_* counter in each page free list.
Minor cleanups.
 1.42.16.8 27-May-2011  matt Fix a bug where limit could be greater than avail_end. Now if that happens, we
just bail. Use KDASSERTMSG so panics are more informative.
 1.42.16.7 25-May-2011  matt Make uvm_map recognize UVM_FLAG_COLORMATCH which tells uvm_map that the
'align' argument specifies the starting color of the KVA range to be returned.

When calling uvm_km_alloc with UVM_KMF_VAONLY, also specify the starting
color of the kva range returned (UVM_KMF_COLORMATCH) and pass those to
uvm_map.

In uvm_pglistalloc, make sure the pages being returned have sequentially
advancing colors (so they can be mapped in a contiguous address range).
Add a few missing UVM_FLAG_COLORMATCH flags to uvm_pagealloc calls.

Make the socket and pipe loan color-safe.

Make the mips pmap enforce strict page color (color(VA) == color(PA)).
 1.42.16.6 01-Jun-2010  matt Fix bad initialization spotted by Manuel Bouyer.
 1.42.16.5 23-Jan-2010  matt Use roundup2 instead of roundup when doing alignment rounding since all
alignments must be a power of 2. (thanks to rmind for suggesting it).
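
For illustration, the two rounding forms side by side. The macros below are local copies written the way the classic BSD roundup and roundup2 macros are usually defined; the EX_ prefix and the align_div()/align_mask() wrappers are hypothetical.

```c
#include <stdint.h>

/* Division form: correct for any non-zero y. */
#define EX_ROUNDUP(x, y)	((((x) + ((y) - 1)) / (y)) * (y))

/* Mask form: correct only when m is a power of 2. */
#define EX_ROUNDUP2(x, m)	(((x) + (m) - 1) & ~((m) - 1))

/* Hypothetical wrapper around the division form. */
static uint64_t
align_div(uint64_t addr, uint64_t alignment)
{
	return EX_ROUNDUP(addr, alignment);
}

/* Hypothetical wrapper around the mask form. */
static uint64_t
align_mask(uint64_t addr, uint64_t alignment)
{
	return EX_ROUNDUP2(addr, alignment);
}
```

Both wrappers round 0x1001 up to 0x2000 for a 0x1000 alignment. For a non-power-of-2 alignment such as 6 the mask form no longer yields a multiple of the alignment, which is why it is only safe where every alignment is known to be a power of 2.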
 1.42.16.4 23-Jan-2010  matt Add a start_hint to vm_physseg so when allocating pages, we can skip
forward over pages that are probably still allocated.
 1.42.16.3 22-Jan-2010  matt Remove some optimizations since they actually don't do the right thing.
We never want to test the starting page first since it doesn't really give
us any good information that we can use for the next pass.
 1.42.16.2 22-Jan-2010  snj Fix a couple comment typos.
 1.42.16.1 22-Jan-2010  matt Rework the algorithm to allocate contiguous pages to be much much faster.
(read the comments if you want to know how it's done).
 1.42.12.1 13-May-2009  jym Sync with HEAD.

Commit is split, to avoid a "too many arguments" protocol error.
 1.42.6.1 28-Apr-2009  skrll Sync with HEAD.
 1.45.4.3 31-May-2011  rmind sync with head
 1.45.4.2 05-Mar-2011  rmind sync with head
 1.45.4.1 03-Jul-2010  rmind sync with head
 1.45.2.5 21-Nov-2010  uebayasi Sync with HEAD.
 1.45.2.4 12-Nov-2010  uebayasi Fix debug code.
 1.45.2.3 17-Aug-2010  uebayasi Sync with HEAD.
 1.45.2.2 28-Apr-2010  uebayasi Always use struct vm_physseg *vm_physmem_ptrs[] in MD code.
 1.45.2.1 09-Feb-2010  uebayasi vm_nphysseg -> vm_nphysmem
 1.51.2.1 06-Jun-2011  jruoho Sync with HEAD.
 1.52.2.1 08-Feb-2011  bouyer Sync with HEAD
 1.62.16.1 18-May-2014  rmind sync with head
 1.62.12.2 03-Dec-2017  jdolecek update from HEAD
 1.62.12.1 20-Aug-2014  tls Rebase to HEAD as of a few days ago.
 1.62.2.2 22-May-2014  yamt sync with head.

for a reference, the tree before this commit was tagged
as yamt-pagecache-tag8.

this commit was split into small chunks to avoid
a limitation of cvs. ("Protocol error: too many arguments")
 1.62.2.1 06-Nov-2011  yamt remove pg->listq and uobj->memq
 1.63.2.1 10-Aug-2014  tls Rebase.
 1.67.4.1 07-Jan-2017  pgoyette Sync with HEAD. (Note that most of these changes are simply $NetBSD$
tag issues.)
 1.67.2.1 05-Feb-2017  skrll Sync with HEAD
 1.70.16.2 08-Apr-2020  martin Merge changes from current as of 20200406
 1.70.16.1 10-Jun-2019  christos Sync with HEAD
 1.70.14.2 26-Nov-2018  pgoyette Sync with HEAD, resolve a couple of conflicts
 1.70.14.1 06-Sep-2018  pgoyette Sync with HEAD

Resolve a couple of conflicts (result of the uimin/uimax changes)
 1.72.4.1 27-Feb-2020  martin Pull up following revision(s) (requested by rin in ticket #732):

sys/uvm/uvm_pglist.c: revision 1.80

Make this compile again with PGALLOC_VERBOSE.
 1.79.2.1 29-Feb-2020  ad Sync with head.
 1.86.4.1 03-Apr-2021  thorpej Sync with HEAD.
 1.86.2.1 03-Apr-2021  thorpej Sync with HEAD.
 1.90.4.1 15-Jan-2024  martin Pull up following revision(s) (requested by tnn in ticket #554):

sys/uvm/uvm_physseg.c: revision 1.20
sys/uvm/uvm_pglist.c: revision 1.91
sys/uvm/uvm_pglist.c: revision 1.92
sys/uvm/uvm_physseg.h: revision 1.9

uvm: change type of uvm_physseg.start_hint from u_int to u_long
Avoids assertion failure in uvm_pglistalloc_s_ps() with large paddrs.
PR kern/57683.

fix DEBUG build
 1.11 13-Apr-2020  ad Comments
 1.10 28-Dec-2019  martin branches: 1.10.6;
Include <sys/param.h> here directly to have a sane default for
COHERENCY_UNIT.
 1.9 27-Dec-2019  ad Redo the page allocator to perform better, especially on multi-core and
multi-socket systems. Proposed on tech-kern. While here:

- add rudimentary NUMA support - needs more work.
- remove now unused "listq" from vm_page.
 1.8 06-Nov-2010  uebayasi branches: 1.8.60;
Provide a forward declaration of "struct vm_page", whose internals
are opaque to uvm_pglist.h users. Users don't need to pull in
uvm_page.h.
 1.7 04-Jun-2008  ad branches: 1.7.16; 1.7.18; 1.7.20;
- vm_page: put listq, pageq into a union alongside a LIST_ENTRY, so we can
use both types of list.

- Make page coloring and idle zero state per-CPU.

- Maintain per-CPU page freelists. When freeing, put pages onto the local
CPU's lists and the global lists. When allocating, prefer to take pages
from the local CPU. If none are available take from the global list as
done now. Proposed on tech-kern@.
 1.6 28-Apr-2008  martin branches: 1.6.2;
Remove clause 3 and 4 from TNF licenses
 1.5 25-Aug-2001  chs branches: 1.5.118; 1.5.120; 1.5.122;
use the correct symbol for multi-include protection.
 1.4 25-May-2001  chs branches: 1.4.2;
remove trailing whitespace.
 1.3 02-May-2001  thorpej Support dynamic sizing of the page color bins. We also support
dynamically re-coloring pages; as machine-dependent code discovers
the size of the system's caches, it may call uvm_page_recolor() with
the new number of colors to use. If the new number of colors is
smaller (or equal to) the current number of colors, then uvm_page_recolor()
is a no-op.

The system defaults to one bucket if machine-dependent code does not
initialize uvmexp.ncolors before uvm_page_init() is called.

Note that the number of color bins should be initialized to something
reasonable as early as possible -- for many early memory allocations,
we live with the consequences of the page choice for the lifetime of
the boot.
 1.2 29-Apr-2001  thorpej Implement page coloring, using a round-robin bucket selection
algorithm (Solaris calls this "Bin Hopping").

This implementation currently relies on MD code to define a
constant defining the number of buckets. This will change
reasonably soon (MD code will be able to dynamically size
the bucket array).
 1.1 26-Jun-2000  mrg branches: 1.1.2; 1.1.4;
remove/move more mach vm header files:

<vm/pglist.h> -> <uvm/uvm_pglist.h>
<vm/vm_inherit.h> -> <uvm/uvm_inherit.h>
<vm/vm_kern.h> -> into <uvm/uvm_extern.h>
<vm/vm_object.h> -> nothing
<vm/vm_pager.h> -> into <uvm/uvm_pager.h>

also includes a bunch of <vm/vm_page.h> include removals (due to redundancy
with <vm/vm.h>), and a scattering of other similar headers.
 1.1.4.2 21-Sep-2001  nathanw Catch up to -current.
 1.1.4.1 21-Jun-2001  nathanw Catch up to -current.
 1.1.2.2 20-Nov-2000  bouyer Update thorpej_scsipi to -current as of a month ago
 1.1.2.1 26-Jun-2000  bouyer file uvm_pglist.h was added on branch thorpej_scsipi on 2000-11-20 18:12:06 +0000
 1.4.2.1 13-Sep-2001  thorpej Update the kqueue branch to HEAD.
 1.5.122.2 04-May-2009  yamt sync with head.
 1.5.122.1 16-May-2008  yamt sync with head.
 1.5.120.2 17-Jun-2008  yamt sync with head.
 1.5.120.1 18-May-2008  yamt sync with head.
 1.5.118.2 05-Jun-2008  mjf Sync with HEAD.

Also fix build.
 1.5.118.1 02-Jun-2008  mjf Sync with HEAD.
 1.6.2.1 23-Jun-2008  wrstuden Sync w/ -current. 34 merge conflicts to follow.
 1.7.20.1 05-Mar-2011  rmind sync with head
 1.7.18.1 28-Apr-2010  uebayasi Don't expose uvm_page.h internal for usual uvm(9) users.
 1.7.16.6 19-Dec-2013  matt error out if VM_NFREELIST isn't defined
 1.7.16.5 16-Feb-2012  matt Track the victims selected by the pagedaemon and what happens to them.
Keep a hint for what page group has the most free pages for a given color.
 1.7.16.4 09-Feb-2012  matt Major changes to uvm.
Support multiple collections (groups) of free pages and run the page
reclamation algorithm on each group independently.
 1.7.16.3 04-Nov-2011  matt #include <machine/vmparam.h> if VM_NFREELIST isn't defined.
 1.7.16.2 03-Jun-2011  matt Restore $NetBSD$
 1.7.16.1 03-Jun-2011  matt Rework page free lists to be sorted by color first rather than free_list.
Kept per color PGFL_* counter in each page free list.
Minor cleanups.
 1.8.60.2 21-Apr-2020  martin Sync with HEAD
 1.8.60.1 08-Apr-2020  martin Merge changes from current as of 20200406
 1.10.6.1 20-Apr-2020  bouyer Sync with HEAD
 1.20 13-Jan-2024  tnn uvm: change type of uvm_physseg.start_hint from u_int to u_long

Avoids assertion failure in uvm_pglistalloc_s_ps() with large paddrs.
PR kern/57683.
 1.19 23-Sep-2023  ad uvm_phys_to_vm_page() turns out to be a fairly central routine due to the
way that some of the pmaps work, so try to optimise it a little.
 1.18 09-Apr-2023  riastradh uvm(9): KASSERT(A && B) -> KASSERT(A); KASSERT(B)
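
The log does not state the motivation for the split. One common reason, assumed here, is that the failure message then names the exact condition that failed. A small sketch with a hypothetical EX_ASSERT that records the first failing expression instead of panicking:

```c
#include <stddef.h>

/* Text of the first failed expression, or NULL if all checks passed. */
static const char *ex_first_failed;

/* Hypothetical assertion: remembers the expression text, does not panic. */
#define EX_ASSERT(e) do {					\
	if (!(e) && ex_first_failed == NULL)			\
		ex_first_failed = #e;				\
} while (0)

/* Combined form: the report cannot tell which half was false. */
static const char *
check_combined(int a, int b)
{
	ex_first_failed = NULL;
	EX_ASSERT(a > 0 && b > 0);
	return ex_first_failed;
}

/* Split form: the report names the specific condition. */
static const char *
check_split(int a, int b)
{
	ex_first_failed = NULL;
	EX_ASSERT(a > 0);
	EX_ASSERT(b > 0);
	return ex_first_failed;
}
```

For a = 1, b = 0 the combined check reports the whole conjunction, while the split check reports only "b > 0".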
 1.17 15-Jul-2020  rin branches: 1.17.20;
Fix typo. Use PRIxPADDR rather than casting.
 1.16 13-Jul-2020  mrg paddr_t can't be printed by "%lx" in some platforms.

fix the eg, i386 build.
 1.15 13-Jul-2020  mrg actually show the start/end that failed start < end in uvm_page_physload().
 1.14 15-Mar-2020  ad uvm_physseg: cluster fields used during RB tree lookup for PHYS_TO_VM_PAGE().
 1.13 21-Dec-2019  ad - Rename VM_PGCOLOR_BUCKET() to VM_PGCOLOR(). I want to reuse "bucket" for
something else soon and TBH it matches what this macro does better.

- Add inlines to set/get locator values in the unused lower bits of
pg->phys_addr. Begin by using it to cache the freelist index, because
computing it is expensive and that shows up during profiling. Discussed
on tech-kern.
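
A minimal sketch of the low-bits technique mentioned in the second item, assuming 4 KB pages so that the low 12 bits of a page-aligned physical address are always zero. The struct, the field width and all ex_/EX_ names are hypothetical and do not reflect the actual layout in uvm_page.h.

```c
#include <stdint.h>

#define EX_PAGE_SHIFT		12
#define EX_PAGE_MASK		((UINT64_C(1) << EX_PAGE_SHIFT) - 1)
#define EX_FREELIST_BITS	4	/* hypothetical field width */
#define EX_FREELIST_MASK	((UINT64_C(1) << EX_FREELIST_BITS) - 1)

struct ex_page {
	uint64_t phys_addr;	/* page-aligned address plus cached bits */
};

/* Cache the freelist index in the otherwise unused low bits. */
static inline void
ex_page_set_freelist(struct ex_page *pg, unsigned int fl)
{
	pg->phys_addr = (pg->phys_addr & ~EX_FREELIST_MASK) |
	    ((uint64_t)fl & EX_FREELIST_MASK);
}

/* Read the cached freelist index back without recomputing it. */
static inline unsigned int
ex_page_get_freelist(const struct ex_page *pg)
{
	return (unsigned int)(pg->phys_addr & EX_FREELIST_MASK);
}

/* Anything that wants the real address must mask the low bits off. */
static inline uint64_t
ex_page_to_phys(const struct ex_page *pg)
{
	return pg->phys_addr & ~EX_PAGE_MASK;
}
```

The cost of this layout is that every consumer of the address has to go through an accessor that masks the cached bits off again.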
 1.12 20-Dec-2019  ad KNF
 1.11 13-Dec-2019  ad Break the global uvm_pageqlock into a per-page identity lock and a private
lock for use of the pagedaemon policy code. Discussed on tech-kern.

PR kern/54209: NetBSD 8 large memory performance extremely low
PR kern/54210: NetBSD-8 processes presumably not exiting
PR kern/54727: writing a large file causes unreasonable system behaviour
 1.10 20-Sep-2019  maxv Fix programming mistake: 'paddrp' is a pointer given as argument, setting
it to NULL in the called function does not set it to NULL in the caller.

Actually, the callers of these functions do not do anything with the
special error handling, so drop the unused checks and the NULL assignments
altogether.

Found by the lgtm bot.
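
The mistake described above, reduced to a self-contained sketch. ex_paddr_t and the function names are hypothetical; only the parameter name paddrp comes from the message.

```c
#include <stddef.h>

typedef unsigned long ex_paddr_t;

/* Buggy shape: the assignment changes only the callee's copy. */
static int
lookup_buggy(ex_paddr_t *paddrp)
{
	paddrp = NULL;		/* has no effect in the caller */
	(void)paddrp;		/* silence set-but-unused warnings */
	return -1;
}

/* Clearing the caller's pointer needs one more level of indirection. */
static int
lookup_clears_caller(ex_paddr_t **paddrpp)
{
	*paddrpp = NULL;
	return -1;
}

/* Returns 1 if the caller's pointer is unchanged after the buggy call. */
static int
caller_pointer_survives(void)
{
	ex_paddr_t pa = 0x1000;
	ex_paddr_t *p = &pa;

	(void)lookup_buggy(p);
	return p == &pa;
}

/* Returns 1 if the caller's pointer was actually set to NULL. */
static int
caller_pointer_cleared(void)
{
	ex_paddr_t pa = 0x1000;
	ex_paddr_t *p = &pa;

	(void)lookup_clears_caller(&p);
	return p == NULL;
}
```

Since the callers never relied on the NULL assignment, dropping it, as the commit does, is simpler than adding the extra indirection.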
 1.9 21-Jan-2018  christos branches: 1.9.4;
CID-1427737: Pacify coverity using KASSERT
 1.8 19-Mar-2017  riastradh branches: 1.8.12;
__diagused police
 1.7 02-Feb-2017  uwe branches: 1.7.2;
Add missing spaces in split string literals.
 1.6 29-Dec-2016  rin branches: 1.6.2; 1.6.4;
Protect uvm_physseg_set_avail_{start,end} by UVM_PHYSSEG_LEGACY.
All the ports other than acorn26 do not use them any longer.
Ok cherry
 1.5 25-Dec-2016  cherry Make uvm_physseg_set_avail_start(9) available unconditional to UVM_HOTPLUG
 1.4 25-Dec-2016  christos Provide a set_available_start method for the non UVM_HOTPLUG case.
 1.3 23-Dec-2016  cherry Omitted assigning handle return value for the case:
(VM_PHYSSEG_STRAT == VM_PSTRAT_RANDOM)

Fix this.
 1.2 22-Dec-2016  cherry convention about function names for predicate checking:
s/uvm_physseg_valid()/uvm_physseg_valid_p()/

per. matt@
 1.1 19-Dec-2016  cherry This is a preview of the uvm_hotplug(9) api code.
This commit does not actually introduce the UVM_HOTPLUG option.
However it does provide developers a way to review, test and try out
the API.

To do this, please go to tests/sys/uvm/ and build and run the tests
there. The tests also have a set of basic load tests, to get a measure
of the performance penalties due to enabling the UVM_HOTPLUG option.

In order to build the tests you need to have at least done the
following in $SRC/

cd $SRC; $NBMAKE do-distrib-dirs includes
cd $SRC/lib/csu; $NBMAKE all install || exit
cd $SRC/external/gpl3/gcc/lib/libgcc/libgcc_s; $NBMAKE all install || exit
cd $SRC/external/gpl3/gcc/lib/libgcc/libgcc; $NBMAKE all install || exit
cd $SRC/lib/libc; $NBMAKE includes all install || exit
cd $SRC/lib/libpthread; $NBMAKE all install || exit
cd $SRC/lib/libm; $NBMAKE all install || exit
cd $SRC/external/gpl3/gcc/lib/libstdc++-v3/; $NBMAKE all install || exit

Once the development environment has these userspace libraries, one
can simply build using $NBMAKE and finally test the kernel API using

atf-run|atf-report
 1.6.4.1 21-Apr-2017  bouyer Sync with HEAD
 1.6.2.3 20-Mar-2017  pgoyette Sync with HEAD
 1.6.2.2 07-Jan-2017  pgoyette Sync with HEAD. (Note that most of these changes are simply $NetBSD$
tag issues.)
 1.6.2.1 29-Dec-2016  pgoyette file uvm_physseg.c was added on branch pgoyette-localcount on 2017-01-07 08:56:53 +0000
 1.7.2.3 28-Aug-2017  skrll Sync with HEAD
 1.7.2.2 05-Feb-2017  skrll Sync with HEAD
 1.7.2.1 02-Feb-2017  skrll file uvm_physseg.c was added on branch nick-nhusb on 2017-02-05 13:41:01 +0000
 1.8.12.2 03-Dec-2017  jdolecek update from HEAD
 1.8.12.1 19-Mar-2017  jdolecek file uvm_physseg.c was added on branch tls-maxphys on 2017-12-03 11:39:22 +0000
 1.9.4.2 13-Apr-2020  martin Mostly merge changes from HEAD up to 20200411
 1.9.4.1 08-Apr-2020  martin Merge changes from current as of 20200406
 1.17.20.1 15-Jan-2024  martin Pull up following revision(s) (requested by tnn in ticket #554):

sys/uvm/uvm_physseg.c: revision 1.20
sys/uvm/uvm_pglist.c: revision 1.91
sys/uvm/uvm_pglist.c: revision 1.92
sys/uvm/uvm_physseg.h: revision 1.9

uvm: change type of uvm_physseg.start_hint from u_int to u_long
Avoids assertion failure in uvm_pglistalloc_s_ps() with large paddrs.
PR kern/57683.

fix DEBUG build
 1.9 13-Jan-2024  tnn uvm: change type of uvm_physseg.start_hint from u_int to u_long

Avoids assertion failure in uvm_pglistalloc_s_ps() with large paddrs.
PR kern/57683.
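
The kind of truncation that a u_int to u_long change avoids can be sketched as follows. This is an illustration under stated assumptions, not the code from uvm_physseg.h: it assumes an LP64 port where u_int is 32 bits and u_long is 64 bits, models them with fixed-width types so the behaviour is the same on any host, and uses hypothetical function names. The log does not say exactly which value overflowed, so the input here is simply "a page-granular value that needs more than 32 bits".

```c
#include <stdint.h>

/* Hypothetical stand-ins for u_int and u_long on an LP64 port. */
typedef uint32_t narrow_hint_t;
typedef uint64_t wide_hint_t;

/* Returns 1 if the value survives a round trip through the narrow type. */
static int
hint_survives_narrow(uint64_t value)
{
	narrow_hint_t h = (narrow_hint_t)value;	/* upper 32 bits are dropped */

	return (uint64_t)h == value;
}

/* Returns 1 if the value survives a round trip through the wide type. */
static int
hint_survives_wide(uint64_t value)
{
	wide_hint_t h = value;

	return h == value;
}
```

Any consistency check that compares the stored hint against the full-width value would fail once the value no longer fits in 32 bits, which is the general shape of an assertion failure "with large paddrs".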
 1.8 02-Jan-2017  cherry branches: 1.8.2; 1.8.6; 1.8.18; 1.8.52;
Move sys/uvm/uvm_physseg.h inclusion to within _KERNEL only.
 1.7 29-Dec-2016  rin Protect uvm_physseg_set_avail_{start,end} by UVM_PHYSSEG_LEGACY.
All the ports other than acorn26 do not use them any longer.
Ok cherry
 1.6 26-Dec-2016  cherry Add copyright info.

After sufficient poking by Taylor.
 1.5 24-Dec-2016  maya as a stopgap fix make all of uvm_physseg.h kernel-only. this file uses
paddr_t which isn't available to userland, breaking builds that use uvm
headers, like devel/libuv on pkgsrc.

pointed out by Carsten Kunze on pkgsrc-users.

ok riastradh
 1.4 23-Dec-2016  cherry "Make NetBSD great again!"

Introduce uvm_hotplug(9) to the kernel.

Many thanks, in no particular order to:

TNF, for funding the project.

Chuck Silvers - for multiple API reviews and feedback.
Nick Hudson - for testing on multiple architectures and bugfix patches.
Everyone who helped with boot testing.

KeK (http://www.kek.org.in) for hosting the primary developers.
 1.3 22-Dec-2016  cherry Turn off uvm_hotplug option selection until we actually have it.

Should fix the build.
 1.2 22-Dec-2016  cherry convention about function names for predicate checking:
s/uvm_physseg_valid()/uvm_physseg_valid_p()/

per. matt@
 1.1 19-Dec-2016  cherry This is a preview of the uvm_hotplug(9) api code.
This commit does not actually introduce the UVM_HOTPLUG option.
However it does provide developers a way to review, test and try out
the API.

To do this, please go to tests/sys/uvm/ and build and run the tests
there. The tests also have a set of basic load tests, to get a measure
of the performance penalties due to enabling the UVM_HOTPLUG option.

In order to build the tests you need to have at least done the
following in $SRC/

cd $SRC; $NBMAKE do-distrib-dirs includes
cd $SRC/lib/csu; $NBMAKE all install || exit
cd $SRC/external/gpl3/gcc/lib/libgcc/libgcc_s; $NBMAKE all install || exit
cd $SRC/external/gpl3/gcc/lib/libgcc/libgcc; $NBMAKE all install || exit
cd $SRC/lib/libc; $NBMAKE includes all install || exit
cd $SRC/lib/libpthread; $NBMAKE all install || exit
cd $SRC/lib/libm; $NBMAKE all install || exit
cd $SRC/external/gpl3/gcc/lib/libstdc++-v3/; $NBMAKE all install || exit

Once the development environment has these userspace libraries, one
can simply build using $NBMAKE and finally test the kernel API using

atf-run|atf-report
 1.8.52.1 15-Jan-2024  martin Pull up following revision(s) (requested by tnn in ticket #554):

sys/uvm/uvm_physseg.c: revision 1.20
sys/uvm/uvm_pglist.c: revision 1.91
sys/uvm/uvm_pglist.c: revision 1.92
sys/uvm/uvm_physseg.h: revision 1.9

uvm: change type of uvm_physseg.start_hint from u_int to u_long
Avoids assertion failure in uvm_pglistalloc_s_ps() with large paddrs.
PR kern/57683.

fix DEBUG build
 1.8.18.2 03-Dec-2017  jdolecek update from HEAD
 1.8.18.1 02-Jan-2017  jdolecek file uvm_physseg.h was added on branch tls-maxphys on 2017-12-03 11:39:22 +0000
 1.8.6.2 05-Feb-2017  skrll Sync with HEAD
 1.8.6.1 02-Jan-2017  skrll file uvm_physseg.h was added on branch nick-nhusb on 2017-02-05 13:41:01 +0000
 1.8.2.2 07-Jan-2017  pgoyette Sync with HEAD. (Note that most of these changes are simply $NetBSD$
tag issues.)
 1.8.2.1 02-Jan-2017  pgoyette file uvm_physseg.h was added on branch pgoyette-localcount on 2017-01-07 08:56:53 +0000
 1.43 20-Aug-2022  riastradh uvm/uvm_pmap.h: Fix missing types and forward declarations.

- Need sys/types.h for vaddr_t, paddr_t, u_int, &c.
- Forward-declare struct vm_page so we don't have to rely on
machine/pmap.h to do so.
 1.42 16-Feb-2022  riastradh uvm: MI declaration of pmap_pv_protect.
 1.41 07-Mar-2021  skrll <tab> consistency
 1.40 14-Mar-2020  ad branches: 1.40.4;
pmap_remove_all(): Return a boolean value to indicate the behaviour. If
true, all mappings have been removed, the pmap is totally cleared out, and
UVM can then avoid doing the work to call pmap_remove() for each map entry.
If false, either nothing has been done, or some helpful arch-specific voodoo
has taken place.
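
The calling convention described for pmap_remove_all() in revision 1.40 can be sketched with mock types. Everything below is hypothetical (the struct, the `mock_` functions, and the teardown loop); it only models the idea that a true return lets the caller skip the per-entry removal work, and it is not the UVM code.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical pmap model: a mapping count and a capability flag. */
struct mock_pmap {
	bool	can_clear_all;	/* can the arch code drop everything at once? */
	size_t	mappings;	/* number of live mappings */
};

/* Models the boolean contract: true means the pmap is now empty. */
static bool
mock_pmap_remove_all(struct mock_pmap *pm)
{
	if (!pm->can_clear_all)
		return false;	/* nothing guaranteed; caller must do the work */
	pm->mappings = 0;
	return true;
}

/* Models removing the mappings covered by one map entry. */
static void
mock_pmap_remove(struct mock_pmap *pm)
{
	if (pm->mappings > 0)
		pm->mappings--;
}

/* Tears down nentries map entries; returns the number of per-entry calls. */
static size_t
mock_teardown(struct mock_pmap *pm, size_t nentries)
{
	size_t calls = 0;
	size_t i;

	if (mock_pmap_remove_all(pm))
		return 0;	/* fast path: skip the per-entry loop */
	for (i = 0; i < nentries; i++) {
		mock_pmap_remove(pm);
		calls++;
	}
	return calls;
}
```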
 1.39 19-May-2018  jdolecek branches: 1.39.2;
add experimental new function uvm_direct_process(), to allow reads/writes
of contents of uvm pages without mapping them into kernel, using
direct map or moral equivalent; pmaps supporting the interface need
to provide pmap_direct_process() and define PMAP_DIRECT

implement the new interface for amd64; I hear alpha and mips might be relatively
easy to add too, but I lack the knowledge

part of resolution for PR kern/53124
 1.38 02-Feb-2013  matt branches: 1.38.36;
Remove __BEGIN_DECLS/__END_DECLS
Allow pmap_kenter_pa to be a macro.
 1.37 30-Jun-2011  matt branches: 1.37.2; 1.37.12;
Move PMAP_* cache defines to before inclusion of <machine/pmap.h>
 1.36 11-Feb-2011  jmcneill add optional MD pmap_mmap_flags macro for passing flags between cdev_mmap
and pmap_enter, ok matt@
 1.35 29-Nov-2010  mrg branches: 1.35.2; 1.35.4;
put the kernel-only externs back before <machine/pmap.h>. fixes ofppc build.
 1.34 26-Nov-2010  christos don't leak kernel variables to userland!
 1.33 06-Jul-2010  cegger Turn PMAP_NOCACHE into MI flag.
Add MI flags PMAP_WRITE_COMBINE, PMAP_WRITE_BACK, PMAP_NOCACHE_OVR.
Update pmap(9) manpage.

hppa: Remove MD PMAP_NOCACHE flag as it exists as MI flag
mips: Rename MD PMAP_NOCACHE to PGC_NOCACHE.

x86: Implement new MI flags using Page-Attribute Tables.
x86: Implement BUS_SPACE_MAP_PREFETCHABLE.

Patch presented on tech-kern@:
http://mail-index.netbsd.org/tech-kern/2010/06/30/msg008458.html

No comments on this last version.
Forgot to commit this in previous.
 1.32 07-Nov-2009  cegger branches: 1.32.2; 1.32.4;
Add a flags argument to pmap_kenter_pa(9).
Patch shown on tech-kern@ http://mail-index.netbsd.org/tech-kern/2009/11/04/msg006434.html
No objections.
 1.31 21-Oct-2009  rmind Remove uarea swap-out functionality:

- Addresses the issue described in PR/38828.
- Some simplification in threading and sleepq subsystems.
- Eliminates pmap_collect() and, as a side note, allows pmap optimisations.
- Eliminates XS_CTL_DATA_ONSTACK in scsipi code.
- Avoids few scans on LWP list and thus potentially long holds of proc_lock.
- Cuts ~1.5k lines of code. Reduces amd64 kernel size by ~4k.
- Removes __SWAP_BROKEN cases.

Tested on x86, mips, acorn32 (thanks <mpumford>) and partly tested on
acorn26 (thanks to <bjh21>).

Discussed on <tech-kern>, reviewed by <ad>.
 1.30 19-Aug-2009  thorpej Use PMAP_ENABLE_PMAP_KMPAGE to enable PMAP_KMPAGE. We still want the bit
defined in the MI space, not in an MD header.
 1.29 19-Aug-2009  thorpej Rationalize the definition of PMAP_KMPAGE.
 1.28 23-Apr-2009  cegger use top-most bits for PMAP_MD_MASK instead of something in the middle.
per request from christos@
 1.27 21-Apr-2009  cegger change pmap flags argument from int to u_int.
discussed with christos@ on source-changes-d@
 1.26 18-Apr-2009  cegger Introduce PMAP_MD_MASK. Reserves PMAP bits for use in MD code.
Presented on tech-kern@, port-i386@ and port-amd64@
ok ad@
 1.25 10-Dec-2008  pooka branches: 1.25.2;
Make kernel_pmap_ptr a const. Requested by steve_martin.
 1.24 09-Dec-2008  pooka Make pmap_kernel() a MI macro for struct pmap *kernel_pmap_ptr,
which is now the "API" provided by the pmap module. pmap_kernel()
remains as the syntactic sugar.

Bonus cosmetics round: move all the pmap_t pointer typedefs into
uvm_pmap.h.

Thanks to Greg Oster for providing cpu muscle for doing test builds.
 1.23 16-Jul-2008  matt branches: 1.23.2; 1.23.10;
Default PMAP_KMPAGE to 0 unless it's been previously defined by
<machine/pmap.h>
 1.22 16-Jul-2008  matt Add PMAP_KMPAGE flag for pmap_kenter_pa. This allows pmaps to know that
the page being entered is being used for the kernel memory allocator. Such pages
should have no references and don't need bookkeeping.
 1.21 16-Jul-2007  macallan branches: 1.21.28; 1.21.32; 1.21.34; 1.21.36; 1.21.38;
change pmap_phys_address()'s parameter to paddr_t since that's what it gets
fed from mmap*() anyway
approved by gimpy
 1.20 21-Feb-2007  thorpej branches: 1.20.4;
Replace the Mach-derived boolean_t type with the C99 bool type. A
future commit will replace use of TRUE and FALSE with true and false.
 1.19 11-Dec-2005  christos branches: 1.19.26;
merge ktrace-lwp.
 1.18 27-Mar-2004  he branches: 1.18.16;
Conditionalize a few more declarations, as they may be defined as macros:
pmap_collect, pmap_reference, and pmap_remove (observed lossage for vax).
 1.17 24-Mar-2004  junyoung Nuke __P().
 1.16 23-Mar-2004  junyoung pmap_copy() and pmap_update() might be defined as macros in <machine/pmap.h>.
 1.15 07-Aug-2003  agc Move UCB-licensed code from 4-clause to 3-clause licence.

Patches provided by Joel Baker in PR 22364, verified by myself.
 1.14 10-May-2003  thorpej branches: 1.14.2;
Back out the following change:
http://mail-index.netbsd.org/source-changes/2003/05/08/0068.html

There were some side-effects that I didn't anticipate, and fixing them
is proving to be more difficult than I thought, so just eject for now.
Maybe one day we can look at this again.

Fixes PR kern/21517.
 1.13 08-May-2003  thorpej Simplify the way the bounds of the managed kernel virtual address
space is advertised to UVM by making virtual_avail and virtual_end
first-class exported variables by UVM. Machine-dependent code is
responsible for initializing them before main() is called. Anything
that steals KVA must adjust these variables accordingly.

This reduces the number of instances of this info from 3 to 1, and
simplifies the pmap(9) interface by removing the pmap_virtual_space()
function call, and removing two arguments from pmap_steal_memory().

This also eliminates some kludges such as having to burn kernel_map
entries on space used by the kernel and stolen KVA.

This also eliminates use of VM_{MIN,MAX}_KERNEL_ADDRESS from MI code,
thus giving MD code greater flexibility over the bounds of the managed
kernel virtual address space if a given port's specific platforms can
vary in this regard (this is especially true of the evb* ports).
 1.12 18-Jan-2003  thorpej Merge the nathanw_sa branch.
 1.11 22-Sep-2002  chs add pmap_remove_all() hook (empty on most platforms so far).
 1.10 10-Apr-2002  thorpej Allow pmap_copy_page() and pmap_zero_page() to be #define'd
in <machine/pmap.h>.
 1.9 10-Sep-2001  chris Update pmap_update to now take the updated pmap as an argument.
This will allow improvements to the pmaps so that they can more easily defer expensive operations, eg tlb/cache flush, til the last possible moment.

Currently this is a no-op on most platforms, so they should see no difference.

Reviewed by Jason.
 1.8 05-Aug-2001  matt branches: 1.8.2;
Don't include <machine/pmap.h> and <machine/vmparam.h> if _KERNEL isn't
defined. Include them explicitly in the few kvm_arch.c that need them.
 1.7 25-May-2001  chs branches: 1.7.2;
remove trailing whitespace.
 1.6 24-Apr-2001  thorpej Some spring cleaning.
 1.5 22-Apr-2001  thorpej Remove pmap_kenter_pgs(). It was never really adopted by
anything, and the interface itself wasn't as flexible as
callers would have probably liked.
 1.4 22-Apr-2001  thorpej Undo a misguided previous change to the pmap_update() API.
 1.3 22-Apr-2001  thorpej Make pmap_virtual_space() a required pmap function, even on platforms
which have pmap_steal_memory(). This is to reduce the API differences
between pmaps that implement pmap_steal_memory() and pmaps which do
not.

Note that pmap_steal_memory() needs to adjust *vstartp and/or
*vendp only if it used addresses within the range provided to UVM
via the pmap_virtual_space() call. I.e. it is not necessary to do
so in any current pmap_steal_memory() implementation.
 1.2 22-Apr-2001  thorpej Give pmap_update() an argument (a pmap_t) so that it knows which
pmap it should be updating.
 1.1 27-Jun-2000  mrg branches: 1.1.2; 1.1.4;
more vm header file changes:

<vm/vm_extern.h> merged into <uvm/uvm_extern.h>
<vm/vm_page.h> merged into <uvm/uvm_page.h>
<vm/pmap.h> has become <uvm/uvm_pmap.h>

this leaves just <vm/vm.h> in NetBSD.
 1.1.4.6 18-Oct-2002  nathanw Catch up to -current.
 1.1.4.5 17-Apr-2002  nathanw Catch up to -current.
 1.1.4.4 21-Sep-2001  nathanw Catch up to -current.
 1.1.4.3 24-Aug-2001  nathanw Catch up with -current.
 1.1.4.2 21-Jun-2001  nathanw Catch up to -current.
 1.1.4.1 05-Mar-2001  nathanw Initial commit of scheduler activations and lightweight process support.
 1.1.2.3 23-Apr-2001  bouyer Sync with HEAD.
 1.1.2.2 20-Nov-2000  bouyer Update thorpej_scsipi to -current as of a month ago
 1.1.2.1 27-Jun-2000  bouyer file uvm_pmap.h was added on branch thorpej_scsipi on 2000-11-20 18:12:06 +0000
 1.7.2.4 10-Oct-2002  jdolecek sync kqueue with -current; this includes merge of gehenna-devsw branch,
merge of i386 MP branch, and part of autoconf rototil work
 1.7.2.3 23-Jun-2002  jdolecek catch up with -current on kqueue branch
 1.7.2.2 13-Sep-2001  thorpej Update the kqueue branch to HEAD.
 1.7.2.1 25-Aug-2001  thorpej Merge Aug 24 -current into the kqueue branch.
 1.8.2.1 01-Oct-2001  fvdl Catch up with -current.
 1.14.2.3 21-Sep-2004  skrll Fix the sync with head I botched.
 1.14.2.2 18-Sep-2004  skrll Sync with HEAD.
 1.14.2.1 03-Aug-2004  skrll Sync with HEAD
 1.18.16.2 03-Sep-2007  yamt sync with head.
 1.18.16.1 26-Feb-2007  yamt sync with head.
 1.19.26.1 27-Feb-2007  yamt - sync with head.
- move sched_changepri back to kern_synch.c as it doesn't know PPQ anymore.
 1.20.4.1 20-Aug-2007  ad Sync with HEAD.
 1.21.38.2 13-Dec-2008  haad Update haad-dm branch to haad-dm-base2.
 1.21.38.1 19-Oct-2008  haad Sync with HEAD.
 1.21.36.1 18-Jul-2008  simonb Sync with head.
 1.21.34.1 18-Sep-2008  wrstuden Sync with wrstuden-revivesa-base-2.
 1.21.32.5 11-Aug-2010  yamt sync with head.
 1.21.32.4 11-Mar-2010  yamt sync with head
 1.21.32.3 16-Sep-2009  yamt sync with head
 1.21.32.2 19-Aug-2009  yamt sync with head.
 1.21.32.1 04-May-2009  yamt sync with head.
 1.21.28.2 17-Jan-2009  mjf Sync with HEAD.
 1.21.28.1 28-Sep-2008  mjf Sync with HEAD.
 1.23.10.1 15-Feb-2014  matt Add PMAP_NOCACHE + others.
 1.23.2.2 28-Apr-2009  skrll Sync with HEAD.
 1.23.2.1 19-Jan-2009  skrll Sync with HEAD.
 1.25.2.1 13-May-2009  jym Sync with HEAD.

Commit is split, to avoid a "too many arguments" protocol error.
 1.32.4.1 05-Mar-2011  rmind sync with head
 1.32.2.7 15-Nov-2010  uebayasi Revert xmd(4).
 1.32.2.6 31-Oct-2010  uebayasi We already have a flag PMAP_NOCACHE. s/PMAP_UNMANAGED/PMAP_NOCACHE/.
Pointed out by Chuck Silvers, thanks.
 1.32.2.5 30-Oct-2010  uebayasi Implement pmap_physload_device(9) to replace xmd(4) MD backend.
Implement pmap_mmap(9) and use it from mem(4) and xmd(4).
 1.32.2.4 17-Aug-2010  uebayasi Sync with HEAD.
 1.32.2.3 11-Aug-2010  uebayasi If both __HAVE_PMAP_PHYSSEG and __HAVE_PMAP_PHYSSEG_INIT are defined,
call per-vm_physseg initialization/finalization hooks.
 1.32.2.2 27-Apr-2010  uebayasi On second thought, rename PMAP_UNCACHEABLE as PMAP_UNMANAGED.
 1.32.2.1 27-Apr-2010  uebayasi Introduce PMAP_UNCACHEABLE, a flag to tell pmap_enter(9) to enter a H/W
mapping as cache disabled, even for managed memory and device pages.

(In the long run, we should pass more explicit control from UVM rather
than the current way that pmap(9) checks if a given paddr_t is managed
(== contained in one of struct vm_physseg [] arrays).)
 1.35.4.1 17-Feb-2011  bouyer Sync with HEAD
 1.35.2.1 06-Jun-2011  jruoho Sync with HEAD.
 1.37.12.1 25-Feb-2013  tls resync with head
 1.37.2.1 22-May-2014  yamt sync with head.

for a reference, the tree before this commit was tagged
as yamt-pagecache-tag8.

this commit was split into small chunks to avoid
a limitation of cvs. ("Protocol error: too many arguments")
 1.38.36.1 21-May-2018  pgoyette Sync with HEAD
 1.39.2.1 08-Apr-2020  martin Merge changes from current as of 20200406
 1.40.4.1 03-Apr-2021  thorpej Sync with HEAD.
 1.4 11-Dec-2005  christos merge ktrace-lwp.
 1.3 07-Aug-2003  agc Move UCB-licensed code from 4-clause to 3-clause licence.

Patches provided by Joel Baker in PR 22364, verified by myself.
 1.2 25-May-2001  chs branches: 1.2.22;
remove trailing whitespace.
 1.1 25-Jun-2000  mrg branches: 1.1.2; 1.1.4;
<vm/vm_prot.h> becomes <uvm/uvm_prot.h>
 1.1.4.1 21-Jun-2001  nathanw Catch up to -current.
 1.1.2.2 20-Nov-2000  bouyer Update thorpej_scsipi to -current as of a month ago
 1.1.2.1 25-Jun-2000  bouyer file uvm_prot.h was added on branch thorpej_scsipi on 2000-11-20 18:12:06 +0000
 1.2.22.3 21-Sep-2004  skrll Fix the sync with head I botched.
 1.2.22.2 18-Sep-2004  skrll Sync with HEAD.
 1.2.22.1 03-Aug-2004  skrll Sync with HEAD
 1.16 23-Sep-2023  ad Reapply this change with a couple of bugs fixed:

- Do away with separate pool_cache for some kernel objects that have no special
requirements and use the general purpose allocator instead. On one of my
test systems this makes for a small (~1%) but repeatable reduction in system
time during builds presumably because it decreases the kernel's cache /
memory bandwidth footprint a little.
- vfs_lockf: cache a pointer to the uidinfo and put mutex in the data segment.
 1.15 12-Sep-2023  ad Back out recent change to replace pool_cache with the general allocator.
Will return to this when I have time again.
 1.14 10-Sep-2023  ad - Do away with separate pool_cache for some kernel objects that have no special
requirements and use the general purpose allocator instead. On one of my
test systems this makes for a small (~1%) but repeatable reduction in system
time during builds presumably because it decreases the kernel's cache /
memory bandwidth footprint a little.
- vfs_lockf: cache a pointer to the uidinfo and put mutex in the data segment.
 1.13 19-May-2020  ad Drop & re-acquire vmobjlock less often.
 1.12 08-Mar-2020  ad Only need a read lock for uvm_pagelookup().
 1.11 23-Feb-2020  ad UVM locking changes, proposed on tech-kern:

- Change the lock on uvm_object, vm_amap and vm_anon to be a RW lock.
- Break v_interlock and vmobjlock apart. v_interlock remains a mutex.
- Do partial PV list locking in the x86 pmap. Others to follow later.
 1.10 19-May-2018  jdolecek branches: 1.10.2; 1.10.8;
adjust heuristics for read-ahead to skip the full read-ahead when last page of
the range is already cached; this speeds up I/O from cache, since it avoids
the lookup and allocation overhead

on my system I observed 4.5% - 15% improvement for cached I/O - from 2.2 GB/s to
2.3 GB/s for cached reads using non-direct UBC, and from 5.6 GB/s to 6.5 GB/s
for UBC using direct map

part of PR kern/53124
 1.9 30-Mar-2018  mlelstv Increase UVM read ahead window limit a bit to match concurrency of reading
from the raw device.
 1.8 12-Jun-2011  rmind branches: 1.8.12; 1.8.52;
Welcome to 5.99.53! Merge rmind-uvmplock branch:

- Reorganize locking in UVM and provide extra serialisation for pmap(9).
New lock order: [vmpage-owner-lock] -> pmap-lock.

- Simplify locking in some pmap(9) modules by removing P->V locking.

- Use lock object on vmobjlock (and thus vnode_t::v_interlock) to share
the locks amongst UVM objects where necessary (tmpfs, layerfs, unionfs).

- Rewrite and optimise x86 TLB shootdown code, make it simpler and cleaner.
Add TLBSTATS option for x86 to collect statistics about TLB shootdowns.

- Unify /dev/mem et al in MI code and provide required locking (removes
kernel-lock on some ports). Also, avoid cache-aliasing issues.

Thanks to Andrew Doran and Joerg Sonnenberger, as their initial patches
formed the core changes of this branch.
 1.7 15-Oct-2010  tsutsui branches: 1.7.6;
Make common kernel module binaries work on both sun3 and sun3x.
Tested on 3/160 (on TME) and (real) 3/80.

XXX: module files can be loaded only on single user?
 1.6 10-Jun-2009  yamt branches: 1.6.2; 1.6.4;
- add a function to perform explicit read-ahead.
- ra_startio: tweak locking a bit.
 1.5 02-Jan-2008  ad branches: 1.5.10; 1.5.24;
Merge vmlocking2 to head.
 1.4 11-May-2007  tsutsui branches: 1.4.8; 1.4.14; 1.4.16; 1.4.20;
Add temporary workaround for PR kern/36019 (panic on sun2 and sun3).
Ok'ed by yamt.
 1.3 12-Mar-2007  ad branches: 1.3.2;
Pass an ipl argument to pool_init/POOL_INIT to be used when initializing
the pool's lock.
 1.2 29-Nov-2005  yamt branches: 1.2.2; 1.2.20; 1.2.28; 1.2.30; 1.2.34;
add files i forgot to add when merging yamt-readahead branch.
 1.1 15-Nov-2005  yamt branches: 1.1.2;
file uvm_readahead.c was initially added on branch yamt-readahead.
 1.1.2.16 22-Nov-2005  yamt comments.
 1.1.2.15 22-Nov-2005  yamt make ractx_pool static.
 1.1.2.14 22-Nov-2005  yamt comments.
 1.1.2.13 20-Nov-2005  yamt uvm_ra_request: fix an off-by-one error.
 1.1.2.12 20-Nov-2005  yamt uvm_ra_request: don't shrink window when reading the same chunk repeatedly.
 1.1.2.11 19-Nov-2005  yamt - as read-ahead context is per-vnode now,
there are less reasons to make VOP_READ call uvm_ra_request explicitly.
move it to pager (uvn_get) so that it can handle accesses via mmap as well.
- pass advice to pager via ubc.
- tweak DPRINTF.

XXX can be disturbed by PGO_LOCKED.

XXX it's controversial where it should be done.
(uvm_fault, uvn_get or genfs_getpages.)
 1.1.2.10 19-Nov-2005  yamt ra_startio: don't bother to read busy chunk again and again.
 1.1.2.9 18-Nov-2005  yamt - associate read-ahead context to vnode, rather than file.
- revert VOP_READ prototype.
 1.1.2.8 17-Nov-2005  yamt correct a comment.
 1.1.2.7 17-Nov-2005  yamt use UVM_ADV_ rather than POSIX_FADV_.
 1.1.2.6 17-Nov-2005  yamt use DPRINTF rather than explicit #ifdef. suggested by Chuck Silvers.
 1.1.2.5 17-Nov-2005  yamt comments.
 1.1.2.4 15-Nov-2005  yamt fix a reversed condition in the previous.
 1.1.2.3 15-Nov-2005  yamt - #ifdef out debug printf.
- an assertion.
 1.1.2.2 15-Nov-2005  yamt add posix_fadvise.
 1.1.2.1 15-Nov-2005  yamt add simple readahead routines.
 1.2.34.3 08-Jun-2007  ad Sync with head.
 1.2.34.2 13-Mar-2007  ad Pull in the initial set of changes for the vmlocking branch.
 1.2.34.1 13-Mar-2007  ad Sync with head.
 1.2.30.2 17-May-2007  yamt sync with head.
 1.2.30.1 24-Mar-2007  yamt sync with head.
 1.2.28.1 13-May-2007  pavel Pull up following revision(s) (requested by tsutsui in ticket #641):
sys/uvm/uvm_readahead.c: revision 1.4
Add temporary workaround for PR kern/36019 (panic on sun2 and sun3).
Ok'ed by yamt.
 1.2.20.4 21-Jan-2008  yamt sync with head
 1.2.20.3 03-Sep-2007  yamt sync with head.
 1.2.20.2 21-Jun-2006  yamt sync with head.
 1.2.20.1 29-Nov-2005  yamt file uvm_readahead.c was added on branch yamt-lazymbuf on 2006-06-21 15:12:40 +0000
 1.2.2.2 11-Dec-2005  christos Sync with head.
 1.2.2.1 29-Nov-2005  christos file uvm_readahead.c was added on branch ktrace-lwp on 2005-12-11 10:29:42 +0000
 1.3.2.1 11-Jul-2007  mjf Sync with head.
 1.4.20.1 02-Jan-2008  bouyer Sync with HEAD
 1.4.16.2 18-Dec-2007  ad Lock readahead context using the associated object's lock.
 1.4.16.1 04-Dec-2007  ad Pull the vmlocking changes into a new branch.
 1.4.14.1 18-Feb-2008  mjf Sync with HEAD.
 1.4.8.1 09-Jan-2008  matt sync with HEAD
 1.5.24.1 23-Jul-2009  jym Sync with HEAD.
 1.5.10.1 20-Jun-2009  yamt sync with head
 1.6.4.2 05-Mar-2011  rmind sync with head
 1.6.4.1 16-Mar-2010  rmind Change struct uvm_object::vmobjlock to be dynamically allocated with
mutex_obj_alloc(). It allows us to share the locks among UVM objects.
 1.6.2.1 22-Oct-2010  uebayasi Sync with HEAD (-D20101022).
 1.7.6.1 23-Jun-2011  cherry Catchup with rmind-uvmplock merge.
 1.8.52.2 21-May-2018  pgoyette Sync with HEAD
 1.8.52.1 07-Apr-2018  pgoyette Sync with HEAD. 77 conflicts resolved - all of them $NetBSD$
 1.8.12.3 09-Oct-2012  bouyer Redo previous: it seems that the point of the bytelen computation was to
get transfers aligned to chunksz. So reintroduce the code, but using chunksz
instead of chunksize (if the readahead is truncated there's no point in
trying to align it anyway).
Now I get 64k read requests at the drive level again.
 1.8.12.2 09-Oct-2012  bouyer Fix panic "bad chunksize ..." in read-ahead code:
- off comes from the pager, so should already be page-aligned.
KASSERT() that it is, and remove the off = trunc_page(off)
- as off is not changed any more, the size of the transfer is chunksize.
Don't compute bytelen any more, which is what required chunksize
to be a power of 2. KASSERT() that chunksize is a multiple of page size.
 1.8.12.1 12-Sep-2012  tls Initial snapshot of work to eliminate 64K MAXPHYS. Basically works for
physio (I/O to raw devices); needs more doing to get it going with the
filesystems, but it shouldn't damage data.

All work's been done on amd64 so far. Not hard to add support to other
ports. If others want to pitch in, one very helpful thing would be to
sort out when and how IDE disks can do 128K or larger transfers, and
adjust the various PCI IDE (or at least ahcisata) drivers and wd.c
accordingly -- it would make testing much easier. Another very helpful
thing would be to implement a smart minphys() for RAIDframe along the
lines detailed in the MAXPHYS-NOTES file.
 1.10.8.1 29-Feb-2020  ad Sync with head.
 1.10.2.1 08-Apr-2020  martin Merge changes from current as of 20200406
 1.4 10-Jun-2009  yamt branches: 1.4.22;
- add a function to perform explicit read-ahead.
- ra_startio: tweak locking a bit.
 1.3 02-Jan-2008  ad branches: 1.3.10; 1.3.24;
Merge vmlocking2 to head.
 1.2 29-Nov-2005  yamt branches: 1.2.2; 1.2.20; 1.2.50; 1.2.56; 1.2.60; 1.2.64;
add files i forgot to add when merging yamt-readahead branch.
 1.1 15-Nov-2005  yamt branches: 1.1.2;
file uvm_readahead.h was initially added on branch yamt-readahead.
 1.1.2.3 18-Nov-2005  yamt - associate read-ahead context to vnode, rather than file.
- revert VOP_READ prototype.
 1.1.2.2 15-Nov-2005  yamt unwrap a short line.
 1.1.2.1 15-Nov-2005  yamt add simple readahead routines.
 1.2.64.1 02-Jan-2008  bouyer Sync with HEAD
 1.2.60.1 18-Dec-2007  ad Lock readahead context using the associated object's lock.
 1.2.56.1 18-Feb-2008  mjf Sync with HEAD.
 1.2.50.1 09-Jan-2008  matt sync with HEAD
 1.2.20.3 21-Jan-2008  yamt sync with head
 1.2.20.2 21-Jun-2006  yamt sync with head.
 1.2.20.1 29-Nov-2005  yamt file uvm_readahead.h was added on branch yamt-lazymbuf on 2006-06-21 15:12:40 +0000
 1.2.2.2 11-Dec-2005  christos Sync with head.
 1.2.2.1 29-Nov-2005  christos file uvm_readahead.h was added on branch ktrace-lwp on 2005-12-11 10:29:42 +0000
 1.3.24.1 23-Jul-2009  jym Sync with HEAD.
 1.3.10.1 20-Jun-2009  yamt sync with head
 1.4.22.1 12-Sep-2012  tls Initial snapshot of work to eliminate 64K MAXPHYS. Basically works for
physio (I/O to raw devices); needs more doing to get it going with the
filesystems, but it shouldn't damage data.

All work's been done on amd64 so far. Not hard to add support to other
ports. If others want to pitch in, one very helpful thing would be to
sort out when and how IDE disks can do 128K or larger transfers, and
adjust the various PCI IDE (or at least ahcisata) drivers and wd.c
accordingly -- it would make testing much easier. Another very helpful
thing would be to implement a smart minphys() for RAIDframe along the
lines detailed in the MAXPHYS-NOTES file.
 1.1 17-Jul-2023  riastradh uvm(9): One rndsource for faults -- not one per CPU.

All relevant state is per-CPU anyway; the only substantive difference
this makes is how many entries appear in `rndctl -l' output and what
they are called -- formerly the somewhat confusing `cpuN', meaning
`page faults on cpuN', and now just `uvmfault'. I don't think
there's any real value in being able to enable or disable measurement
or counting of page faults on one CPU vs others, so although this
could be a minor compatibility change, it's hard to imagine it
matters much.

XXX kernel ABI change in struct cpu_info
 1.46 14-Jun-2020  ad Remove PG_ZERO. It worked brilliantly on x86 machines from the mid-90s but
having spent an age experimenting with it over the last 6 months on various
machines and with different use cases it's always either break-even or a
slight net loss for me.
 1.45 11-Jun-2020  ad Counter tweaks:

- Don't need to count anonpages+filepages any more; clean+unknown+dirty for
each kind of page can be summed to get the totals.

- Track the number of free pages with a counter so that it's one less thing
for the allocator to do, which opens up further options there.

- Remove cpu_count_sync_one(). It has no users and doesn't save a whole lot.
For the cheap option, give cpu_count_sync() a boolean parameter indicating
that a cached value is okay, and rate limit the updates for cached values
to hz.
 1.44 11-Jun-2020  ad uvm_availmem(): give it a boolean argument to specify whether a recent
cached value will do, or if the very latest total must be fetched. It can
be called thousands of times a second and fetching the totals impacts not
only the calling LWP but other CPUs doing unrelated activity in the VM
system.
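
The trade-off behind the boolean argument in revision 1.44 can be sketched with a toy model. The names below (`mock_availmem`, `percpu_free`, `MOCK_NCPU`) are hypothetical, and the model leaves out the rate limiting and the locking that a real kernel needs. It only shows that a caller that tolerates a slightly stale total can avoid touching every CPU's counter.

```c
#include <stdbool.h>

#define MOCK_NCPU 4			/* hypothetical CPU count */

static long percpu_free[MOCK_NCPU];	/* hypothetical per-CPU counters */
static long cached_total;		/* last total that was summed */

/* The expensive path: walk every CPU's counter. */
static long
mock_sum_all(void)
{
	long total = 0;
	int i;

	for (i = 0; i < MOCK_NCPU; i++)
		total += percpu_free[i];
	return total;
}

/*
 * cached == true:  return the last computed total, however stale.
 * cached == false: recompute, remember and return the latest total.
 */
static long
mock_availmem(bool cached)
{
	if (!cached)
		cached_total = mock_sum_all();
	return cached_total;
}
```

A caller that runs thousands of times a second would pass `true` and accept a stale value, while a caller making a decision that needs the exact figure would pass `false`.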
 1.43 31-Dec-2019  ad Rename uvm_free() -> uvm_availmem().
 1.42 21-Dec-2019  ad uvmexp.free -> uvm_free()
 1.41 16-Dec-2019  ad - Extend the per-CPU counters matt@ did to include all of the hot counters
in UVM, excluding uvmexp.free, which needs special treatment and will be
done with a separate commit. Cuts system time for a build by 20-25% on
a 48 CPU machine w/DIAGNOSTIC.

- Avoid 64-bit integer divide on every fault (for rnd_add_uint32).
 1.40 09-May-2019  skrll Avoid KASSERT(!cpu_intr_p()) when breaking into ddb and issuing

show uvmexp
 1.39 02-Dec-2017  mrg branches: 1.39.4;
add two new members to uvmexp_sysctl{}: bootpages and poolpages.
bootpages is set to the pages allocated via uvm_pageboot_alloc().
poolpages is calculated from the list of pools nr_pages members.

this brings us closer to having a valid total of pages known by
the system, vs actual pages originally managed.

XXX: poolpages needs some handling for PR_RECURSIVE pools still.
 1.38 01-Dec-2016  mrg fix the output of ddb's "show uvmexp" and also print the
reserve_pagedaemon, reserve_kernel, and zeropages values.
 1.37 17-May-2011  mrg branches: 1.37.14; 1.37.32; 1.37.36;
move and rename the uvm history code out of uvm_stat to "kernhist".

rename "UVMHIST" option to enable the uvm histories.

TODO:
- make UVMHIST properly depend upon KERNHIST
- enable dynamic registration of histories. this is mostly just
allocating something in a bitmap, and is only for viewing multiple
histories in a merged form.


tested on amd64 and sparc64.
 1.36 02-Feb-2011  chuck update license clauses on my code to match the new-style BSD licenses.
based on diff that rmind@ sent me.

no functional change with this commit.
 1.35 05-Jan-2011  enami branches: 1.35.2; 1.35.4;
Fix format string; use PRIu64 for uint64_t.
 1.34 04-Jan-2011  matt Print the number of page colors in use with db> show uvm
 1.33 20-Dec-2010  matt Move counting of faults, traps, intrs, soft[intr]s, syscalls, and nswtch
from uvmexp to per-cpu cpu_data and move them to 64bits. Remove unneeded
includes of <uvm/uvm_extern.h> and/or <uvm/uvm.h>.
 1.32 21-Oct-2009  rmind branches: 1.32.4;
Remove uarea swap-out functionality:

- Addresses the issue described in PR/38828.
- Some simplification in threading and sleepq subsystems.
- Eliminates pmap_collect() and, as a side note, allows pmap optimisations.
- Eliminates XS_CTL_DATA_ONSTACK in scsipi code.
- Avoids a few scans of the LWP list and thus potentially long holds of proc_lock.
- Cuts ~1.5k lines of code. Reduces amd64 kernel size by ~4k.
- Removes __SWAP_BROKEN cases.

Tested on x86, mips, acorn32 (thanks <mpumford>) and partly tested on
acorn26 (thanks to <bjh21>).

Discussed on <tech-kern>, reviewed by <ad>.
 1.31 08-Aug-2008  skrll branches: 1.31.12;
Make "show uvmhist" available to all arches (not just sparc*) in ddb.
 1.30 15-Sep-2006  yamt branches: 1.30.50; 1.30.54; 1.30.56; 1.30.58; 1.30.60;
merge yamt-pdpolicy branch.
- separate page replacement policy from the rest of kernel
- implement an alternative replacement policy
 1.29 29-Nov-2005  yamt branches: 1.29.8; 1.29.20;
read-ahead statistics.
 1.28 27-Jun-2005  thorpej branches: 1.28.2; 1.28.8;
Use ANSI function decls.
 1.27 15-May-2005  yamt remove anon related statistics which are no longer used.
 1.26 27-Apr-2005  yamt uvmexp_print: print swpgavail as well.
 1.25 23-Nov-2004  yamt branches: 1.25.4;
introduce UVMHIST_LOANHIST and sprinkle UVMHIST_LOGs.
 1.24 01-May-2004  petrov Replace uvm counters with evcnt, initialize them through __link_set (from Matt Thomas),
disable counters by default and add configuration option UVMMAP_COUNTERS.
 1.23 24-Mar-2004  junyoung branches: 1.23.2;
Nuke __P().
 1.22 09-Dec-2001  chs branches: 1.22.16;
add {anon,file,exec}max as an upper bound on the amount of memory that
will be allocated for the respective usage types when there is contention
for memory.

replace "vnode" and "vtext" with "file" and "exec" in uvmexp field names
and sysctl names.
 1.21 10-Nov-2001  lukem add RCSIDs, and in some cases, slightly cleanup #include order
 1.20 15-Sep-2001  chs branches: 1.20.2;
a whole bunch of changes to improve performance and robustness under load:

- remove special treatment of pager_map mappings in pmaps. this is
required now, since I've removed the globals that expose the address range.
pager_map now uses pmap_kenter_pa() instead of pmap_enter(), so there's
no longer any need to special-case it.
- eliminate struct uvm_vnode by moving its fields into struct vnode.
- rewrite the pageout path. the pager is now responsible for handling the
high-level requests instead of only getting control after a bunch of work
has already been done on its behalf. this will allow us to UBCify LFS,
which needs tighter control over its pages than other filesystems do.
writing a page to disk no longer requires making it read-only, which
allows us to write wired pages without causing all kinds of havoc.
- use a new PG_PAGEOUT flag to indicate that a page should be freed
on behalf of the pagedaemon when it's unlocked. this flag is very similar
to PG_RELEASED, but unlike PG_RELEASED, PG_PAGEOUT can be cleared if the
pageout fails due to eg. an indirect-block buffer being locked.
this allows us to remove the "version" field from struct vm_page,
and together with shrinking "loan_count" from 32 bits to 16,
struct vm_page is now 4 bytes smaller.
- no longer use PG_RELEASED for swap-backed pages. if the page is busy
because it's being paged out, we can't release the swap slot to be
reallocated until that write is complete, but unlike with vnodes we
don't keep a count of in-progress writes so there's no good way to
know when the write is done. instead, when we need to free a busy
swap-backed page, just sleep until we can get it busy ourselves.
- implement a fast-path for extending writes which allows us to avoid
zeroing new pages. this substantially reduces cpu usage.
- encapsulate the data used by the genfs code in a struct genfs_node,
which must be the first element of the filesystem-specific vnode data
for filesystems which use genfs_{get,put}pages().
- eliminate many of the UVM pagerops, since they aren't needed anymore
now that the pager "put" operation is a higher-level operation.
- enhance the genfs code to allow NFS to use the genfs_{get,put}pages
instead of a modified copy.
- clean up struct vnode by removing all the fields that used to be used by
the vfs_cluster.c code (which we don't use anymore with UBC).
- remove kmem_object and mb_object since they were useless.
instead of allocating pages to these objects, we now just allocate
pages with no object. such pages are mapped in the kernel until they
are freed, so we can use the mapping to find the page to free it.
this allows us to remove splvm() protection in several places.

The sum of all these changes improves write throughput on my
decstation 5000/200 to within 1% of the rate of NetBSD 1.5
and reduces the elapsed time for "make release" of a NetBSD 1.5
source tree on my 128MB pc to 10% less than a 1.5 kernel took.
 1.19 25-May-2001  chs branches: 1.19.2; 1.19.4;
remove trailing whitespace.
 1.18 09-Mar-2001  chs add UBC memory-usage balancing. we track the number of pages in use for
each of the basic types (anonymous data, executable image, cached files)
and prevent the pagedaemon from reusing a given page if that would reduce
the count of that type of page below a sysctl-settable minimum threshold.
the thresholds are controlled via three new sysctl tunables:
vm.anonmin, vm.vnodemin, and vm.vtextmin. these tunables are the
percentages of pageable memory reserved for each usage, and we do not allow
the sum of the minimums to be more than 95% so that there's always some
memory that can be reused.
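The balancing rule described above can be sketched roughly as follows. This is an illustrative model only: the struct, the enum and both function names are hypothetical, and the integer percentage arithmetic is an assumption rather than the code this commit added.

```c
#include <stdbool.h>

/* Hypothetical model; names are illustrative, not from the UVM source. */
enum { SK_ANON, SK_VNODE, SK_VTEXT, SK_NTYPES };

struct sketch_usage {
	unsigned      min_pct;	/* reserved share of pageable memory, percent */
	unsigned long pages;	/* pages currently in use for this type */
};

/* Accept a new minimum only if all minimums together stay at or below 95%. */
bool
sketch_set_min(struct sketch_usage u[SK_NTYPES], int type, unsigned pct)
{
	unsigned sum = pct;

	for (int i = 0; i < SK_NTYPES; i++)
		if (i != type)
			sum += u[i].min_pct;
	if (sum > 95)
		return false;
	u[type].min_pct = pct;
	return true;
}

/* May the pagedaemon reuse one page of this type? */
bool
sketch_may_reuse(const struct sketch_usage u[SK_NTYPES], int type,
    unsigned long pageable)
{
	/* Taking a page must not push the count below the threshold. */
	unsigned long floor = pageable * u[type].min_pct / 100;

	return u[type].pages > floor;
}
```

Capping the sum at 95% leaves at least 5% of pageable memory outside every reservation, so the pagedaemon always has something it is allowed to reclaim.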
 1.17 04-Feb-2001  mrg branches: 1.17.2;
allow ubchist to be printed from the uvmhist merging uvm_hist()
 1.16 01-Dec-2000  chs add new uvmexp fields for uvmexp_print().
 1.15 24-Nov-2000  chs add ddb commands "show uvmexp" and "show ncache".
the former used to be "call uvm_dump", the latter is new.
 1.14 27-Jun-2000  mrg remove include of <vm/vm.h>
 1.13 11-Jan-2000  chs add support for ``swapctl -d'' (removing swap space).
improve handling of i/o errors in swap space.

reviewed by: Chuck Cranor
 1.12 26-Mar-1999  chs branches: 1.12.4; 1.12.8;
add uvmexp.swpgonly and use it to detect out-of-swap conditions.
 1.11 25-Mar-1999  mrg remove now >1 year old pre-release message.
 1.10 20-Jun-1998  mrg branches: 1.10.4;
Add new history grovelling function uvm_hist() that takes a bitmask of
histories to merge in chronological order. currently, MAPHIST and
PDHIST are defined as 1 and 2 respectively. passing a bitmask of 0
to uvm_hist() will dump all maps.
 1.9 10-Mar-1998  chuck uvm_dump now dumps some important pointers for debugging
 1.8 13-Feb-1998  thorpej Oops, fix a typo.
 1.7 13-Feb-1998  thorpej KNF.
 1.6 13-Feb-1998  thorpej Add a global list of all UVM histories.
 1.5 12-Feb-1998  thorpej Provide a patchable knob (uvmhist_print_enabled) so that UVM history
buffer printing can be switched on and off at run-time. Only exists
if the kernel is built with UVMHIST_PRINT, and defaults to `on'.
 1.4 10-Feb-1998  mrg - add defopt's for UVM, UVMHIST and PMAP_NEW.
- remove unnecessary UVMHIST_DECL's.
 1.3 07-Feb-1998  mrg restore rcsids
 1.2 06-Feb-1998  thorpej RCS ID police.
 1.1 05-Feb-1998  mrg branches: 1.1.1;
Initial revision
 1.1.1.1 05-Feb-1998  mrg initial import of the new virtual memory system, UVM, into -current.

UVM was written by chuck cranor <chuck@maria.wustl.edu>, with some
minor portions derived from the old Mach code. i provided some help
getting swap and paging working, and other bug fixes/ideas. chuck
silvers <chuq@chuq.com> also provided some other fixes.

this is the UVM kernel code portion.


this will be KNF'd shortly. :-)
 1.10.4.2 25-Feb-1999  chs print more stuff in uvm_dump().
 1.10.4.1 09-Nov-1998  chs initial snapshot. lots left to do.
 1.12.8.4 12-Mar-2001  bouyer Sync with HEAD.
 1.12.8.3 11-Feb-2001  bouyer Sync with HEAD.
 1.12.8.2 08-Dec-2000  bouyer Sync with HEAD.
 1.12.8.1 20-Nov-2000  bouyer Update thorpej_scsipi to -current as of a month ago
 1.12.4.1 07-Jun-1999  chs merge everything from chs-ubc branch.
 1.17.2.5 08-Jan-2002  nathanw Catch up to -current.
 1.17.2.4 14-Nov-2001  nathanw Catch up to -current.
 1.17.2.3 21-Sep-2001  nathanw Catch up to -current.
 1.17.2.2 21-Jun-2001  nathanw Catch up to -current.
 1.17.2.1 09-Apr-2001  nathanw Catch up with -current.
 1.19.4.1 01-Oct-2001  fvdl Catch up with -current.
 1.19.2.1 10-Jan-2002  thorpej Sync kqueue branch with -current.
 1.20.2.1 12-Nov-2001  thorpej Sync the thorpej-mips-cache branch with -current.
 1.22.16.6 11-Dec-2005  christos Sync with head.
 1.22.16.5 10-Nov-2005  skrll Sync with HEAD. Here we go again...
 1.22.16.4 29-Nov-2004  skrll Sync with HEAD.
 1.22.16.3 21-Sep-2004  skrll Fix the sync with head I botched.
 1.22.16.2 18-Sep-2004  skrll Sync with HEAD.
 1.22.16.1 03-Aug-2004  skrll Sync with HEAD
 1.23.2.1 09-May-2004  jdc Pull up revision 1.24 (requested by petrov in ticket #269)

Replace uvm counters with evcnt, initialize them through __link_set (from Matt Thomas),
disable counters by default and add configuration option UVMMAP_COUNTERS.
 1.25.4.1 29-Apr-2005  kent sync with -current
 1.28.8.1 29-Nov-2005  yamt sync with head.
 1.28.2.2 30-Dec-2006  yamt sync with head.
 1.28.2.1 21-Jun-2006  yamt sync with head.
 1.29.20.1 18-Nov-2006  ad Sync with head.
 1.29.8.1 05-Mar-2006  yamt separate page replacement policy from the rest of kernel.
 1.30.60.1 19-Oct-2008  haad Sync with HEAD.
 1.30.58.2 27-Jun-2008  simonb Revert local changes that were not meant to be in previous "sync with
head" commit.
 1.30.58.1 27-Jun-2008  simonb Sync with head.
 1.30.56.1 18-Sep-2008  wrstuden Sync with wrstuden-revivesa-base-2.
 1.30.54.2 11-Mar-2010  yamt sync with head
 1.30.54.1 04-May-2009  yamt sync with head.
 1.30.50.1 28-Sep-2008  mjf Sync with HEAD.
 1.31.12.4 12-Apr-2012  matt Separate object-less anon pages out of the active list if there is no swap
device. Make uvm_reclaimable and uvm.*estimatable understand colors and
kmem allocations.
 1.31.12.3 16-Feb-2012  matt Track the victims selected by the pagedaemon and what happens to them.
Keep a hint for what page group has the most free pages for a given color.
 1.31.12.2 09-Feb-2012  matt Major changes to uvm.
Support multiple collections (groups) of free pages and run the page
reclamation algorithm on each group independently.
 1.31.12.1 22-Jan-2010  matt Print out colors in uvmexp_print
 1.32.4.2 31-May-2011  rmind sync with head
 1.32.4.1 05-Mar-2011  rmind sync with head
 1.35.4.1 08-Feb-2011  bouyer Sync with HEAD
 1.35.2.1 06-Jun-2011  jruoho Sync with HEAD.
 1.37.36.1 07-Jan-2017  pgoyette Sync with HEAD. (Note that most of these changes are simply $NetBSD$
tag issues.)
 1.37.32.1 05-Dec-2016  skrll Sync with HEAD
 1.37.14.1 03-Dec-2017  jdolecek update from HEAD
 1.39.4.2 08-Apr-2020  martin Merge changes from current as of 20200406
 1.39.4.1 10-Jun-2019  christos Sync with HEAD
 1.56 11-Dec-2021  mrg remove clause 3 from all my licenses that aren't conflicting with
another copyright claim line. again. (i did this in 2008 and then
did not update all of my personal templates.)
 1.55 17-Apr-2021  mrg remove KERNHIST_INIT_STATIC(). it straddles the line between usable
early in boot and broken early in boot by requiring a partly static
structure with another structure that must be present by the time
any uses are performed. theoretically platform code could allocate
a chunk while setting up memory and assign it here, giving a dynamic
sizing for the entry list, but the reality is that all users have
a statically allocated entry list as well.

the existing KERNHIST_LINK_STATIC() is used in conjunction with
KERNHIST_INITIALIZER() instead.

this stops a NULL pointer deref when the _LOG() macro is called
before the storage is linked in, which happens with GCC 10 on OCTEON
with UVMHIST enabled, crashing in very early kernel init.
 1.54 13-Apr-2020  skrll branches: 1.54.4;
Oops, forgot the empty macro version of UVMHIST_CALLARGS
 1.53 08-Apr-2020  skrll branches: 1.53.2;
Provide UVMHIST_CALLARGS
 1.52 05-Mar-2014  matt branches: 1.52.30;
Use UVMHIST_INITIALIZER (KERNHIST_INITIALIZER) to statically initialize
maphist. This allows maphist to be used very very early in boot well before
uvm has been initialized.
 1.51 30-Jul-2012  matt branches: 1.51.2; 1.51.4;
-fno-common broke kernhist since it used commons.
Add a KERNHIST_DEFINE which defines the kernel history.
Change UVM to deal with the new usage.
 1.50 17-May-2011  mrg branches: 1.50.4;
move and rename the uvm history code out of uvm_stat to "kernhist".

rename "UVMHIST" option to enable the uvm histories.

TODO:
- make UVMHIST properly depend upon KERNHIST
- enable dynamic registration of histories. this is mostly just
allocating something in a bitmap, and is only for viewing multiple
histories in a merged form.


tested on amd64 and sparc64.
 1.49 23-Apr-2011  rmind Replace "malloc" in comments, remove unnecessary header inclusions.
 1.48 02-Feb-2011  chuck update license clauses on my code to match the new-style BSD licenses.
based on diff that rmind@ sent me.

no functional change with this commit.
 1.47 07-Jul-2010  chs branches: 1.47.2; 1.47.4;
switch the UVMHIST counters from mutexes to atomic ops
to avoid a bad interaction with DIAGNOSTIC.
 1.46 06-Feb-2010  uebayasi branches: 1.46.2; 1.46.4;
__inline -> inline
 1.45 01-Feb-2009  skrll Fix printing of tv_sec,tv_usec in UVMHIST.
 1.44 08-Aug-2008  skrll branches: 1.44.2;
Make "show uvmhist" available to all arches (not just sparc*) in ddb.
 1.43 25-May-2008  chs branches: 1.43.2; 1.43.4;
if UVMHIST is defined, include headers necessary for its use.
 1.42 27-Feb-2008  ad branches: 1.42.2; 1.42.4; 1.42.6;
Minor corrections to comments.
 1.41 22-Jan-2008  reinoud branches: 1.41.2; 1.41.6;
Remove extra '(' that prevented kernels with UVMHIST from being compiled
 1.40 02-Jan-2008  ad Merge vmlocking2 to head.
 1.39 16-Feb-2006  perry branches: 1.39.24; 1.39.40; 1.39.46; 1.39.50; 1.39.54;
Change "inline" back to "__inline" in .h files -- C99 is still too
new, and some apps compile things in C89 mode. C89 keywords stay.

As per core@.
 1.38 24-Dec-2005  perry branches: 1.38.2; 1.38.4; 1.38.6;
Remove leading __ from __(const|inline|signed|volatile) -- it is obsolete.
 1.37 11-Dec-2005  christos merge ktrace-lwp.
 1.36 01-Jun-2005  drochner branches: 1.36.2;
prepend an underscore to local variables in macros, to avoid shadowing
user defined ones
 1.35 23-Nov-2004  yamt UVMHIST_LOG: avoid division.
 1.34 23-Nov-2004  yamt constify.
 1.33 23-Nov-2004  yamt introduce UVMHIST_LOANHIST and sprinkle UVMHIST_LOGs.
 1.32 01-May-2004  petrov Replace uvm counters with evcnt, initialize them through __link_set (from Matt Thomas),
disable counters by default and add configuration option UVMMAP_COUNTERS.
 1.31 29-Apr-2004  enami Make strlen calls to be folded to constant at compile time.
 1.30 23-Apr-2004  simonb s/this this/this/.
 1.29 24-Mar-2004  junyoung branches: 1.29.2;
Nuke __P().
 1.28 24-Jan-2004  dbj rearrange struct uvm_history to put the struct simplelock at the end.
This avoids problems with the kernel grovelling vmstat -u/-U when
using LOCKDEBUG, which changes the size of struct simplelock.
Replaced the original location of the simplelock with "int unused"
so that binary compatibility will be retained with old vmstat.
 1.27 08-Mar-2003  tsutsui branches: 1.27.2;
Use cpu_number() in UVMHIST_LOG() rather than non-public ci_cpuid member
in struct cpu_info.
 1.26 09-Feb-2003  pk Include CPU number in UVM history logs.
 1.25 02-Nov-2002  perry /*CONTCOND*/ while (0)'ed macros
 1.24 05-Mar-2002  simonb branches: 1.24.2;
Include <sys/kernel.h> if UVMHIST is defined - the "cold" variable is
used in the UVMHIST_LOG macro.
Breakage reported by Chuck Silvers in private mail.
 1.23 04-Mar-2002  simonb Don't "extern int cold;" - this is in <sys/kernel.h>.
 1.22 30-May-2001  mrg branches: 1.22.2;
use _KERNEL_OPT
 1.21 26-May-2001  chs replace {simple_,}lock{_data,}_t with struct {simple,}lock {,*}.
 1.20 25-May-2001  chs remove trailing whitespace.
 1.19 04-Feb-2001  mrg branches: 1.19.2;
allow ubchist to be printed from the uvmhist merging uvm_hist()
 1.18 11-Apr-2000  pk Finish previous.
 1.17 11-Apr-2000  chs avoid declaring "i" as a local variable in a macro.
it's too easy to shadow another local.
 1.16 30-Mar-2000  augustss Remove more register declarations.
 1.15 21-Jun-1999  thorpej branches: 1.15.2;
Protect prototypes, certain macros, and inlines from userland.
 1.14 25-Mar-1999  mrg branches: 1.14.4;
remove now >1 year old pre-release message.
 1.13 09-Aug-1998  perry branches: 1.13.2;
bzero->memset, bcopy->memcpy, bcmp->memcmp
 1.12 20-Jun-1998  mrg Add new history grovelling function uvm_hist() that takes a bitmask of
histories to merge in chronological order. currently, MAPHIST and
PDHIST are defined as 1 and 2 respectively. passing a bitmask of 0
to uvm_hist() will dump all maps.
 1.11 09-Mar-1998  mrg KNF.
 1.10 13-Feb-1998  thorpej KNF.
 1.9 13-Feb-1998  thorpej A few changes to make it possible to read UVM histories from userland:
- Protect option headers from inclusion if ! _KERNEL or if _LKM.
- Make sure struct uvm_history is always the same size (not dependent
on NCPU).
- Add fmtlen and fnlen members to struct uvm_history_ent, which specify
the lengths of the fmt and fn strings.
- Add name, namelen, and a list entry to struct uvm_history.
- When a history is initialized, place it on the global list of all histories.
 1.8 12-Feb-1998  thorpej Provide a patchable knob (uvmhist_print_enabled) so that UVM history
buffer printing can be switched on and off at run-time. Only exists
if the kernel is built with UVMHIST_PRINT, and defaults to `on'.
 1.7 10-Feb-1998  mrg - add defopt's for UVM, UVMHIST and PMAP_NEW.
- remove unnecessary UVMHIST_DECL's.
 1.6 10-Feb-1998  perry add/cleanup multiple inclusion protection.
 1.5 08-Feb-1998  mrg turn off UVM history logging by default.
 1.4 07-Feb-1998  mrg restore rcsids
 1.3 07-Feb-1998  chs remove locking from UVMCNT counters.
they don't need to be exact, and the locking causes problems
in some of the places they're used.
 1.2 06-Feb-1998  thorpej RCS ID police.
 1.1 05-Feb-1998  mrg branches: 1.1.1;
Initial revision
 1.1.1.1 05-Feb-1998  mrg initial import of the new virtual memory system, UVM, into -current.

UVM was written by chuck cranor <chuck@maria.wustl.edu>, with some
minor portions derived from the old Mach code. i provided some help
getting swap and paging working, and other bug fixes/ideas. chuck
silvers <chuq@chuq.com> also provided some other fixes.

this is the UVM kernel code portion.


this will be KNF'd shortly. :-)
 1.13.2.1 09-Nov-1998  chs initial snapshot. lots left to do.
 1.14.4.2 01-Jul-1999  thorpej Sync w/ -current.
 1.14.4.1 07-Jun-1999  chs merge everything from chs-ubc branch.
 1.15.2.2 11-Feb-2001  bouyer Sync with HEAD.
 1.15.2.1 20-Nov-2000  bouyer Update thorpej_scsipi to -current as of a month ago
 1.19.2.3 11-Nov-2002  nathanw Catch up to -current
 1.19.2.2 01-Apr-2002  nathanw Catch up to -current.
(CVS: It's not just a program. It's an adventure!)
 1.19.2.1 21-Jun-2001  nathanw Catch up to -current.
 1.22.2.1 16-Mar-2002  jdolecek Catch up with -current.
 1.24.2.1 12-Mar-2002  thorpej Make the UVM history buffer lock a spin mutex at IPL_HIGH.
 1.27.2.5 10-Nov-2005  skrll Sync with HEAD. Here we go again...
 1.27.2.4 29-Nov-2004  skrll Sync with HEAD.
 1.27.2.3 21-Sep-2004  skrll Fix the sync with head I botched.
 1.27.2.2 18-Sep-2004  skrll Sync with HEAD.
 1.27.2.1 03-Aug-2004  skrll Sync with HEAD
 1.29.2.1 09-May-2004  jdc Pull up revision 1.32 (requested by petrov in ticket #269)

Replace uvm counters with evcnt, initialize them through __link_set (from Matt Thomas),
disable counters by default and add configuration option UVMMAP_COUNTERS.
 1.36.2.3 17-Mar-2008  yamt sync with head.
 1.36.2.2 04-Feb-2008  yamt sync with head.
 1.36.2.1 21-Jan-2008  yamt sync with head
 1.38.6.1 22-Apr-2006  simonb Sync with head.
 1.38.4.1 09-Sep-2006  rpaulo sync with head
 1.38.2.1 18-Feb-2006  yamt sync with head.
 1.39.54.2 23-Jan-2008  bouyer Sync with HEAD.
 1.39.54.1 02-Jan-2008  bouyer Sync with HEAD
 1.39.50.1 04-Dec-2007  ad Pull the vmlocking changes into a new branch.
 1.39.46.1 18-Feb-2008  mjf Sync with HEAD.
 1.39.40.2 23-Mar-2008  matt sync with HEAD
 1.39.40.1 09-Jan-2008  matt sync with HEAD
 1.39.24.1 13-Mar-2007  ad Pull in the initial set of changes for the vmlocking branch.
 1.41.6.3 28-Sep-2008  mjf Sync with HEAD.
 1.41.6.2 02-Jun-2008  mjf Sync with HEAD.
 1.41.6.1 03-Apr-2008  mjf Sync with HEAD.
 1.41.2.1 24-Mar-2008  keiichi sync with head.
 1.42.6.2 18-Sep-2008  wrstuden Sync with wrstuden-revivesa-base-2.
 1.42.6.1 23-Jun-2008  wrstuden Sync w/ -current. 34 merge conflicts to follow.
 1.42.4.3 11-Aug-2010  yamt sync with head.
 1.42.4.2 11-Mar-2010  yamt sync with head
 1.42.4.1 04-May-2009  yamt sync with head.
 1.42.2.1 04-Jun-2008  yamt sync with head
 1.43.4.1 19-Oct-2008  haad Sync with HEAD.
 1.43.2.2 27-Jun-2008  simonb Revert local changes that were not meant to be in previous "sync with
head" commit.
 1.43.2.1 27-Jun-2008  simonb Sync with head.
 1.44.2.1 25-Feb-2009  skrll Sync with HEAD.
 1.46.4.2 31-May-2011  rmind sync with head
 1.46.4.1 05-Mar-2011  rmind sync with head
 1.46.2.1 17-Aug-2010  uebayasi Sync with HEAD.
 1.47.4.1 08-Feb-2011  bouyer Sync with HEAD
 1.47.2.1 06-Jun-2011  jruoho Sync with HEAD.
 1.50.4.2 22-May-2014  yamt sync with head.

for a reference, the tree before this commit was tagged
as yamt-pagecache-tag8.

this commit was split into small chunks to avoid
a limitation of cvs. ("Protocol error: too many arguments")
 1.50.4.1 30-Oct-2012  yamt sync with head
 1.51.4.1 18-May-2014  rmind sync with head
 1.51.2.1 20-Aug-2014  tls Rebase to HEAD as of a few days ago.
 1.52.30.2 21-Apr-2020  martin Sync with HEAD
 1.52.30.1 13-Apr-2020  martin Mostly merge changes from HEAD up to 20200411
 1.53.2.1 20-Apr-2020  bouyer Sync with HEAD
 1.54.4.1 17-Apr-2021  thorpej Sync with HEAD.
 1.209 22-Feb-2025  mlelstv Keep b_resid consistent on I/O errors.
 1.208 09-Apr-2023  riastradh branches: 1.208.6;
uvm(9): KASSERT(A && B) -> KASSERT(A); KASSERT(B)
 1.207 21-Dec-2022  chs swap: disallow user opens of swap block device

the swap/drum block device was never intended to allow user opens,
but when the internal VOP_OPEN() in uvm_swap_init() was added
back in rev 1.135, the d_open method was changed from always-fail
to always-succeed in order to allow the new initial internal open.
this had the side effect of incorrectly allowing user opens too.
fix this by replacing the swap_bdevsw d_open with one that succeeds
for the first call but fails for all subsequent calls.

Reported-by: syzbot+90a23d2f19e5a0a302b3@syzkaller.appspotmail.com
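The "succeed once, then fail" open described above might look roughly like this. It is a sketch under stated assumptions: the function and flag names are hypothetical, the real d_open takes device, flag, mode and LWP arguments that are omitted here, and EPERM is a guess because the commit message does not name the error returned.

```c
#include <errno.h>
#include <stdbool.h>

/* Hypothetical flag: set once the internal open has happened. */
static bool sketch_swap_opened;

/*
 * Succeeds for the first caller only (intended to be the internal open
 * done during swap initialisation); every later open is refused.
 */
int
sketch_swap_open(void)
{
	if (sketch_swap_opened)
		return EPERM;	/* assumed error code, not from the commit */
	sketch_swap_opened = true;
	return 0;
}
```

The sketch relies on the internal open running before any user process exists; it does no locking of its own.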
 1.206 23-Aug-2021  hannken branches: 1.206.4;
Return immediately from uvm_swap_shutdown() if there are
no (more) swap devices configured.
 1.205 03-Jun-2021  riastradh uvm(9): Enable swap encryption by default.

For machines where the performance impact of swapping before the
system has an opportunity to process `vm.swap_encrypt=0' in
/etc/sysctl.conf, you can disable it again by adding

options VMSWAP_DEFAULT_PLAINTEXT

to the kernel config.
 1.204 23-May-2021  mrg branches: 1.204.2;
avoid taking locks that aren't initialised.

fixes panic when typing 'reboot' at the askroot prompt.
 1.203 13-Mar-2021  skrll branches: 1.203.4; 1.203.6;
Consistently use %#jx instead of 0x%jx or just %jx in UVMHIST_LOG formats
 1.202 19-Feb-2021  hannken When turning off swap during reboot we have to lock with LK_RETRY
as regular files got reclaimed during unmount.

Addresses PR kern/54969 (Disk cache is no longer flushed on shutdown)
 1.201 16-Feb-2021  hannken Reorganize uvm_swap_shutdown() a bit, make sure the vnode gets
locked and referenced across the call to swap_off() and finally
use it from vfs_unmountall1() to remove swap after unmounting
the last file system.

Addresses PR kern/54969 (Disk cache is no longer flushed on shutdown)
 1.200 07-Oct-2020  chs branches: 1.200.2;
Add a new, more aggressive allocator for uvm_pglistalloc() to allocate
contiguous physical pages, and try this new allocator if the existing
one fails. The existing contig allocator only tries to allocate pages
that are already free, which works fine shortly after boot but rarely
works after the system has been up for a while. The new allocator uses
the pagedaemon to evict pages from memory in the hope that this will
free up a range of pages that satisfies the constraints of the request.
This should help with things like plugging in a USB device, which often
fails for some USB controllers because they can't get contiguous memory.
 1.199 29-Sep-2020  msaitoh s/parition/partition/
 1.198 25-Jul-2020  riastradh Split aes_cbc_* and aes_xts_* into their own header files.

aes.h will remain just for key setup; any particular construction using
AES can have its own header file so we can have many of them without
rebuilding everything AES-related whenever one of them changes.

(Planning to add AES-CCM and AES-GCM too.)
 1.197 09-Jul-2020  skrll Consistently use UVMHIST(__func__)

Convert UVMHIST_{CALLED,LOG} into UVMHIST_CALLARGS
 1.196 08-Jul-2020  skrll Trailing whitespace
 1.195 29-Jun-2020  riastradh uvm: Make sure swap encryption IV is 128-bit-aligned on stack.

Will help hardware-assisted AES.
 1.194 29-Jun-2020  riastradh uvm(9): Switch from legacy rijndael API to new aes API.
 1.193 24-May-2020  jdolecek fix KASAN PoolUseAfterFree for async write - can't read bp after VOP_STRATEGY()

problem found and fix provided by Paul Ripke
 1.192 22-May-2020  jdolecek DRY code in uvm_swap_io() for the write loop
 1.191 21-May-2020  riastradh Let's not waste time decrypting garbage, shall we?

Skip to the end if the transfer failed.
 1.190 20-May-2020  riastradh Make swap encryption MP-safe.

Not entirely sure the rest of the swap system is MP-safe, but let's
not make it worse!

XXX Why is swap_syscall_lock an rwlock? We don't seem to take the
reader lock ever.
 1.189 10-May-2020  riastradh Rename things so the symbol better matches the sysctl name.

No functional change intended, except that the symbol that was
previously `uvm_swap_encryption' is now `uvm_swap_encrypt', backing
the sysctl knob `vm.swap_encrypt'.
 1.188 09-May-2020  riastradh Avoid overflow if a very large number of pages are swapped at once.

Unlikely, but let's make sure we don't hit this ever.
 1.187 09-May-2020  riastradh Implement swap encryption.

Enabled by sysctl -w vm.swap_encrypt=1. Key is generated lazily when
we first need to swap a page. Key is chosen independently for each
swap device. The ith swap page is encrypted with AES256-CBC using
AES256_k(le32enc(i) || 0^96) as the initialization vector. Can be
changed at any time; no need for compatibility with on-disk formats.
Costs one bit of memory per page in each swapdev, plus a few hundred
bytes per swapdev to store the expanded AES key.

Shoulda done this decades ago! Plan to enable this by default;
performance impact is unlikely to matter because it only happens when
you're already swapping anyway. Much easier to set up than cgd, so
we can rip out all the documentation about carefully setting up
random-keyed cgd at the right time.
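As an illustration of the IV construction named above, the 16-byte block le32enc(i) || 0^96 can be assembled as below. The function name is hypothetical, and the sketch stops before the AES step: the per-device key would then encrypt this block to give the CBC initialization vector, which is not shown because it depends on the kernel's AES routines.

```c
#include <stdint.h>
#include <string.h>

/*
 * Build the 128-bit block le32enc(pageno) || 0^96: the swap page
 * number as four little-endian bytes followed by twelve zero bytes.
 * (Hypothetical helper; the name is not from the UVM source.)
 */
void
sketch_iv_input(uint32_t pageno, uint8_t block[16])
{
	memset(block, 0, 16);
	block[0] = (uint8_t)(pageno);
	block[1] = (uint8_t)(pageno >> 8);
	block[2] = (uint8_t)(pageno >> 16);
	block[3] = (uint8_t)(pageno >> 24);
}
```

Because the page number is unique within a swap device and each device has its own key, no two pages on a device share an IV.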
 1.186 18-Feb-2020  chs remove the aiodoned thread. I originally added this to provide a thread context
for doing page cache iodone work, but since then biodone() has changed to
hand off all iodone work to a softint thread, so we no longer need the
special-purpose aiodoned thread.
 1.185 27-Dec-2019  msaitoh branches: 1.185.2;
s/transfered/transferred/
 1.184 14-Dec-2019  ad Update uvmexp.nswget with atomics.
 1.183 01-Dec-2019  uwe Add missing #include <sys/atomic.h>
 1.182 01-Dec-2019  ad - Adjust uvmexp.swpgonly with atomics, and make uvm_swap_data_lock static.
- A bit more __cacheline_aligned on mutexes.
 1.181 06-Oct-2019  mlelstv Defer to synchronous I/O before the aiodone work queue exists.
 1.180 27-Jan-2019  kre Remove end of line spaces - one (two in one line) added during recent merge,
one older.
 1.179 27-Jan-2019  pgoyette Merge the [pgoyette-compat] branch
 1.178 15-Nov-2018  maxv Woah man, fix enormous leak.

Possible info leak: [len=1056, leaked=931]
#0 0xffffffff80bad351 in kleak_copyout
#1 0xffffffff80b2cf64 in uvm_swap_stats.part.1
#2 0xffffffff80b2d38d in uvm_swap_stats
#3 0xffffffff80b2d43c in sys_swapctl
#4 0xffffffff80259b82 in syscall
 1.177 15-Mar-2018  christos branches: 1.177.2;
finish moving the compat code out.
 1.176 15-Mar-2018  christos Untangle the swapctl compat code mess. Welcome to lucky 13.
 1.175 28-Oct-2017  pgoyette branches: 1.175.2;
Update the kernhist(9) kernel history code to address issues identified
in PR kern/52639, as well as some general cleaning-up...

(As proposed on tech-kern@ with additional changes and enhancements.)

Details of changes:

* All history arguments are now stored as uintmax_t values[1], both in
the kernel and in the structures used for exporting the history data
to userland via sysctl(9). This avoids problems on some architectures
where passing a 64-bit (or larger) value to printf(3) can cause it to
process the value as multiple arguments. (This can be particularly
problematic when printf()'s format string is not a literal, since in
that case the compiler cannot know how large each argument should be.)

* Update the data structures used for exporting kernel history data to
include a version number as well as the length of history arguments.

* All [2] existing users of kernhist(9) have had their format strings
updated. Each format specifier now includes an explicit length
modifier 'j' to refer to numeric values of the size of uintmax_t.

* All [2] existing users of kernhist(9) have had their format strings
updated to replace uses of "%p" with "%#jx", and the pointer
arguments are now cast to (uintptr_t) before being subsequently cast
to (uintmax_t). This is needed to avoid compiler warnings about
casting "pointer to integer of a different size."

* All [2] existing users of kernhist(9) have had instances of "%s" or
"%c" format strings replaced with numeric formats; several instances
of mis-match between format string and argument list have been fixed.

* vmstat(1) has been modified to handle the new size of arguments in the
history data as exported by sysctl(9).

* vmstat(1) now provides a warning message if the history requested with
the -u option does not exist (previously, this condition was silently
ignored, with only a single blank line being printed).

* vmstat(1) now checks the version and argument length included in the
data exported via sysctl(9) and exits if they do not match the values
with which vmstat was built.

* The kernhist(9) man-page has been updated to note the additional
requirements imposed on the format strings, along with several other
minor changes and enhancements.

[1] It would have been possible to use an explicit length (for example,
uint64_t) for the history arguments. But that would require another
"rototill" of all the users in the future when we add support for an
architecture that supports a larger size. Also, the printf(3) format
specifiers for explicitly-sized values, such as "%"PRIu64, are much
more verbose (and less aesthetically appealing, IMHO) than simply
using "%ju".

[2] I've tried very hard to find "all [the] existing users of kernhist(9)"
but it is possible that I've missed some of them. I would be glad to
update any stragglers that anyone identifies.
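
Editorial sketch, not part of the commit: the "%p" to "%#jx" conversion and the 'j' length modifier described above can be illustrated with plain snprintf(3). The helper names fmt_hist_ptr() and fmt_hist_num() are hypothetical and are not part of kernhist(9); only the format strings and the cast order follow the commit text.

```c
#include <assert.h>
#include <stdint.h>
#include <stdio.h>

/*
 * Hypothetical helper (not a kernhist(9) interface): print a pointer with
 * "%#jx". The pointer is cast to uintptr_t first (same width as a pointer)
 * and only then widened to uintmax_t, which is what the 'j' modifier expects.
 */
static int
fmt_hist_ptr(char *buf, size_t len, const void *p)
{
	assert(buf != NULL && len > 0);
	return snprintf(buf, len, "ptr=%#jx", (uintmax_t)(uintptr_t)p);
}

/*
 * Hypothetical helper: numeric arguments are widened to uintmax_t and
 * printed with "%ju", so every argument has the same, known size.
 */
static int
fmt_hist_num(char *buf, size_t len, unsigned int n)
{
	assert(buf != NULL && len > 0);
	return snprintf(buf, len, "n=%ju", (uintmax_t)n);
}
```

Casting a pointer directly to uintmax_t is what draws the "pointer to integer of a different size" warning on platforms where the two widths differ; going through uintptr_t avoids it.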
 1.174 08-Jul-2016  skrll branches: 1.174.2; 1.174.8; 1.174.10;
Remove '\n' from UVMHIST_LOG format - it is not needed.
 1.173 30-Jul-2015  maxv Lock before calling uvm_swap_stats(). Otherwise a race condition could
corrupt memory.
 1.172 25-Jul-2014  dholland branches: 1.172.2; 1.172.4; 1.172.6; 1.172.10;
Add d_discard to all struct cdevsw instances I could find.

All have been set to "nodiscard"; some should get a real implementation.
 1.171 25-Jul-2014  dholland Add d_discard to all struct bdevsw instances I could find.

I've set them all to nodiscard. Some of them (wd, dk, vnd, ld,
raidframe, maybe cgd) should be implemented for real.
 1.170 28-Jun-2014  maxv This KASSERT can trigger a panic too easily, if SCARG(uap, cmd)=SWAP_OFF and
SCARG(uap, arg)=NULL. The same KASSERT is already in the SWAP_ON switch case,
so just delete it here.
 1.169 22-Jun-2014  maxv Sync swapctl() with netbsd32. Return EINVAL when misc<0, and 0 when misc=0
or uvmexp.nswapdev=0.
 1.168 16-Mar-2014  dholland branches: 1.168.2;
Change (mostly mechanically) every cdevsw/bdevsw I can find to use
designated initializers.

I have not built every extant kernel so I have probably broken at
least one build; however I've also found and fixed some wrong
cdevsw/bdevsw entries so even if so I think we come out ahead.
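
Editorial sketch, not part of the commit: the difference between positional and designated initializers, shown on a cut-down, hypothetical switch structure. struct toy_cdevsw and its members are assumptions for illustration; the real struct cdevsw has more members and different signatures.

```c
#include <stddef.h>

/* Hypothetical, reduced device switch; NOT the real struct cdevsw. */
struct toy_cdevsw {
	int (*d_open)(int);
	int (*d_close)(int);
	int (*d_discard)(int);
	int d_flag;
};

static int toy_open(int unit) { return unit >= 0 ? 0 : -1; }
static int toy_close(int unit) { (void)unit; return 0; }

/* Positional form: silently wrong if a member is added or reordered. */
static const struct toy_cdevsw toy_positional = {
	toy_open, toy_close, NULL, 0
};

/* Designated form: each value is tied to a member name. */
static const struct toy_cdevsw toy_designated = {
	.d_open = toy_open,
	.d_close = toy_close,
	.d_discard = NULL,
	.d_flag = 0,
};
```

With designated initializers, adding a new member to the structure leaves existing tables valid, and a misplaced entry is easier to spot in review.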
 1.167 22-Feb-2014  mlelstv Drop empty priority lists, not the full ones. Fixes kern/48611.
 1.166 03-Feb-2014  manu Properly translate struct swapent for COMPAT_NETBSD32
 1.165 23-Nov-2013  christos fix circleq comments
 1.164 23-Nov-2013  christos convert from CIRCLEQ to TAILQ
add uvm_swap_shutdown(), unused
 1.163 07-May-2013  riastradh branches: 1.163.4;
Set bp->b_resid to bp->b_bcount on error in swstrategy as required.
 1.162 27-Nov-2012  jakllsch Until such time as the swap subsystem can be converted to use The One True
Allocator, prevent panics if (MAXPHYS/PAGE_SIZE) > BLIST_MAX_ALLOC.
From Wolfgang Stukenbrock in PR#41765.
 1.161 05-Feb-2012  rmind branches: 1.161.2; 1.161.6;
- sys_swapctl: validate the number of swap devices argument for SWAP_STATS.
- uvm_swap_stats: fix a buffer overrun, add some asserts.

Reviewed by mrg@
 1.160 28-Jan-2012  rmind pool_page_alloc, pool_page_alloc_meta: avoid extra compare, use const.
ffs_mountfs,sys_swapctl: replace memset with kmem_zalloc.
sys_swapctl: move kmem_free outside the lock path.
uvm_init: fix comment, remove pointless numeration of steps.
uvm_map_enter: remove meflagval variable.
Fix some indentation.
 1.159 27-Jan-2012  para extending vmem(9) to be able to allocate resources for its own needs.
simplifying uvm_map handling (no special kernel entries anymore no relocking)
make malloc(9) a thin wrapper around kmem(9)
(with private interface for interrupt safety reasons)

releng@ acknowledged
 1.158 12-Dec-2011  mrg implement bdev_size(9) wrapper around d_psize() routine, so we can take
the device lock in relevant places. avoid doing so while actually dumping.

tested i386 crash dumps still work, and that all touched files compile.

fixes PR#45705.
 1.157 02-Sep-2011  dyoung branches: 1.157.2; 1.157.6;
Report vmem(9) errors out-of-band so that we can use vmem(9) to manage
ranges that include the least and the greatest vmem_addr_t. Update
vmem(9) uses throughout the kernel. Slightly expand on the tests in
subr_vmem.c, which still pass. I've been running a kernel with this
patch without any trouble.
 1.156 12-Jun-2011  rmind Welcome to 5.99.53! Merge rmind-uvmplock branch:

- Reorganize locking in UVM and provide extra serialisation for pmap(9).
New lock order: [vmpage-owner-lock] -> pmap-lock.

- Simplify locking in some pmap(9) modules by removing P->V locking.

- Use lock object on vmobjlock (and thus vnode_t::v_interlock) to share
the locks amongst UVM objects where necessary (tmpfs, layerfs, unionfs).

- Rewrite and optimise x86 TLB shootdown code, make it simpler and cleaner.
Add TLBSTATS option for x86 to collect statistics about TLB shootdowns.

- Unify /dev/mem et al in MI code and provide required locking (removes
kernel-lock on some ports). Also, avoid cache-aliasing issues.

Thanks to Andrew Doran and Joerg Sonnenberger, as their initial patches
formed the core changes of this branch.
 1.155 27-Apr-2011  rmind branches: 1.155.2;
Remove public uvm_swap_stats() routine, keep it internal.
 1.154 23-Apr-2011  rmind Replace "malloc" in comments, remove unnecessary header inclusions.
 1.153 19-Nov-2010  dholland branches: 1.153.2;
Introduce struct pathbuf. This is an abstraction to hold a pathname
and the metadata required to interpret it. Callers of namei must now
create a pathbuf and pass it to NDINIT (instead of a string and a
uio_seg), then destroy the pathbuf after the namei session is
complete.

Update all namei call sites accordingly. Add a pathbuf(9) man page and
update namei(9).

The pathbuf interface also now appears in a couple of related
additional places that were passing string/uio_seg pairs that were
later fed into NDINIT. Update other call sites accordingly.
 1.152 09-Jul-2010  hannken Replace vget() with vref()/vn_lock(), this node already has a reference.
 1.151 24-Jun-2010  hannken Clean up vnode lock operations pass 2:

VOP_UNLOCK(vp, flags) -> VOP_UNLOCK(vp): Remove the unneeded flags argument.

Welcome to 5.99.32.

Discussed on tech-kern.
 1.150 02-Mar-2010  pooka branches: 1.150.2;
For the nfs throttling kludge, test against v_tag == VT_NFS instead
of v_op (the latter imposes linkage).
 1.149 07-Feb-2010  mlelstv branches: 1.149.2;
Use filesystem blocks to address filesystem objects. f_iosize just
happens to be the same for current filesystems.
 1.148 02-Feb-2010  wiz Missing 'if defined COMPAT13 or COMPAT50' in uvm_swap.c found by cppcheck
and reported by Henning Petersen in PR 42721.
 1.147 21-Oct-2009  rmind Remove uarea swap-out functionality:

- Addresses the issue described in PR/38828.
- Some simplification in threading and sleepq subsystems.
- Eliminates pmap_collect() and, as a side note, allows pmap optimisations.
- Eliminates XS_CTL_DATA_ONSTACK in scsipi code.
- Avoids a few scans on LWP list and thus potentially long holds of proc_lock.
- Cuts ~1.5k lines of code. Reduces amd64 kernel size by ~4k.
- Removes __SWAP_BROKEN cases.

Tested on x86, mips, acorn32 (thanks <mpumford>) and partly tested on
acorn26 (thanks to <bjh21>).

Discussed on <tech-kern>, reviewed by <ad>.
 1.146 13-Sep-2009  pooka Wipe out the last vestiges of POOL_INIT with one swift stroke. In
most cases, use a proper constructor. For proplib, give a local
equivalent of POOL_INIT for the kernel object implementation. This
way the code structure can be preserved, and a local link set is
not hazardous anyway (unless proplib is split to several modules,
but that'll be the day).

tested by booting a kernel in qemu and compile-testing i386/ALL
 1.145 01-Mar-2009  mrg fix some messages whose function names are wrong by using __func__.
 1.144 14-Jan-2009  mrg branches: 1.144.2;
catch up with dev_t becoming 64 bit:

- move struct oswapent into uvm_swap.c proper, calling it swapent13
- introduce a new struct swapent50, also only in uvm_swap.c
- stop using struct oswapent inside struct swapent, or struct swapdev
- rename SWAP_OSTATS SWAP_STATS13
- rename SWAP_STATS SWAP_STATS50
- add new SWAP_STATS
- rewrite the handling for SWAP_STATS13, SWAP_STATS50 and SWAP_STATS
 1.143 13-Jan-2009  yamt g/c BUFQ_FOO() macros and use bufq_foo() directly.
 1.142 17-Dec-2008  cegger kill MALLOC and FREE macros.
 1.141 13-Dec-2008  ad PR kern/40027 pagedaemon loops on memory shortage

uvm_swapisfull: don't count some small portion as it may be inaccessible to
us at any given moment, for example if there is lock contention or if pages
are busy.
 1.140 23-Sep-2008  ad branches: 1.140.2; 1.140.4;
Move test for __SWAP_BROKEN here.
 1.139 29-May-2008  mrg branches: 1.139.4;
remove clause #3 from my license where there are no other
copyright holders involved.
 1.138 11-May-2008  kardel keep dumpcdev and dumpdev consistent
allows savecore.c@1.72 to find the right dumpdev in case it was changed
from the default - hi ad@
 1.137 29-Feb-2008  yamt branches: 1.137.2; 1.137.4; 1.137.6;
uvm_swap_io: if pagedaemon, don't wait for iobuf.
 1.136 30-Jan-2008  hannken branches: 1.136.2; 1.136.6;
Lock swapdev_vp for VOP_OPEN.

From: YAMAMOTO Takashi <yamt@netbsd.org>
 1.135 27-Jan-2008  hannken uvm_swap_init(): Call VOP_OPEN() on swapdev_vp to make I/O through the
swap device work with specnodes.

Ok: Andrew Doran <ad@netbsd.org>
 1.134 02-Jan-2008  ad Merge vmlocking2 to head.
 1.133 20-Dec-2007  dsl Convert all the system call entry points from:
int foo(struct lwp *l, void *v, register_t *retval)
to:
int foo(struct lwp *l, const struct foo_args *uap, register_t *retval)
Fixup compat code to not write into 'uap' and (in some cases) to actually
pass a correctly formatted 'uap' structure with the right name to the
next routine.
A few 'compat' routines that just call standard ones have been deleted.
All the 'compat' code compiles (along with the kernels required to test
build it).
98% done by automated scripts.
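
Editorial sketch, not part of the commit: the two entry-point shapes side by side. All types here (toy_register_t, struct toy_lwp, struct toy_foo_args) are hypothetical stand-ins so that the fragment compiles outside the kernel.

```c
/* Hypothetical stand-ins for kernel types. */
typedef long toy_register_t;
struct toy_lwp { int l_id; };
struct toy_foo_args { int fd; };

/* Old shape: arguments arrive as an untyped pointer and must be cast. */
static int
old_sys_foo(struct toy_lwp *l, void *v, toy_register_t *retval)
{
	struct toy_foo_args *uap = v;

	(void)l;
	*retval = uap->fd;
	return 0;
}

/* New shape: the argument structure is typed and const. */
static int
new_sys_foo(struct toy_lwp *l, const struct toy_foo_args *uap,
    toy_register_t *retval)
{
	(void)l;
	*retval = uap->fd;	/* reading is fine; writing to *uap would not compile */
	return 0;
}
```

The const qualifier is what forces compat code to stop writing into 'uap', as the commit message notes.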
 1.132 08-Dec-2007  pooka branches: 1.132.4;
Remove cn_lwp from struct componentname. curlwp should be used
from now on. The NDINIT() macro no longer takes the lwp parameter and
associates the credentials of the calling thread with the namei
structure.
 1.131 26-Nov-2007  pooka branches: 1.131.2;
Remove the "struct lwp *" argument from all VFS and VOP interfaces.
The general trend is to remove it from all kernel interfaces and
this is a start. In case the calling lwp is desired, curlwp should
be used.

quick consensus on tech-kern
 1.130 15-Oct-2007  hannken branches: 1.130.4;
When swapping to a regular file use a workqueue to signal I/O completion.

VOP_STRATEGY() no longer gets called from interrupt context via
biodone() -> sw_reg_iodone() -> sw_reg_start().

Removes a deadlock condition reported in PR 37109.

Ok: YAMAMOTO Takashi <yamt@netbsd.org>
 1.129 29-Jul-2007  ad branches: 1.129.4; 1.129.6; 1.129.8; 1.129.10;
It's not a good idea for device drivers to modify b_flags, as they don't
need to understand the locking around that field. Instead of setting
B_ERROR, set b_error instead. b_error is 'owned' by whoever completes
the I/O request.
 1.128 24-Jul-2007  ad In order to pacify assertions, make uao_list_lock + uvm_swap_data_lock
spinlocks for the time being.
 1.127 21-Jul-2007  ad Merge unobtrusive locking changes from the vmlocking branch.
 1.126 09-Jul-2007  ad branches: 1.126.2;
Merge some of the less invasive changes from the vmlocking branch:

- kthread, callout, devsw API changes
- select()/poll() improvements
- miscellaneous MT safety improvements
 1.125 15-Jun-2007  ad Add a sysctl to disable swapout of kernel stacks. Discussed on tech-kern@.
 1.124 22-Apr-2007  dsl Change the way that emulations locate files within the emulation root to
avoid having to allocate space in the 'stackgap'
- which is very LWP unfriendly.
The additional code for non-emulation namei() is trivial, the reduction for
the emulations is massive.
The vnode for a process's emulation root is saved in the cwdi structure
during process exec.
If the emulation root and the TRYEMULROOT flag are set, namei() will do an initial
search for absolute pathnames in the emulation root, if that fails it will
retry from the normal root.
".." at the emulation root will always go to the real root, even in the middle
of paths and when expanding symlinks.
Absolute symlinks found using absolute paths in the emulation root will be
relative to the emulation root (so /usr/lib/xxx.so -> /lib/xxx.so links
inside the emulation root don't need changing).
If the root of the emulation would be returned (for an emulation lookup), then
the real root is returned instead (matching the behaviour of emul_lookup,
but being a cheap comparison here) so that programs that scan "../.."
looking for the root directory don't loop forever.
The target for symbolic links is no longer mangled (it used to get the
CHECK_ALT_xxx() treatment, so could get /emul/xxx prepended).
CHECK_ALT_xxx() are no more. Most of the change is deleting them, and adding
TRYEMULROOT to the flags to NDINIT().
A lot of the emulation system call stubs could now be deleted.
 1.123 12-Mar-2007  ad branches: 1.123.2;
Pass an ipl argument to pool_init/POOL_INIT to be used when initializing
the pool's lock.
 1.122 04-Mar-2007  christos branches: 1.122.2;
Kill caddr_t; there will be some MI fallout, but it will be fixed shortly.
 1.121 22-Feb-2007  thorpej TRUE -> true, FALSE -> false
 1.120 22-Feb-2007  matt Fix lossage from boolean_t -> bool and updated x86 bus_dma.
 1.119 21-Feb-2007  thorpej Replace the Mach-derived boolean_t type with the C99 bool type. A
future commit will replace use of TRUE and FALSE with true and false.
 1.118 19-Feb-2007  ad uvm_kick_scheduler(): do nothing until the swap subsystem is initialized.
 1.117 15-Feb-2007  ad branches: 1.117.2;
Add uvm_kick_scheduler() (MP safe) to replace wakeup(&proc0).
 1.116 09-Feb-2007  ad Merge newlock2 to head.
 1.115 07-Dec-2006  elad Back out uvm_is_swap_device().
 1.114 02-Dec-2006  elad We are required to hold uvm.swap_data_lock here too.
 1.113 01-Dec-2006  elad branches: 1.113.2;
Introduce uvm_is_swap_device(), to check if the passed struct vnode * is
used as a swap device or not.

Okay mrg@.
 1.112 01-Nov-2006  yamt remove some __unused from function parameters.
 1.111 27-Oct-2006  yamt revert malloc -> kmem_alloc part of uvm_swap.c rev.1.110 because
the current implementation of kmem_free can sleep.
 1.110 22-Oct-2006  yamt extent/malloc -> vmem_alloc/kmem_alloc.
 1.109 21-Oct-2006  mrg in cpu_dumpconf(), don't panic() if we can't bdevsw_lookup() the
dumpdev. this occurs when we try to set the dumpdev to a device
with no driver loaded. this fixes PR#34872.

in sys_swapctl, if bdevsw_lookup() fails, set dumpdev = NODEV
before calling cpu_dumpconf(). (this also fixes PR#34872.)

XXX: cpu_dumpconf() should probably be changed to take a dumpdev
XXX: and return an error in such cases, but that is a much more
XXX: intrusive change.

XXX2: this is only run-tested on sparc64 and compile tested on a
XXX2: couple of platforms.
 1.108 12-Oct-2006  thorpej uvm_swap_stats_locked(): Consume the cmd argument even if COMPAT_13 is
not defined.
 1.107 12-Oct-2006  christos - sprinkle __unused on function decls.
- fix a couple of unused bugs
- no more -Wno-unused for i386
 1.106 08-Sep-2006  elad branches: 1.106.2;
First take at security model abstraction.

- Add a few scopes to the kernel: system, network, and machdep.

- Add a few more actions/sub-actions (requests), and start using them as
opposed to the KAUTH_GENERIC_ISSUSER place-holders.

- Introduce a basic set of listeners that implement our "traditional"
security model, called "bsd44". This is the default (and only) model we
have at the moment.

- Update all relevant documentation.

- Add some code and docs to help folks who want to actually use this stuff:

* There's a sample overlay model, sitting on-top of "bsd44", for
fast experimenting with tweaking just a subset of an existing model.

This is pretty cool because it's *really* straightforward to do stuff
you had to use ugly hacks for until now...

* And of course, documentation describing how to do the above for quick
reference, including code samples.

All of these changes were tested for regressions using a Python-based
testsuite that will be (I hope) available soon via pkgsrc. Information
about the tests, and how to write new ones, can be found on:

http://kauth.linbsd.org/kauthwiki

NOTE FOR DEVELOPERS: *PLEASE* don't add any code that does any of the
following:

- Uses a KAUTH_GENERIC_ISSUSER kauth(9) request,
- Checks 'securelevel' directly,
- Checks a uid/gid directly.

(or if you feel you have to, contact me first)

This is still work in progress; It's far from being done, but now it'll
be a lot easier.

Relevant mailing list threads:

http://mail-index.netbsd.org/tech-security/2006/01/25/0011.html
http://mail-index.netbsd.org/tech-security/2006/03/24/0001.html
http://mail-index.netbsd.org/tech-security/2006/04/18/0000.html
http://mail-index.netbsd.org/tech-security/2006/05/15/0000.html
http://mail-index.netbsd.org/tech-security/2006/08/01/0000.html
http://mail-index.netbsd.org/tech-security/2006/08/25/0000.html

Many thanks to YAMAMOTO Takashi, Matt Thomas, and Christos Zoulas for help
stabilizing kauth(9).

Full credit for the regression tests, making sure these changes didn't break
anything, goes to Matt Fleming and Jaime Fournier.

Happy birthday Randi! :)
 1.105 03-Sep-2006  christos branches: 1.105.2;
add missing initializers
 1.104 22-Aug-2006  martin Add a new swapctl(2) command to unset the dump device.
 1.103 21-Jul-2006  ad - Use the LWP cached credentials where sane.
- Minor cosmetic changes.
 1.102 13-Jun-2006  christos prevent uninitialized variable.
 1.101 12-Jun-2006  christos Don't allocate > 1K on the stack.
 1.100 14-May-2006  elad branches: 1.100.2;
integrate kauth.
 1.99 21-Jan-2006  matt branches: 1.99.2; 1.99.4; 1.99.6; 1.99.8; 1.99.10;
Fix u_int64_t -> uint64_t stragglers.
 1.98 04-Jan-2006  yamt - add simple functions to allocate/free a buffer for i/o.
- make bufpool static.
 1.97 11-Dec-2005  christos branches: 1.97.2;
merge ktrace-lwp.
 1.96 15-Oct-2005  yamt - change the way to specify a bufq strategy. (by string rather than by number)
- rather than embedding bufq_state in driver softc,
have a pointer to the former.
- move bufq related functions from kern/subr_disk.c to kern/subr_bufq.c.
- rename method to strategy for consistency.
- move some definitions which don't need to be exposed to the rest of kernel
from sys/bufq.h to sys/bufq_impl.h.
(is it better to move it to kern/ or somewhere?)
- fix some obvious breakage in dev/qbus/ts.c. (not tested)
 1.95 17-Sep-2005  yamt - make uvm_swap_stats acquire swap_syscall_lock by itself
so that callers don't need to acquire it beforehand.
- make swap_syscall_lock static.
 1.94 27-Jun-2005  thorpej branches: 1.94.2;
Sprinkle some static.
 1.93 27-Jun-2005  thorpej Use ANSI function decls.
 1.92 29-May-2005  christos avoid shadow variables.
remove unneeded casts.
 1.91 11-May-2005  yamt allocate anons on-demand, rather than reserving static amount of
them on boot/swapon.
 1.90 06-Apr-2005  yamt switch swap space allocation code to use blist instead of extent(9).
fix "warning: resource shortage: %d pages of swap lost".

extent(9) has some undesirable characteristics for swap allocation:
- it involves alloc-to-free.
- its operational cost is O(n*n) where n is number of entries.
 1.89 28-Oct-2004  yamt branches: 1.89.4; 1.89.10;
move buffer queue related stuff from buf.h to its own header, bufq.h.
 1.88 14-May-2004  christos don't accept a negative number of swap devices; it will attempt to malloc
something very large and might crash the kernel; From Evgeny Demidov
 1.87 25-Apr-2004  simonb Initialise (most) pools from a link set instead of explicit calls
to pool_init. Untouched pools are ones that are either in arch-specific
code, or aren't initialised during initial system startup.

Convert struct session, ucred and lockf to pools.
 1.86 21-Apr-2004  christos Replace the statfs() family of system calls with statvfs().
Retain binary compatibility.
 1.85 24-Mar-2004  junyoung branches: 1.85.2;
- Nuke __P().
- Drop trailing spaces.
 1.84 25-Jan-2004  hannken Make VOP_STRATEGY(bp) a real VOP as discussed on tech-kern.

VOP_STRATEGY(bp) is replaced by one of two new functions:

- VOP_STRATEGY(vp, bp) Call the strategy routine of vp for bp.
- DEV_STRATEGY(bp) Call the d_strategy routine of bp->b_dev for bp.

DEV_STRATEGY(bp) is used only for block-to-block device situations.
 1.83 10-Jan-2004  yamt store an i/o priority hint in struct buf for buffer queue discipline.
 1.82 28-Aug-2003  pk When retiring a swap device with marked bad blocks on it we should update
the `# swap page in use' and `# swap page only' counters. However, at the
time of swap device removal we can no longer figure out how many of the
bad swap pages are actually also `swap only' pages.

So, on swap I/O errors arrange things to not include the bad swap pages in
the `swpgonly' counter as follows: uvm_swap_markbad() decrements `swpgonly'
by the number of bad pages, and the various VM object deallocation routines
do not decrement `swpgonly' for swap slots marked as SWSLOT_BAD.
 1.81 11-Aug-2003  pk Introduce uvm_swapisfull(), which computes the available swap space by
taking into account swap devices that are in the process of being removed.
 1.80 29-Jun-2003  fvdl branches: 1.80.2;
Back out the lwp/ktrace changes. They contained a lot of collateral damage,
and need to be examined and discussed more.
 1.79 29-Jun-2003  thorpej Undo part of the ktrace/lwp changes. In particular:
* Remove the "lwp *" argument that was added to vget(). Turns out
that nothing actually used it!
* Remove the "lwp *" arguments that were added to VFS_ROOT(), VFS_VGET(),
and VFS_FHTOVP(); all they did was pass it to vget() (which, as noted
above, didn't use it).
* Remove all of the "lwp *" arguments to internal functions that were added
just to appease the above.
 1.78 28-Jun-2003  darrenr Pass lwp pointers throughout the kernel, as required, so that the lwpid can
be inserted into ktrace records. The general change has been to replace
"struct proc *" with "struct lwp *" in various function prototypes, pass
the lwp through and use l_proc to get the process pointer when needed.

Bump the kernel rev up to 1.6V
 1.77 25-Feb-2003  thorpej Add a new BUF_INIT() macro which initializes b_dep and b_interlock, and
use it. This fixes a few places where either b_dep or b_interlock were
not properly initialized.
 1.76 05-Feb-2003  pk Make the buffer cache code MP-safe.
 1.75 01-Feb-2003  thorpej Add extensible malloc types, adapted from FreeBSD. This turns
malloc types into a structure, a pointer to which is passed around,
instead of an int constant. Allow the limit to be adjusted when the
malloc type is defined, or with a function call, as suggested by
Jonathan Stone.
 1.74 18-Jan-2003  thorpej Merge the nathanw_sa branch.
 1.73 02-Nov-2002  perry /*CONTCOND*/ while (0)'ed macros
 1.72 27-Oct-2002  chs examine the B_ERROR flag instead of the b_error field to determine
whether or not an error has occurred. pointed out by Stephan Uphoff.
 1.71 23-Oct-2002  jdolecek merge kqueue branch into -current

kqueue provides a stateful and efficient event notification framework
currently supported events include socket, file, directory, fifo,
pipe, tty and device changes, and monitoring of processes and signals

kqueue is supported by all writable filesystems in NetBSD tree
(with exception of Coda) and all device drivers supporting poll(2)

based on work done by Jonathan Lemon for FreeBSD
initial NetBSD port done by Luke Mewburn and Jason Thorpe
 1.70 27-Sep-2002  provos remove trailing \n in panic(). approved perry.
 1.69 06-Sep-2002  gehenna Merge the gehenna-devsw branch into the trunk.

This merge changes the device switch tables from static array to
dynamically generated by config(8).

- All device switches are defined as constant structures in device drivers.

- The new grammar ``device-major'' is introduced to ``files''.

device-major <prefix> char <num> [block <num>] [<rules>]

- All device major numbers must be listed in port dependent majors.<arch>
by using this grammar.

- Added the new naming convention.
The name of the device switch must be <prefix>_[bc]devsw for auto-generation
of device switch tables.

- The backward compatibility of loading block/character device
switch by LKM framework is broken. This is necessary to convert
from block/character device major to device name in runtime and vice versa.

- The restriction to assign device major by LKM is completely removed.
We don't need to reserve LKM entries for dynamic loading of device switch.

- At compile time, the list of device major numbers is packed into the kernel and
the LKM framework will refer to it to assign device major numbers dynamically.
 1.68 31-Aug-2002  drochner call cpu_dumpconf() after dumpdev change, so that
the global dumpsize/dumplo get updated
 1.67 27-Jul-2002  chs allocate the bufq after zeroing the swapdev structure, not before.
 1.66 21-Jul-2002  hannken Rename bufq_init() to bufq_alloc().
Add bufq_free() to remove a buffer queue.
Avoid MALLOC while holding a spinlock.

From Chuck Silvers.
 1.65 19-Jul-2002  hannken Convert to new device buffer queue interface.
 1.64 09-May-2002  fredette branches: 1.64.2; 1.64.4;
When preparing to swap to a miniroot partition, add a little
padding to our estimate of the miniroot's size, to avoid
overwriting it.
 1.63 01-Apr-2002  manu Updated comment to reflect the creation of uvm_swap_stats()
 1.62 26-Mar-2002  manu Don't allocate struct swapent when we only need a struct oswapent.
 1.61 18-Mar-2002  manu Move swapctl(SWAP_STATS) implementation to a separate function called
uvm_swap_stats(). This is done in order to allow COMPAT_* swapctl()
emulation to use it directly without going through sys_swapctl().

The problem with using sys_swapctl() there is that it involves
copying the swapent array to the stackgap, and this array's size
is not known at build time. Hence it would not be possible to
ensure it would fit in the stackgap in any case.
 1.60 09-Mar-2002  thorpej branches: 1.60.2;
Remove PR_MALLOCOK.
 1.59 08-Mar-2002  thorpej Pool deals fairly well with physical memory shortage, but it doesn't
deal with shortages of the VM maps where the backing pages are mapped
(usually kmem_map). Try to deal with this:

* Group all information about the backend allocator for a pool in a
separate structure. The pool references this structure, rather than
the individual fields.
* Change the pool_init() API accordingly, and adjust all callers.
* Link all pools using the same backend allocator on a list.
* The backend allocator is responsible for waiting for physical memory
to become available, but will still fail if it cannot allocate KVA
space for the pages. If this happens, carefully drain all pools using
the same backend allocator, so that some KVA space can be freed.
* Change pool_reclaim() to indicate if it actually succeeded in freeing
some pages, and use that information to make draining easier and more
efficient.
* Get rid of PR_URGENT. There was only one use of it, and it could be
dealt with by the caller.

From art@openbsd.org.
 1.58 16-Dec-2001  enami G/C no longer used saved credential for file i/o.
 1.57 10-Nov-2001  lukem add RCSIDs, and in some cases, slightly cleanup #include order
 1.56 06-Nov-2001  chs add an assert and rename some variables.
 1.55 01-Nov-2001  chs allow SWAP_GETDUMPDEV for all users.
use {LIST,TAILQ}_FOREACH where appropriate.
 1.54 15-Sep-2001  chs branches: 1.54.2;
a whole bunch of changes to improve performance and robustness under load:

- remove special treatment of pager_map mappings in pmaps. this is
required now, since I've removed the globals that expose the address range.
pager_map now uses pmap_kenter_pa() instead of pmap_enter(), so there's
no longer any need to special-case it.
- eliminate struct uvm_vnode by moving its fields into struct vnode.
- rewrite the pageout path. the pager is now responsible for handling the
high-level requests instead of only getting control after a bunch of work
has already been done on its behalf. this will allow us to UBCify LFS,
which needs tighter control over its pages than other filesystems do.
writing a page to disk no longer requires making it read-only, which
allows us to write wired pages without causing all kinds of havoc.
- use a new PG_PAGEOUT flag to indicate that a page should be freed
on behalf of the pagedaemon when it's unlocked. this flag is very similar
to PG_RELEASED, but unlike PG_RELEASED, PG_PAGEOUT can be cleared if the
pageout fails due to eg. an indirect-block buffer being locked.
this allows us to remove the "version" field from struct vm_page,
and together with shrinking "loan_count" from 32 bits to 16,
struct vm_page is now 4 bytes smaller.
- no longer use PG_RELEASED for swap-backed pages. if the page is busy
because it's being paged out, we can't release the swap slot to be
reallocated until that write is complete, but unlike with vnodes we
don't keep a count of in-progress writes so there's no good way to
know when the write is done. instead, when we need to free a busy
swap-backed page, just sleep until we can get it busy ourselves.
- implement a fast-path for extending writes which allows us to avoid
zeroing new pages. this substantially reduces cpu usage.
- encapsulate the data used by the genfs code in a struct genfs_node,
which must be the first element of the filesystem-specific vnode data
for filesystems which use genfs_{get,put}pages().
- eliminate many of the UVM pagerops, since they aren't needed anymore
now that the pager "put" operation is a higher-level operation.
- enhance the genfs code to allow NFS to use the genfs_{get,put}pages
instead of a modified copy.
- clean up struct vnode by removing all the fields that used to be used by
the vfs_cluster.c code (which we don't use anymore with UBC).
- remove kmem_object and mb_object since they were useless.
instead of allocating pages to these objects, we now just allocate
pages with no object. such pages are mapped in the kernel until they
are freed, so we can use the mapping to find the page to free it.
this allows us to remove splvm() protection in several places.

The sum of all these changes improves write throughput on my
decstation 5000/200 to within 1% of the rate of NetBSD 1.5
and reduces the elapsed time for "make release" of a NetBSD 1.5
source tree on my 128MB pc to 10% less than a 1.5 kernel took.
 1.53 26-Aug-2001  chs branches: 1.53.2;
don't mess with vnode holds or buffer lists for swap i/os.
fixes problems with leaked vnode holds.
 1.52 26-May-2001  chs branches: 1.52.2;
replace {simple_,}lock{_data,}_t with struct {simple,}lock {,*}.
 1.51 25-May-2001  chs remove trailing whitespace.
 1.50 15-May-2001  ross Eliminate lhs cast (incorrectly accepted by gcc)
 1.49 09-May-2001  thorpej Use pool_init() rather than pool_create().
 1.48 09-May-2001  fvdl Avoid potential cases of sleeping while holding a spinlock. Pay attention
to SWF_FAKE when finding a swap device. GC swapdrum_add; it was only
a few lines long and called once, so just inline the code there.
 1.47 10-Mar-2001  chs eliminate the VM_PAGER_* error codes in favor of the traditional E* codes.
the mapping is:

VM_PAGER_OK 0
VM_PAGER_BAD <unused>
VM_PAGER_FAIL <unused>
VM_PAGER_PEND 0 (see below)
VM_PAGER_ERROR EIO
VM_PAGER_AGAIN EAGAIN
VM_PAGER_UNLOCK EBUSY
VM_PAGER_REFAULT ERESTART

for async i/o requests, it used to be possible for the request to
be converted to sync, and the pager would return VM_PAGER_OK or VM_PAGER_PEND
to indicate whether the caller should perform post-i/o cleanup.
this is no longer allowed; pagers must now return 0 to indicate that
the async i/o was successfully started, and the caller never needs to
worry about doing the post-i/o cleanup.
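
Editorial sketch, not part of the commit: the mapping table above written as a C function. The enum old_pager_code and the function pager_code_to_errno() are hypothetical names for illustration; the old VM_PAGER_* numeric values are not given in the log, so the enum values here are arbitrary. The fallback value for ERESTART and the EINVAL default for the two unused codes are assumptions as well.

```c
#include <errno.h>

#ifndef ERESTART
#define ERESTART (-3)	/* assumption: placeholder where <errno.h> lacks it */
#endif

/* Hypothetical stand-ins for the removed VM_PAGER_* codes. */
enum old_pager_code {
	OLD_VM_PAGER_OK,
	OLD_VM_PAGER_BAD,	/* listed as <unused> in the log */
	OLD_VM_PAGER_FAIL,	/* listed as <unused> in the log */
	OLD_VM_PAGER_PEND,
	OLD_VM_PAGER_ERROR,
	OLD_VM_PAGER_AGAIN,
	OLD_VM_PAGER_UNLOCK,
	OLD_VM_PAGER_REFAULT
};

/* Translate an old pager code to the E* value listed in the commit. */
static int
pager_code_to_errno(enum old_pager_code code)
{
	switch (code) {
	case OLD_VM_PAGER_OK:
	case OLD_VM_PAGER_PEND:		/* async i/o started successfully */
		return 0;
	case OLD_VM_PAGER_ERROR:
		return EIO;
	case OLD_VM_PAGER_AGAIN:
		return EAGAIN;
	case OLD_VM_PAGER_UNLOCK:
		return EBUSY;
	case OLD_VM_PAGER_REFAULT:
		return ERESTART;
	default:
		/* BAD and FAIL had no users; EINVAL is an assumption. */
		return EINVAL;
	}
}
```

Both VM_PAGER_OK and VM_PAGER_PEND collapse to 0, which matches the note that callers no longer distinguish a completed request from one that was merely started.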
 1.46 18-Feb-2001  chs branches: 1.46.2;
clean up DIAGNOSTIC checks, use KASSERT().
 1.45 12-Feb-2001  pk SWAP_DUMPDEV,SWAP_OFF cases: make sure to release the vnode being operated on.
 1.44 04-Jan-2001  enami Use cast where appropriate to avoid integer overflow.
 1.43 27-Dec-2000  chs when we fail to allocate anons to represent new swap space,
just return an error rather than panicking.
 1.42 23-Dec-2000  enami Place a name of extent in a struct swapdev instead of dynamically
allocating it.
 1.41 27-Nov-2000  chs Initial integration of the Unified Buffer Cache project.
 1.40 17-Nov-2000  mrg add SWAP_GETDUMPDEV command support.
 1.39 13-Nov-2000  chs in swap_off(), reverse the order of vrele() and VOP_CLOSE() so that
devices will actually be notified if this is the last close.
this allows raidframe swap devices to be marked clean.
also, move the corresponding vref() into swap_on() for symmetry
and improve some comments.
 1.38 27-Jun-2000  mrg remove include of <vm/vm.h>
 1.37 19-May-2000  thorpej branches: 1.37.4;
Tell uvm_pagermapin() the direction of the I/O so that it can map
with only the protection that it needs.
 1.36 15-Apr-2000  mrg remove <vm/vm_swap.h> and <vm/vm_conf.h>
 1.35 07-Apr-2000  chs restore a brelvp() that I removed in a moment of overzealousness.
Debugged by: Brian Grayson <bgrayson@netbsd.org>
 1.34 07-Feb-2000  thorpej Fix a bug in disksort_*() which caused non-optimal ordering when multiple
active partitions were on a single spindle. Add a b_rawblkno member to
struct buf which contains the non-partition-relative block number to sort
by.
 1.33 21-Jan-2000  thorpej Update for sys/buf.h/disksort_*() changes.
 1.32 11-Jan-2000  chs add support for ``swapctl -d'' (removing swap space).
improve handling of i/o errors in swap space.

reviewed by: Chuck Cranor
 1.31 04-Jan-2000  wrstuden Revert rev 1.28 -> 1.29. The VOP_CLOSE call was happening with the vnode
already locked, so don't lock it here.
 1.30 15-Nov-1999  fvdl Add Kirk McKusick's soft updates code to the trunk. Not enabled by
default, as the copyright on the main file (ffs_softdep.c) is such
that it has been put into gnusrc. options SOFTDEP will pull this
in. This code also contains the trickle syncer.

Bump version number to 1.4O
 1.29 16-Oct-1999  wrstuden branches: 1.29.2; 1.29.4;
In spec_close(), if we're not doing a non-blocking close and VXLOCK is
not set, unlock the vnode before calling the device's close routine and
relock it after it returns. tty close routines will sleep waiting for
buffers to drain, which won't happen often times as the other side needs
to grab the vnode lock first.

Make all unmount routines lock the device vnode before calling VOP_CLOSE().
 1.28 22-Jul-1999  thorpej branches: 1.28.2;
Garbage collect thread_sleep()/thread_wakeup() left over from the old
Mach VM code. Also nuke iprintf(), which was no longer used anywhere.

Add proclist locking where appropriate.
 1.27 30-Mar-1999  chs branches: 1.27.2; 1.27.4;
remove some old #if 0'd-out debugging code.
 1.26 26-Mar-1999  chs add uvmexp.swpgonly and use it to detect out-of-swap conditions.
 1.25 18-Mar-1999  chs VHOLD() must be called at splbio() since HOLDRELE() is called
from the iodone handler.
 1.24 23-Feb-1999  mrg handle SWAP_DUMPDEV
 1.23 26-Dec-1998  marc When a reference is made to a hole in a swap file, panic. The optimal
thing would be to allocate the block, but I don't know how to do this.
The panic is preferable to the random memory corruption the old code
was causing.
 1.22 08-Nov-1998  mycroft branches: 1.22.2;
Clear B_NOCACHE when we're done with the buffer -- although this is probably
pointless.
 1.21 08-Nov-1998  mycroft Set the B_NOCACHE bit so that NFSv3 will not try to do async writes.
 1.20 18-Oct-1998  chs shift by PAGE_SHIFT instead of multiplying or dividing by PAGE_SIZE.
 1.19 06-Sep-1998  pk Panic instead of failing the syscall on an impossible condition (from Robert Elz).
Plug possible memory leakage with the recently added device path stuff.
 1.18 30-Aug-1998  enami Define `len' as size_t rather than int so that correct type is passed
as fourth argument of copystr.
 1.17 29-Aug-1998  mrg move <vm/vm_swap.h> to <sys/swap.h>. <vm/vm_swap.h> still works for now (goes away later)
 1.16 29-Aug-1998  mrg add a `char se_path[PATH_MAX]' member to struct swapent, that
the pathname of the swap device is saved into. add a char *swd_path
member to struct swapdev, that contains a copy of the pathname
(using malloc(9)). rename swapctl(2)'s SWAP_STATS to SWAP_OSTATS,
and add a new SWAP_STATS command (number). make swapctl(SWAP_STATS,
...) [new version] copy the path out. if COMPAT_13, also include
support for SWAP_OSTATS. also fix a minor bug in swapctl(2).

the point of this is that swapfiles are now shown in `swapctl -l'.
 1.15 13-Aug-1998  eeh Merge paddr_t changes into the main branch.
 1.14 09-Aug-1998  perry bzero->memset, bcopy->memcpy, bcmp->memcmp
 1.13 24-Jul-1998  thorpej branches: 1.13.2;
Put back swap_data_lock, which was apparently deleted accidentally during
the last round of changes. (I noticed it because I run my kernels w/
LOCKDEBUG.)
 1.12 23-Jul-1998  pk Use memory pools to allocate swap buffers. Allocations are all dynamic;
in particular `nswbuf' is gone, as is the private "struct buf" list that
was previously maintained in here.
 1.11 08-Jul-1998  pk Make sure to release buffers only once.
 1.10 17-Jun-1998  ross Correct an expression that tried to compute the swap size in bytes using
an int object, this sometimes prevented swap_on() of a dev/file > 2^31 bytes.
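
The class of bug fixed in revision 1.10 above can be sketched as follows. The original expression is not shown in the log, so this only illustrates the general fix: widen the page count before scaling it to bytes. SKETCH_PAGE_SHIFT and swap_bytes() are hypothetical names, and 4 KiB pages are assumed.

```c
#include <assert.h>
#include <stdint.h>

#define SKETCH_PAGE_SHIFT 12	/* assumption: 4 KiB pages */

/*
 * Convert a page count to a byte count. The cast happens before the
 * shift, so the arithmetic is done in 64 bits and sizes of 2^31 bytes
 * or more do not overflow. npages is assumed to be non-negative.
 */
uint64_t
swap_bytes(int npages)
{
	return (uint64_t)npages << SKETCH_PAGE_SHIFT;
}
```

Shifting the plain int first and converting afterwards would overflow for any swap area of 2^31 bytes or more on a platform with a 32-bit int.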
 1.9 01-May-1998  mrg fix a problem with swapping to files where a new variable introduced was not
later incremented correctly, causing the wrong data to be paged out, which
then caused general lossage later when the data was paged in and the process
tried to use it. found by pk.
 1.8 09-Mar-1998  mrg KNF.
 1.7 01-Mar-1998  fvdl Merge with Lite2 + local changes
 1.6 19-Feb-1998  thorpej Include the NFS option header.
 1.5 10-Feb-1998  mrg - add defopt's for UVM, UVMHIST and PMAP_NEW.
- remove unnecessary UVMHIST_DECL's.
 1.4 08-Feb-1998  mrg move pdhist initialisation to the same place as maphist. also, declare
the history buffers as "struct uvm_history_ent" to ensure proper
alignment (eg, alpha). this fixes a boottime panic when the pdhist was
used before it had been initialised.
 1.3 07-Feb-1998  mrg restore rcsids
 1.2 06-Feb-1998  thorpej RCS ID police.
 1.1 05-Feb-1998  mrg branches: 1.1.1;
Initial revision
 1.1.1.1 05-Feb-1998  mrg initial import of the new virtual memory system, UVM, into -current.

UVM was written by chuck cranor <chuck@maria.wustl.edu>, with some
minor portions derived from the old Mach code. i provided some help
getting swap and paging working, and other bug fixes/ideas. chuck
silvers <chuq@chuq.com> also provided some other fixes.

this is the UVM kernel code portion.


this will be KNF'd shortly. :-)
 1.13.2.1 30-Jul-1998  eeh Split vm_offset_t and vm_size_t into paddr_t, psize_t, vaddr_t, and vsize_t.
 1.22.2.4 30-May-1999  chs in uvm_swap_io()'s async case, set the aio flags to indicate whether
or not the i/o is being started by the pagedaemon.
 1.22.2.3 09-Apr-1999  chs swapbuf aiodones now handled by aiodone daemon, not pagedaemon.
 1.22.2.2 25-Feb-1999  chs remove sw_sq from swapbuf, it's unused.
in uvm_swap_get(), use VM_PAGER_OK instead of 0.
thread_wakeup() -> wakeup().
 1.22.2.1 09-Nov-1998  chs initial snapshot. lots left to do.
 1.27.4.3 31-Jul-1999  chs in uvm_swap_io(), initialize some more buf fields that we now use.
 1.27.4.2 04-Jul-1999  chs remove swapbufs, plain ol' bufs are sufficient now.
remove uvm_swap_*iodone().
 1.27.4.1 07-Jun-1999  chs merge everything from chs-ubc branch.
 1.27.2.2 08-Jan-2000  he Pull up revision 1.31 (requested by wrstuden):
Revert a previous change regarding spec_close handling. The
vnode being closed was already locked, so do not try to re-lock.
The result before this fix was that failed attempts at "swapon"
would panic the machine.
 1.27.2.1 18-Oct-1999  cgd pull up rev 1.29 from trunk (requested by wrstuden):
In spec_close(), call the device's close routine with the vnode
unlocked if the call might block. Force a non-blocking close if
VXLOCK is set. This eliminates a potential deadlock situation, and
should eliminate the dirty buffers on reboot issue.
 1.28.2.2 27-Dec-1999  wrstuden Pull up to last week's -current.
 1.28.2.1 21-Dec-1999  wrstuden Initial commit of recent changes to make DEV_BSIZE go away.

Runs on i386, needs work on other arch's. Main kernel routines should be
fine, but a number of the stand programs need help.

cd, fd, ccd, wd, and sd have been updated. sd has been tested with non-512
byte block devices. vnd, raidframe, and lfs need work.

Non 2**n block support is automatic for LKM's and conditional for kernels
on "options NON_PO2_BLOCKS".
 1.29.4.1 19-Oct-1999  fvdl Bring in Kirk McKusick's FFS softdep code on a branch.
 1.29.2.5 12-Mar-2001  bouyer Sync with HEAD.
 1.29.2.4 05-Jan-2001  bouyer Sync with HEAD
 1.29.2.3 08-Dec-2000  bouyer Sync with HEAD.
 1.29.2.2 22-Nov-2000  bouyer Sync with HEAD.
 1.29.2.1 20-Nov-2000  bouyer Update thorpej_scsipi to -current as of a month ago
 1.37.4.3 30-Sep-2002  itojun sys/uvm/uvm_swap.c 1.68

Call cpu_dumpconf() after dumpdev change, so that the global
dumpsize/dumplo get updated.

(drochner)
 1.37.4.2 14-Feb-2002  he Pull up revision 1.43 (requested by chs):
Make memory allocation failures during ``swapctl -a'' return an error
instead of causing a panic.
 1.37.4.1 13-Nov-2000  tv Pullup 1.39 [chs]:
in swap_off(), reverse the order of vrele() and VOP_CLOSE() so that
devices will actually be notified if this is the last close.
this allows raidframe swap devices to be marked clean.
also, move the corresponding vref() into swap_on() for symmetry
and improve some comments.
 1.46.2.15 11-Nov-2002  nathanw Catch up to -current
 1.46.2.14 18-Oct-2002  nathanw Catch up to -current.
 1.46.2.13 17-Sep-2002  nathanw Catch up to -current.
 1.46.2.12 01-Aug-2002  nathanw Catch up to -current.
 1.46.2.11 12-Jul-2002  nathanw No longer need to pull in lwp.h; proc.h pulls it in for us.
 1.46.2.10 20-Jun-2002  nathanw Catch up to -current.
 1.46.2.9 29-May-2002  nathanw #include <sys/sa.h> before <sys/syscallargs.h>, to provide sa_upcall_t
now that <sys/param.h> doesn't include <sys/sa.h>.

(Behold the Power of Ed)
 1.46.2.8 17-Apr-2002  nathanw Catch up to -current.
 1.46.2.7 01-Apr-2002  nathanw Catch up to -current.
(CVS: It's not just a program. It's an adventure!)
 1.46.2.6 08-Jan-2002  nathanw Catch up to -current.
 1.46.2.5 14-Nov-2001  nathanw Catch up to -current.
 1.46.2.4 21-Sep-2001  nathanw Catch up to -current.
 1.46.2.3 21-Jun-2001  nathanw Catch up to -current.
 1.46.2.2 09-Apr-2001  nathanw Catch up with -current.
 1.46.2.1 05-Mar-2001  nathanw Initial commit of scheduler activations and lightweight process support.
 1.52.2.6 10-Oct-2002  jdolecek sync kqueue with -current; this includes merge of gehenna-devsw branch,
merge of i386 MP branch, and part of autoconf rototil work
 1.52.2.5 06-Sep-2002  jdolecek sync kqueue branch with HEAD
 1.52.2.4 23-Jun-2002  jdolecek catch up with -current on kqueue branch
 1.52.2.3 16-Mar-2002  jdolecek Catch up with -current.
 1.52.2.2 10-Jan-2002  thorpej Sync kqueue branch with -current.
 1.52.2.1 13-Sep-2001  thorpej Update the kqueue branch to HEAD.
 1.53.2.3 01-Oct-2001  fvdl Catch up with -current.
 1.53.2.2 18-Sep-2001  fvdl Various changes to make cloning devices possible:

* Add an extra argument (struct vnode **) to VOP_OPEN. If it is
not NULL, specfs will create a cloned (aliased) vnode during
the call, and return it there. The caller should release and
unlock the original vnode if a new vnode was returned. The
new vnode is returned locked.

* Add a flag field to the cdevsw and bdevsw structures.
DF_CLONING indicates that it wants a new vnode for each
open (XXX is there a better way? devprop?)

* If a device is cloning, always call the close entry
point for a VOP_CLOSE.


Also, rewrite cons.c to do the right thing with vnodes. Use VOPs
rather than direct device entry calls. Suggested by mycroft@

Light to moderate testing done on an i386 system (arch doesn't matter
though, these are MI changes).
 1.53.2.1 07-Sep-2001  thorpej Commit my "devvp" changes to the thorpej-devvp branch. This
replaces the use of dev_t in most places with a struct vnode *.

This will form the basic infrastructure for real cloning device
support (besides being architecturally cleaner -- it'll be good
to get away from using numbers to represent objects).
 1.54.2.1 12-Nov-2001  thorpej Sync the thorpej-mips-cache branch with -current.
 1.60.2.1 11-Mar-2002  thorpej Convert swap_syscall_lock and uvm.swap_data_lock to adaptive mutexes,
and rename them appropriately.
 1.64.4.1 02-Oct-2002  lukem Pull up revision 1.68 (requested by drochner in ticket #876):
call cpu_dumpconf() after dumpdev change, so that
the global dumpsize/dumplo get updated
 1.64.2.3 29-Aug-2002  gehenna catch up with -current.
 1.64.2.2 20-Jul-2002  gehenna catch up with -current.
 1.64.2.1 16-May-2002  gehenna Add bdevsw/cdevsw for swap device.
Replace with devsw APIs.
 1.80.2.7 10-Nov-2005  skrll Sync with HEAD. Here we go again...
 1.80.2.6 02-Nov-2004  skrll Sync with HEAD.
 1.80.2.5 21-Sep-2004  skrll Fix the sync with head I botched.
 1.80.2.4 18-Sep-2004  skrll Sync with HEAD.
 1.80.2.3 24-Aug-2004  skrll Undo part of the ktrace/lwp changes. In particular:
* Remove the "lwp *" argument that was added to vget(). Turns out
that nothing actually used it!
* Remove the "lwp *" arguments that were added to VFS_ROOT(), VFS_VGET(),
and VFS_FHTOVP(); all they did was pass it to vget() (which, as noted
above, didn't use it).
* Remove all of the "lwp *" arguments to internal functions that were added
just to appease the above.
 1.80.2.2 03-Aug-2004  skrll Sync with HEAD
 1.80.2.1 02-Jul-2003  darrenr Apply the aborted ktrace-lwp changes to a specific branch. This is just for
others to review, I'm concerned that patch fuzziness may have resulted in some
errant code being generated but I'll look at that later by comparing the diff
from the base to the branch with the file I attempt to apply to it. This will,
at the very least, put the changes in a better context for others to review
them and attempt to tinker with removing passing of 'struct lwp' through
the kernel.
 1.85.2.1 15-May-2004  tron Pull up revision 1.88 (requested by christos in ticket #338):
don't accept a negative number of swap devices; it will attempt to malloc
something very large and might crash the kernel; From Evgeny Demidov
 1.89.10.2 06-Apr-2006  tron Backout ticket #1241 because it requires the blist framework which
is not available in NetBSD 3.x.
 1.89.10.1 06-Apr-2006  tron Pull up following revision(s) (requested by jld in ticket #1241):
sys/uvm/uvm_swap.c: revision 1.90
switch swap space allocation code to use blist instead of extent(9).
fix "warning: resource shortage: %d pages of swap lost".
extent(9) has some undesirable characteristics for swap allocation:
- it involves alloc-to-free.
- its operational cost is O(n*n) where n is number of entries.
 1.89.4.1 29-Apr-2005  kent sync with -current
 1.94.2.9 17-Mar-2008  yamt sync with head.
 1.94.2.8 04-Feb-2008  yamt sync with head.
 1.94.2.7 21-Jan-2008  yamt sync with head
 1.94.2.6 07-Dec-2007  yamt sync with head
 1.94.2.5 27-Oct-2007  yamt sync with head.
 1.94.2.4 03-Sep-2007  yamt sync with head.
 1.94.2.3 26-Feb-2007  yamt sync with head.
 1.94.2.2 30-Dec-2006  yamt sync with head.
 1.94.2.1 21-Jun-2006  yamt sync with head.
 1.97.2.2 01-Feb-2006  yamt sync with head.
 1.97.2.1 15-Jan-2006  yamt sync with head.
 1.99.10.1 24-May-2006  tron Merge 2006-05-24 NetBSD-current into the "peter-altq" branch.
 1.99.8.3 06-May-2006  christos - Move kauth_cred_t declaration to <sys/types.h>
- Cleanup struct ucred; forward declarations that are unused.
- Don't include <sys/kauth.h> in any header, but include it in the c files
that need it.

Approved by core.
 1.99.8.2 10-Mar-2006  elad generic_authorize() -> kauth_authorize_generic().
 1.99.8.1 08-Mar-2006  elad Adapt to kernel authorization changes.
 1.99.6.5 14-Sep-2006  yamt sync with head.
 1.99.6.4 03-Sep-2006  yamt sync with head.
 1.99.6.3 11-Aug-2006  yamt sync with head
 1.99.6.2 26-Jun-2006  yamt sync with head.
 1.99.6.1 24-May-2006  yamt sync with head.
 1.99.4.1 01-Jun-2006  kardel Sync with head.
 1.99.2.1 09-Sep-2006  rpaulo sync with head
 1.100.2.1 19-Jun-2006  chap Sync with head.
 1.105.2.3 30-Jan-2007  ad Remove support for SA. Ok core@.
 1.105.2.2 12-Jan-2007  ad Sync with head.
 1.105.2.1 18-Nov-2006  ad Sync with head.
 1.106.2.2 10-Dec-2006  yamt sync with head.
 1.106.2.1 22-Oct-2006  yamt sync with head
 1.113.2.2 09-Dec-2006  bouyer Pull up following revision(s) (requested by elad in ticket #261):
sys/uvm/uvm_extern.h: revision 1.123
sys/uvm/uvm_swap.c: revision 1.115
share/man/man9/uvm.9: revision 1.79
Back out uvm_is_swap_device().
 1.113.2.1 02-Dec-2006  bouyer Pull up following revision(s) (requested by elad in ticket #241):
sys/uvm/uvm_swap.c: revision 1.114
We are required to hold uvm.swap_data_lock here too.
 1.117.2.4 07-May-2007  yamt sync with head.
 1.117.2.3 24-Mar-2007  yamt sync with head.
 1.117.2.2 12-Mar-2007  rmind Sync with HEAD.
 1.117.2.1 27-Feb-2007  yamt - sync with head.
- move sched_changepri back to kern_synch.c as it doesn't know PPQ anymore.
 1.122.2.15 25-Oct-2007  ad Fix swap to block devices.
 1.122.2.14 23-Oct-2007  ad Sync with head.
 1.122.2.13 28-Aug-2007  yamt fix swapping after buffer locking changes. (on regular files, at least)
 1.122.2.12 24-Aug-2007  ad Sync with buffer cache locking changes. See buf.h/vfs_bio.c for details.
Some minor portions are incomplete and need to be verified as a whole.
 1.122.2.11 19-Aug-2007  ad - Back out the biodone() changes.
- Eliminate B_ERROR (from HEAD).
 1.122.2.10 15-Jul-2007  ad Sync with head.
 1.122.2.9 23-Jun-2007  ad - Lock v_cleanblkhd, v_dirtyblkhd, v_numoutput with the vnode's interlock.
Get rid of global_v_numoutput_lock. Partially incomplete as the buffer
cache locking doesn't work very well and needs an overhaul.
- Some changes to try and make softdep MP safe. Untested.
 1.122.2.8 17-Jun-2007  ad - Increase the number of thread priorities from 128 to 256. How the space
is set up is to be revisited.
- Implement soft interrupts as kernel threads. A generic implementation
is provided, with hooks for fast-path MD code that can run the interrupt
threads over the top of other threads executing in the kernel.
- Split vnode::v_flag into three fields, depending on how the flag is
locked (by the interlock, by the vnode lock, by the file system).
- Miscellaneous locking fixes and improvements.
 1.122.2.7 08-Jun-2007  ad Sync with head.
 1.122.2.6 13-May-2007  ad - Pass the error number and residual count to biodone(), and let it handle
setting error indicators. Prepare to eliminate B_ERROR.
- Add a flag argument to brelse() to be set into the buf's flags, instead
of doing it directly. Typically used to set B_INVAL.
- Add a "struct cpu_info *" argument to kthread_create(), to be used to
create bound threads. Change "bool mpsafe" to "int flags".
- Allow exit of LWPs in the IDL state when (l != curlwp).
- More locking fixes & conversion to the new API.
 1.122.2.5 09-Apr-2007  ad - Add two new arguments to kthread_create1: pri_t pri, bool mpsafe.
- Fork kthreads off proc0 as new LWPs, not new processes.
 1.122.2.4 08-Apr-2007  ad Correct a comment.
 1.122.2.3 21-Mar-2007  ad - Replace more simple_locks, and fix up in a few places.
- Use condition variables.
- LOCK_ASSERT -> KASSERT.
 1.122.2.2 13-Mar-2007  ad Pull in the initial set of changes for the vmlocking branch.
 1.122.2.1 13-Mar-2007  ad Sync with head.
 1.123.2.1 11-Jul-2007  mjf Sync with head.
 1.126.2.1 15-Aug-2007  skrll Sync with HEAD.
 1.129.10.2 29-Jul-2007  ad It's not a good idea for device drivers to modify b_flags, as they don't
need to understand the locking around that field. Instead of setting
B_ERROR, set b_error instead. b_error is 'owned' by whoever completes
the I/O request.
 1.129.10.1 29-Jul-2007  ad file uvm_swap.c was added on branch matt-mips64 on 2007-07-29 13:31:19 +0000
 1.129.8.1 18-Oct-2007  yamt sync with head.
 1.129.6.3 23-Mar-2008  matt sync with HEAD
 1.129.6.2 09-Jan-2008  matt sync with HEAD
 1.129.6.1 06-Nov-2007  matt sync with HEAD
 1.129.4.3 09-Dec-2007  jmcneill Sync with HEAD.
 1.129.4.2 27-Nov-2007  joerg Sync with HEAD. amd64 Xen support needs testing.
 1.129.4.1 26-Oct-2007  joerg Sync with HEAD.

Follow the merge of pmap.c on i386 and amd64 and move
pmap_init_tmp_pgtbl into arch/x86/x86/pmap.c. Modify the ACPI wakeup
code to restore CR4 before jumping back into kernel space as the large
page option might cover that.
 1.130.4.3 18-Feb-2008  mjf Sync with HEAD.
 1.130.4.2 27-Dec-2007  mjf Sync with HEAD.
 1.130.4.1 08-Dec-2007  mjf Sync with HEAD.
 1.131.2.2 26-Dec-2007  ad Sync with head.
 1.131.2.1 04-Dec-2007  ad Pull the vmlocking changes into a new branch.
 1.132.4.1 02-Jan-2008  bouyer Sync with HEAD
 1.136.6.5 17-Jan-2009  mjf Sync with HEAD.
 1.136.6.4 28-Sep-2008  mjf Sync with HEAD.
 1.136.6.3 02-Jun-2008  mjf Sync with HEAD.
 1.136.6.2 03-Apr-2008  mjf Sync with HEAD.
 1.136.6.1 29-Mar-2008  mjf - etc/devfsd.conf: Add some rules to give nodes like /dev/tty and
/dev/null better default modes, i.e. 0666.

- sbin/init: Run devfsd -s before going to multiuser.

- sys/arch: Provide arm32, i386, sparc with a mem_init() function to request
device nodes for /dev/null, /dev/zero, etc.

- sys/dev: Convert rnd, wd, agp, raid, cd, sd, wsdisplay, wskbd, wsmouse,
wsmux, tty, bpf, swap to devfs New World Order.

- sys/fs/devfs: Make the visibility attribute of device nodes configurable.
Also provide a function to mount a devfs on boot.

- sys/kern: Add a new boot flag, -n. This disables devfs support. Unless
the -n flag is specified the kernel will mount a devfs file
system on boot.
 1.136.2.1 24-Mar-2008  keiichi sync with head.
 1.137.6.4 10-Oct-2008  skrll Sync with HEAD.
 1.137.6.3 23-Jun-2008  wrstuden Sync w/ -current. 34 merge conflicts to follow.
 1.137.6.2 14-May-2008  wrstuden Per discussion with ad, remove most of the #include <sys/sa.h> lines
as they were including sa.h just for the type(s) needed for syscallargs.h.

Instead, create a new file, sys/satypes.h, which contains just the
types needed for syscallargs.h. Yes, there's only one now, but that
may change and it's probably more likely to change if it'd be difficult
to handle. :-)

Per discussion with matt at n dot o, add an include of satypes.h to
sigtypes.h. Upcall handlers are kinda signal handlers, and the signalling
header is the file already included for syscallargs.h that most
closely matches SA.

This shaves about 3000 lines off of the diff of the branch relative
to the base. That also represents about 18% of the total before this
checkin.

I think this reduction is a very good thing.
 1.137.6.1 10-May-2008  wrstuden Initial checkin of re-adding SA. Everything except kern_sa.c
compiles in GENERIC for i386. This is still a work-in-progress, but
this checkin covers most of the mechanical work (changing signalling
to be able to accommodate SA's process-wide signalling and re-adding
includes of sys/sa.h and savar.h). Subsequent changes will be much
more interesting.

Also, kern_sa.c has received partial cleanup. There's still more
to do, though.
 1.137.4.5 11-Aug-2010  yamt sync with head.
 1.137.4.4 11-Mar-2010  yamt sync with head
 1.137.4.3 16-Sep-2009  yamt sync with head
 1.137.4.2 04-May-2009  yamt sync with head.
 1.137.4.1 16-May-2008  yamt sync with head.
 1.137.2.2 04-Jun-2008  yamt sync with head
 1.137.2.1 18-May-2008  yamt sync with head.
 1.139.4.1 19-Oct-2008  haad Sync with HEAD.
 1.140.4.1 27-Dec-2008  snj branches: 1.140.4.1.4;
Pull up following revision(s) (requested by bouyer in ticket #211):
sys/uvm/uvm_swap.c: revision 1.141
PR kern/40027 pagedaemon loops on memory shortage
uvm_swapisfull: don't count some small portion as it may be inaccessible to
us at any given moment, for example if there is lock contention or if pages
are busy.
 1.140.4.1.4.1 04-Apr-2012  matt Move the uvm_scheduler_mutex and cv init to uvm_init since they are
independent of VMSWAP.
 1.140.2.2 03-Mar-2009  skrll Sync with HEAD.
 1.140.2.1 19-Jan-2009  skrll Sync with HEAD.
 1.144.2.1 13-May-2009  jym Sync with HEAD.

Commit is split, to avoid a "too many arguments" protocol error.
 1.149.2.2 17-Aug-2010  uebayasi Sync with HEAD.
 1.149.2.1 30-Apr-2010  uebayasi Sync with HEAD.
 1.150.2.4 31-May-2011  rmind sync with head
 1.150.2.3 05-Mar-2011  rmind sync with head
 1.150.2.2 03-Jul-2010  rmind sync with head
 1.150.2.1 16-Mar-2010  rmind Change struct uvm_object::vmobjlock to be dynamically allocated with
mutex_obj_alloc(). It allows us to share the locks among UVM objects.
 1.153.2.1 06-Jun-2011  jruoho Sync with HEAD.
 1.155.2.1 23-Jun-2011  cherry Catchup with rmind-uvmplock merge.
 1.157.6.1 18-Feb-2012  mrg merge to -current.
 1.157.2.3 22-May-2014  yamt sync with head.

for a reference, the tree before this commit was tagged
as yamt-pagecache-tag8.

this commit was split into small chunks to avoid
a limitation of cvs. ("Protocol error: too many arguments")
 1.157.2.2 16-Jan-2013  yamt sync with (a bit old) head
 1.157.2.1 17-Apr-2012  yamt sync with head
 1.161.6.4 03-Dec-2017  jdolecek update from HEAD
 1.161.6.3 20-Aug-2014  tls Rebase to HEAD as of a few days ago.
 1.161.6.2 23-Jun-2013  tls resync from head
 1.161.6.1 25-Feb-2013  tls resync with head
 1.161.2.2 27-Oct-2014  msaitoh Pull up following revision(s) (requested by riastradh in ticket #1133):
sys/uvm/uvm_swap.c: revision 1.163
Set bp->b_resid to bp->b_bcount on error in swstrategy as required.
 1.161.2.1 18-Mar-2014  msaitoh Pull up following revision(s) (requested by manu in ticket #1025):
sys/compat/netbsd32/netbsd32_netbsd.c: revision 1.184
sys/uvm/uvm_swap.c: revision 1.166
sys/uvm/uvm_swap.h: revision 1.20
sys/compat/netbsd32/netbsd32.h: revision 1.99
Properly translate struct swapent for COMPAT_NETBSD32
Properly translate struct swapent for COMPAT_NETBSD32 (missing commit)
 1.163.4.1 18-May-2014  rmind sync with head
 1.168.2.1 10-Aug-2014  tls Rebase.
 1.172.10.1 25-Dec-2018  martin Apply patch, requested by maxv in ticket #1666:
Fix similar to:

sys/uvm/uvm_swap.c: revision 1.178

Woah man, fix enormous leak.

Possible info leak: [len=1056, leaked=931]
#0 0xffffffff80bad351 in kleak_copyout
#1 0xffffffff80b2cf64 in uvm_swap_stats.part.1
#2 0xffffffff80b2d38d in uvm_swap_stats
#3 0xffffffff80b2d43c in sys_swapctl
#4 0xffffffff80259b82 in syscall
 1.172.6.1 25-Dec-2018  martin Apply patch, requested by maxv in ticket #1666:
Fix similar to:

sys/uvm/uvm_swap.c: revision 1.178

Woah man, fix enormous leak.

Possible info leak: [len=1056, leaked=931]
#0 0xffffffff80bad351 in kleak_copyout
#1 0xffffffff80b2cf64 in uvm_swap_stats.part.1
#2 0xffffffff80b2d38d in uvm_swap_stats
#3 0xffffffff80b2d43c in sys_swapctl
#4 0xffffffff80259b82 in syscall
 1.172.4.2 09-Jul-2016  skrll Sync with HEAD
 1.172.4.1 22-Sep-2015  skrll Sync with HEAD
 1.172.2.1 25-Dec-2018  martin Apply patch, requested by maxv in ticket #1666:
Fix similar to:

sys/uvm/uvm_swap.c: revision 1.178

Woah man, fix enormous leak.

Possible info leak: [len=1056, leaked=931]
#0 0xffffffff80bad351 in kleak_copyout
#1 0xffffffff80b2cf64 in uvm_swap_stats.part.1
#2 0xffffffff80b2d38d in uvm_swap_stats
#3 0xffffffff80b2d43c in sys_swapctl
#4 0xffffffff80259b82 in syscall
 1.174.10.2 25-Dec-2018  martin Apply patch, requested by maxv in ticket #1142:
Similar to:

sys/uvm/uvm_swap.c: revision 1.178

Fix kernel info leak in swapctl(2).

Possible info leak: [len=1056, leaked=931]
#0 0xffffffff80bad351 in kleak_copyout
#1 0xffffffff80b2cf64 in uvm_swap_stats.part.1
#2 0xffffffff80b2d38d in uvm_swap_stats
#3 0xffffffff80b2d43c in sys_swapctl
#4 0xffffffff80259b82 in syscall
 1.174.10.1 02-Nov-2017  snj Pull up following revision(s) (requested by pgoyette in ticket #335):
share/man/man9/kernhist.9: 1.5-1.8
sys/arch/acorn26/acorn26/pmap.c: 1.39
sys/arch/arm/arm32/fault.c: 1.105 via patch
sys/arch/arm/arm32/pmap.c: 1.350, 1.359
sys/arch/arm/broadcom/bcm2835_bsc.c: 1.7
sys/arch/arm/omap/if_cpsw.c: 1.20
sys/arch/arm/omap/tiotg.c: 1.7
sys/arch/evbarm/conf/RPI2_INSTALL: 1.3
sys/dev/ic/sl811hs.c: 1.98
sys/dev/usb/ehci.c: 1.256
sys/dev/usb/if_axe.c: 1.83
sys/dev/usb/motg.c: 1.18
sys/dev/usb/ohci.c: 1.274
sys/dev/usb/ucom.c: 1.119
sys/dev/usb/uhci.c: 1.277
sys/dev/usb/uhub.c: 1.137
sys/dev/usb/umass.c: 1.160-1.162
sys/dev/usb/umass_quirks.c: 1.100
sys/dev/usb/umass_scsipi.c: 1.55
sys/dev/usb/usb.c: 1.168
sys/dev/usb/usb_mem.c: 1.70
sys/dev/usb/usb_subr.c: 1.221
sys/dev/usb/usbdi.c: 1.175
sys/dev/usb/usbdi_util.c: 1.67-1.70
sys/dev/usb/usbroothub.c: 1.3
sys/dev/usb/xhci.c: 1.75
sys/external/bsd/drm2/dist/drm/i915/i915_gem.c: 1.34
sys/kern/kern_history.c: 1.15
sys/kern/kern_xxx.c: 1.74
sys/kern/vfs_bio.c: 1.275-1.276
sys/miscfs/genfs/genfs_io.c: 1.71
sys/sys/kernhist.h: 1.21
sys/ufs/ffs/ffs_balloc.c: 1.63
sys/ufs/lfs/lfs_vfsops.c: 1.361
sys/ufs/lfs/ulfs_inode.c: 1.21
sys/ufs/lfs/ulfs_vnops.c: 1.52
sys/ufs/ufs/ufs_inode.c: 1.102
sys/ufs/ufs/ufs_vnops.c: 1.239
sys/uvm/pmap/pmap.c: 1.37-1.39
sys/uvm/pmap/pmap_tlb.c: 1.22
sys/uvm/uvm_amap.c: 1.108
sys/uvm/uvm_anon.c: 1.64
sys/uvm/uvm_aobj.c: 1.126
sys/uvm/uvm_bio.c: 1.91
sys/uvm/uvm_device.c: 1.66
sys/uvm/uvm_fault.c: 1.201
sys/uvm/uvm_km.c: 1.144
sys/uvm/uvm_loan.c: 1.85
sys/uvm/uvm_map.c: 1.353
sys/uvm/uvm_page.c: 1.194
sys/uvm/uvm_pager.c: 1.111
sys/uvm/uvm_pdaemon.c: 1.109
sys/uvm/uvm_swap.c: 1.175
sys/uvm/uvm_vnode.c: 1.103
usr.bin/vmstat/vmstat.c: 1.219
Reorder to test for null before null deref in debug code
--
Reorder to test for null before null deref in debug code
--
KNF
--
No need for '\n' in UVMHIST_LOG
--
normalise a BIOHIST log message
--
Update the kernhist(9) kernel history code to address issues identified
in PR kern/52639, as well as some general cleaning-up...
(As proposed on tech-kern@ with additional changes and enhancements.)
Details of changes:
* All history arguments are now stored as uintmax_t values[1], both in
the kernel and in the structures used for exporting the history data
to userland via sysctl(9). This avoids problems on some architectures
where passing a 64-bit (or larger) value to printf(3) can cause it to
process the value as multiple arguments. (This can be particularly
problematic when printf()'s format string is not a literal, since in
that case the compiler cannot know how large each argument should be.)
* Update the data structures used for exporting kernel history data to
include a version number as well as the length of history arguments.
* All [2] existing users of kernhist(9) have had their format strings
updated. Each format specifier now includes an explicit length
modifier 'j' to refer to numeric values of the size of uintmax_t.
* All [2] existing users of kernhist(9) have had their format strings
updated to replace uses of "%p" with "%#jx", and the pointer
arguments are now cast to (uintptr_t) before being subsequently cast
to (uintmax_t). This is needed to avoid compiler warnings about
casting "pointer to integer of a different size."
* All [2] existing users of kernhist(9) have had instances of "%s" or
"%c" format strings replaced with numeric formats; several instances
of mis-match between format string and argument list have been fixed.
* vmstat(1) has been modified to handle the new size of arguments in the
history data as exported by sysctl(9).
* vmstat(1) now provides a warning message if the history requested with
the -u option does not exist (previously, this condition was silently
ignored, with only a single blank line being printed).
* vmstat(1) now checks the version and argument length included in the
data exported via sysctl(9) and exits if they do not match the values
with which vmstat was built.
* The kernhist(9) man-page has been updated to note the additional
requirements imposed on the format strings, along with several other
minor changes and enhancements.
[1] It would have been possible to use an explicit length (for example,
uint64_t) for the history arguments. But that would require another
"rototill" of all the users in the future when we add support for an
architecture that supports a larger size. Also, the printf(3) format
specifiers for explicitly-sized values, such as "%"PRIu64, are much
more verbose (and less aesthetically appealing, IMHO) than simply
using "%ju".
[2] I've tried very hard to find "all [the] existing users of kernhist(9)"
but it is possible that I've missed some of them. I would be glad to
update any stragglers that anyone identifies.
--
For some reason this single kernel seems to have outgrown its declared
size as a result of the kernhist(9) changes. Bump the size.
XXX The amount of increase may be excessive - anyone with more detailed
XXX knowledge please feel free to further adjust the value appropriately.
--
Missed one cast of pointer --> uintptr_t in previous kernhist(9) commit
--
And yet another one. :(
--
Use correct mark-up for NetBSD version.
--
More improvements in grammar and readability.
--
Remove a stray '"' (obvious typo) and add a couple of casts that are
probably needed.
--
And replace an instance of "%p" conversion with "%#jx"
--
Whitespace fix. Give Bl tag table a width. Fix Xr.
 1.174.8.1 27-Apr-2017  pgoyette Restore all work from the former pgoyette-localcount branch (which is
now abandoned due to cvs merge botch).

The branch now builds, and installs via anita. There are still some
problems (cgd is non-functional and all atf tests time-out) but they
will get resolved soon.
 1.174.2.1 20-Jul-2016  pgoyette Adapt machine-independent code to the new {b,c}devsw reference-counting
(using localcount(9)). All callers of {b,c}devsw_lookup() now call
{b,c}devsw_lookup_acquire() which retains a reference on the 'struct
{b,c}devsw'. This reference must be released by the caller once it is
finished with the structure's content (or other data that would disappear
if the 'struct {b,c}devsw' were to disappear).
 1.175.2.5 26-Nov-2018  pgoyette Sync with HEAD, resolve a couple of conflicts
 1.175.2.4 15-Mar-2018  pgoyette Resolve conflicts from sync-with-HEAD
 1.175.2.3 15-Mar-2018  pgoyette Synch with HEAD
 1.175.2.2 13-Mar-2018  pgoyette Properly detect 'compat handler for SWAP_STATSxx not present' and return
EINVAL as we would for any other unsupported command.
 1.175.2.1 13-Mar-2018  pgoyette Move the swapstats compat code into the compat_netbsd module.

Without this, a kernel configured without COMPAT_13 and/or COMPAT_50
could not execute the compat swapstats code, even if the compat_netbsd
module had been loaded.
 1.177.2.2 13-Apr-2020  martin Mostly merge changes from HEAD up to 20200411
 1.177.2.1 10-Jun-2019  christos Sync with HEAD
 1.185.2.1 29-Feb-2020  ad Sync with head.
 1.200.2.1 03-Apr-2021  thorpej Sync with HEAD.
 1.203.6.1 31-May-2021  cjep sync with head
 1.203.4.1 17-Jun-2021  thorpej Sync w/ HEAD.
 1.204.2.1 06-Jun-2021  cjep sync with head
 1.206.4.1 21-Dec-2022  martin Pull up following revision(s) (requested by chs in ticket #13):

sys/uvm/uvm_swap.c: revision 1.207

swap: disallow user opens of swap block device

the swap/drum block device was never intended to allow user opens,
but when the internal VOP_OPEN() in uvm_swap_init() was added
back in rev 1.135, the d_open method was changed from always-fail
to always-succeed in order to allow the new initial internal open.
this had the side effect of incorrectly allowing user opens too.
fix this by replacing the swap_bdevsw d_open with one that succeeds
for the first call but fails for all subsequent calls.
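A minimal sketch of the "succeed once, then fail" open behavior described
above. This is not the uvm_swap.c code: the function name, the flag, and the
choice of EBUSY are assumptions made for illustration, and locking is omitted.

```c
#include <errno.h>
#include <stdbool.h>

/* Hypothetical stand-in for driver state; the real code differs. */
static bool swap_opened_once = false;

/*
 * Hypothetical d_open-style routine: the first caller (the internal
 * open at initialization) succeeds, every later caller is refused.
 */
static int
swap_open_once(void)
{
	if (swap_opened_once)
		return EBUSY;	/* assumed errno; the log only says "fails" */
	swap_opened_once = true;
	return 0;
}
```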
 1.208.6.1 02-Aug-2025  perseant Sync with HEAD
 1.29 15-Mar-2024  andvar "retval = 0" should be "*retval = 0", should fix the broken build.
 1.28 15-Mar-2024  andvar Rewrite !VMSWAP uvm_swap_stats() macro as a static function. NFCI.

From riastradh
 1.27 15-Mar-2024  andvar Fix !VMSWAP build:
Added __unused for a few local variables, which are used in the VMSWAP block only.
Adjust !VMSWAP uvm_swap_stats() definition to make it build with compat code.
Copied "int (*uvm_swap_stats50)(...)" definition from uvm_swap to uvm_swapstub
to avoid missing uvm_swap_stats50 reference on linking.

Fixes INSTALL_CPMBR1400, INSTALL_ZYXELKX evbmips kernel configs as a result.

Reviewed by simon and phone in IRC (thanks).
 1.26 05-Sep-2020  riastradh Round of uvm.h cleanup.

The poorly named uvm.h is generally supposed to be for uvm-internal
users only.

- Narrow it to files that actually need it -- mostly files that need
to query whether curlwp is the pagedaemon, which should maybe be
exposed by an external header.

- Use uvm_extern.h where feasible and uvm_*.h for things not exposed
by it. We should split up uvm_extern.h but this will serve for now
to reduce the uvm.h dependencies.

- Use uvm_stat.h and #ifdef UVMHIST uvm.h for files that use
UVMHIST(ubchist), since ubchist is declared in uvm.h but the
reference evaporates if UVMHIST is not defined, so we reduce header
file dependencies.

- Make uvm_device.h and uvm_swap.h independently includable while
here.

ok chs@
 1.25 01-May-2019  mlelstv allow NONE build
 1.24 15-Mar-2018  christos branches: 1.24.2;
finish moving the compat code out.
 1.23 15-Mar-2018  christos Untangle the swapctl compat code mess. Welcome to lucky 13.
 1.22 30-Jul-2015  christos branches: 1.22.16;
include decls for _MODULE
 1.21 30-Jul-2015  maxv Lock before calling uvm_swap_stats(). Otherwise a race condition could
corrupt memory.
 1.20 03-Feb-2014  manu branches: 1.20.6;
Properly translate struct swapent for COMPAT_NETBSD32
 1.19 23-Nov-2013  christos convert from CIRCLEQ to TAILQ
add uvm_swap_shutdown(), unused
 1.18 27-Apr-2011  rmind branches: 1.18.4; 1.18.10; 1.18.14; 1.18.18;
Remove public uvm_swap_stats() routine, keep it internal.
 1.17 29-May-2008  mrg branches: 1.17.20; 1.17.26;
remove clause #3 from my license where there are no other
copyright holders involved.
 1.16 22-Feb-2007  thorpej branches: 1.16.38; 1.16.40; 1.16.42; 1.16.44;
TRUE -> true, FALSE -> false
 1.15 21-Feb-2007  thorpej Replace the Mach-derived boolean_t type with the C99 bool type. A
future commit will replace use of TRUE and FALSE with true and false.
 1.14 11-Dec-2005  christos branches: 1.14.26;
merge ktrace-lwp.
 1.13 17-Sep-2005  yamt - make uvm_swap_stats acquire swap_syscall_lock by itself
so that callers don't need to acquire it beforehand.
- make swap_syscall_lock static.
 1.12 17-Sep-2005  yamt make VMSWAP optional again.
 1.11 13-Sep-2005  yamt wrap swap related code by #ifdef VMSWAP. always #define VMSWAP for now.
 1.10 31-Jul-2005  yamt revert "defflag VMSWAP" changes for now.
there seem to be far more people who don't want to edit
their kernel config files than i thought.
 1.9 30-Jul-2005  yamt defflag VMSWAP.
 1.8 11-Aug-2003  pk branches: 1.8.16;
Introduce uvm_swapisfull(), which computes the available swap space by
taking into account swap devices that are in the process of being removed.
 1.7 21-Jul-2003  mrg de-__P()ify.
 1.6 18-Mar-2002  manu branches: 1.6.12;
Move swapctl(SWAP_STATS) implementation to a separate function called
uvm_swap_stats(). This is done in order to allow COMPAT_* swapctl()
emulation to use it directly without going through sys_swapctl().

The problem with using sys_swapctl() there is that it involves
copying the swapent array to the stackgap, and this array's size
is not known at build time. Hence it would not be possible to
ensure it would fit in the stackgap in any case.
 1.5 11-Jan-2000  chs branches: 1.5.6; 1.5.8;
add support for ``swapctl -d'' (removing swap space).
improve handling of i/o errors in swap space.

reviewed by: Chuck Cranor
 1.4 21-Jun-1999  thorpej branches: 1.4.2;
Protect prototypes, certain macros, and inlines from userland.
 1.3 07-Feb-1998  mrg branches: 1.3.10;
restore rcsids
 1.2 06-Feb-1998  thorpej RCS ID police.
 1.1 05-Feb-1998  mrg branches: 1.1.1;
Initial revision
 1.1.1.1 05-Feb-1998  mrg initial import of the new virtual memory system, UVM, into -current.

UVM was written by chuck cranor <chuck@maria.wustl.edu>, with some
minor portions derived from the old Mach code. i provided some help
getting swap and paging working, and other bug fixes/ideas. chuck
silvers <chuq@chuq.com> also provided some other fixes.

this is the UVM kernel code portion.


this will be KNF'd shortly. :-)
 1.3.10.1 01-Jul-1999  thorpej Sync w/ -current.
 1.4.2.1 20-Nov-2000  bouyer Update thorpej_scsipi to -current as of a month ago
 1.5.8.1 23-Jun-2002  jdolecek catch up with -current on kqueue branch
 1.5.6.1 01-Apr-2002  nathanw Catch up to -current.
(CVS: It's not just a program. It's an adventure!)
 1.6.12.4 10-Nov-2005  skrll Sync with HEAD. Here we go again...
 1.6.12.3 21-Sep-2004  skrll Fix the sync with head I botched.
 1.6.12.2 18-Sep-2004  skrll Sync with HEAD.
 1.6.12.1 03-Aug-2004  skrll Sync with HEAD
 1.8.16.2 26-Feb-2007  yamt sync with head.
 1.8.16.1 21-Jun-2006  yamt sync with head.
 1.14.26.1 27-Feb-2007  yamt - sync with head.
- move sched_changepri back to kern_synch.c as it doesn't know PPQ anymore.
 1.16.44.1 23-Jun-2008  wrstuden Sync w/ -current. 34 merge conflicts to follow.
 1.16.42.1 04-May-2009  yamt sync with head.
 1.16.40.1 04-Jun-2008  yamt sync with head
 1.16.38.1 02-Jun-2008  mjf Sync with HEAD.
 1.17.26.1 06-Jun-2011  jruoho Sync with HEAD.
 1.17.20.1 31-May-2011  rmind sync with head
 1.18.18.1 18-May-2014  rmind sync with head
 1.18.14.2 03-Dec-2017  jdolecek update from HEAD
 1.18.14.1 20-Aug-2014  tls Rebase to HEAD as of a few days ago.
 1.18.10.1 18-Mar-2014  msaitoh Pull up following revision(s) (requested by manu in ticket #1025):
sys/compat/netbsd32/netbsd32_netbsd.c: revision 1.184
sys/uvm/uvm_swap.c: revision 1.166
sys/uvm/uvm_swap.h: revision 1.20
sys/compat/netbsd32/netbsd32.h: revision 1.99
Properly translate struct swapent for COMPAT_NETBSD32
Properly translate struct swapent for COMPAT_NETBSD32 (missing commit)
 1.18.4.1 22-May-2014  yamt sync with head.

for a reference, the tree before this commit was tagged
as yamt-pagecache-tag8.

this commit was split into small chunks to avoid
a limitation of cvs. ("Protocol error: too many arguments")
 1.20.6.1 22-Sep-2015  skrll Sync with HEAD
 1.22.16.3 15-Mar-2018  pgoyette Resolve conflicts from sync-with-HEAD
 1.22.16.2 15-Mar-2018  pgoyette Synch with HEAD
 1.22.16.1 13-Mar-2018  pgoyette Move the swapstats compat code into the compat_netbsd module.

Without this, a kernel configured without COMPAT_13 and/or COMPAT_50
could not execute the compat swapstats code, even if the compat_netbsd
module had been loaded.
 1.24.2.1 10-Jun-2019  christos Sync with HEAD
 1.9 15-Mar-2024  andvar Fix !VMSWAP build:
Added __unused for a few local variables, which are used in the VMSWAP block only.
Adjust !VMSWAP uvm_swap_stats() definition to make it build with compat code.
Copied "int (*uvm_swap_stats50)(...)" definition from uvm_swap to uvm_swapstub
to avoid missing uvm_swap_stats50 reference on linking.

Fixes INSTALL_CPMBR1400, INSTALL_ZYXELKX evbmips kernel configs as a result.

Reviewed by simon and phone in IRC (thanks).
 1.8 18-Feb-2014  pooka Use same uvm_swap_shutdown() stub for !vmswap kernels and rump kernels.
 1.7 27-Apr-2011  rmind branches: 1.7.4; 1.7.14; 1.7.18;
Remove public uvm_swap_stats() routine, keep it internal.
 1.6 08-Jan-2008  matt branches: 1.6.12; 1.6.32; 1.6.38;
Make sys_swapctl match syscallargs.h
 1.5 23-Feb-2007  skrll branches: 1.5.18; 1.5.24; 1.5.30;
-#include <sys/sa.h>
 1.4 11-Dec-2005  christos branches: 1.4.18; 1.4.28;
merge ktrace-lwp.
 1.3 21-Sep-2005  yamt branches: 1.3.6;
add a file which i forgot when reviving VMSWAP option.
 1.2 31-Jul-2005  yamt revert "defflag VMSWAP" changes for now.
there seem to be far more people who don't want to edit
their kernel config files than i thought.
 1.1 30-Jul-2005  yamt defflag VMSWAP.
 1.3.6.2 10-Nov-2005  skrll Sync with HEAD. Here we go again...
 1.3.6.1 21-Sep-2005  skrll file uvm_swapstub.c was added on branch ktrace-lwp on 2005-11-10 14:12:40 +0000
 1.4.28.1 27-Feb-2007  yamt - sync with head.
- move sched_changepri back to kern_synch.c as it doesn't know PPQ anymore.
 1.4.18.4 21-Jan-2008  yamt sync with head
 1.4.18.3 26-Feb-2007  yamt sync with head.
 1.4.18.2 21-Jun-2006  yamt sync with head.
 1.4.18.1 11-Dec-2005  yamt file uvm_swapstub.c was added on branch yamt-lazymbuf on 2006-06-21 15:12:40 +0000
 1.5.30.1 08-Jan-2008  bouyer Sync with HEAD
 1.5.24.1 18-Feb-2008  mjf Sync with HEAD.
 1.5.18.1 09-Jan-2008  matt sync with HEAD
 1.6.38.1 06-Jun-2011  jruoho Sync with HEAD.
 1.6.32.1 31-May-2011  rmind sync with head
 1.6.12.2 14-May-2008  wrstuden Per discussion with ad, remove most of the #include <sys/sa.h> lines
as they were including sa.h just for the type(s) needed for syscallargs.h.

Instead, create a new file, sys/satypes.h, which contains just the
types needed for syscallargs.h. Yes, there's only one now, but that
may change and it's probably more likely to change if it'd be difficult
to handle. :-)

Per discussion with matt at n dot o, add an include of satypes.h to
sigtypes.h. Upcall handlers are kinda signal handlers, and signalling
is the header file that's already included for syscallargs.h that
most closely matches SA.

This shaves about 3000 lines off of the diff of the branch relative
to the base. That also represents about 18% of the total before this
checkin.

I think this reduction is a very good thing.
 1.6.12.1 10-May-2008  wrstuden Initial checkin of re-adding SA. Everything except kern_sa.c
compiles in GENERIC for i386. This is still a work-in-progress, but
this checkin covers most of the mechanical work (changing signalling
to be able to accommodate SA's process-wide signalling and re-adding
includes of sys/sa.h and savar.h). Subsequent changes will be much
more interesting.

Also, kern_sa.c has received partial cleanup. There's still more
to do, though.
 1.7.18.1 18-May-2014  rmind sync with head
 1.7.14.1 20-Aug-2014  tls Rebase to HEAD as of a few days ago.
 1.7.4.1 22-May-2014  yamt sync with head.

for a reference, the tree before this commit was tagged
as yamt-pagecache-tag8.

this commit was split into small chunks to avoid
a limitation of cvs. ("Protocol error: too many arguments")
 1.51 10-Jan-2022  christos Use p->p_stackbase instead of USRSTACK because the stackbase can move because
of ASLR.
 1.50 06-Jan-2018  kamil Revert vadvise(2) removal

This system call was used in legacy Lisp code that has been carried over
into the modern age and is still compiled against supported compat layers
(e.g. in clisp, oaklisp, Franz Lisp).

It used to instruct the kernel about paging policy (G/C aware, flush etc).

Newly compiled code (assuming that it will detect vadvise()) will use the
libc stub for vadvise(). The headers for this interface are gone.

vadvise(2) could be marked as COMPAT_80, but as long as we support ultrix,
sunos or aout68k ABI, don't bother with this.

Requested by <mrg>
 1.49 19-Dec-2017  kamil Drop SYS_vadvise

The (o)vadvise syscall has been a dummy since the beginning of NetBSD.

It is an obsolete remnant from the old UNIX.

Sponsored by <The NetBSD Foundation>
 1.48 06-May-2017  joerg Extend the mmap(2) interface to allow requesting protections for later
use with mprotect(2), but without enabling them immediately.

Extend the mremap(2) interface to allow duplicating mappings, i.e.
create a second range of virtual addresses that references the same physical
pages. Duplicated mappings can have different effective protections.

Adjust PAX mprotect logic to disallow effective protections of W&X, but
allow one mapping W and another X protections. This obsoletes using
temporary files for purposes like JIT.

Adjust PAX logic for mmap(2) and mprotect(2) to fail if W&X is requested
and not silently drop the X protection.

Improve test cases to ensure correct operation of the changed
interfaces.
 1.47 07-Apr-2016  christos branches: 1.47.8;
remove more ifdefs
 1.46 07-Apr-2016  christos Add PAX_MPROTECT_DEBUG
 1.45 05-Sep-2014  matt branches: 1.45.2;
Don't use C++ new keyword as a variable name.
 1.44 02-Feb-2011  chuck branches: 1.44.14;
update license clauses on my code to match the new-style BSD licenses.
verified with Mike Hibler it is ok to remove clause 3 on utah copyright,
as per UCB.
based on diff that rmind@ sent me.

no functional change with this commit.
 1.43 15-Dec-2009  matt branches: 1.43.4; 1.43.6; 1.43.8;
Use PRIxVADDR... (change a printf/panic -> panic)
 1.42 27-Nov-2009  njoly Make break(2) reject high addresses that wrap to 0 after page rounding.
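A minimal sketch of the wrap-around check described in this entry. It is not
the sys_obreak() code: the page size, the macro and function names, and the
ENOMEM return value are assumptions for illustration.

```c
#include <errno.h>
#include <stdint.h>

#define PAGE_SIZE_ASSUMED	((uintptr_t)4096)	/* assumed page size */
#define ROUND_PAGE(x) \
	(((x) + PAGE_SIZE_ASSUMED - 1) & ~(PAGE_SIZE_ASSUMED - 1))

/*
 * Hypothetical check: rounding an address near the top of the address
 * space up to a page boundary overflows and wraps around, so the
 * rounded value ends up below the original address.
 */
static int
check_break_addr(uintptr_t addr)
{
	uintptr_t rounded = ROUND_PAGE(addr);

	if (rounded < addr)
		return ENOMEM;	/* assumed errno for a rejected request */
	return 0;
}
```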
 1.41 04-Mar-2009  christos don't uprintf in non-debug kernels.
 1.40 02-Jan-2008  ad branches: 1.40.10; 1.40.12; 1.40.18; 1.40.24; 1.40.28;
Merge vmlocking2 to head.
 1.39 20-Dec-2007  dsl Convert all the system call entry points from:
int foo(struct lwp *l, void *v, register_t *retval)
to:
int foo(struct lwp *l, const struct foo_args *uap, register_t *retval)
Fixup compat code to not write into 'uap' and (in some cases) to actually
pass a correctly formatted 'uap' structure with the right name to the
next routine.
A few 'compat' routines that just call standard ones have been deleted.
All the 'compat' code compiles (along with the kernels required to test
build it).
98% done by automated scripts.
 1.38 09-Feb-2007  ad branches: 1.38.20; 1.38.26; 1.38.28; 1.38.32;
Merge newlock2 to head.
 1.37 18-Dec-2006  skrll Update uvm_grow to support stacks that grow upwards.

Use on hppa and fix a bug in the hppa trap handler.
 1.36 01-Nov-2006  yamt branches: 1.36.2;
remove some __unused from function parameters.
 1.35 12-Oct-2006  yamt remove unnecessary #include of vnode.h.
 1.34 12-Oct-2006  christos - sprinkle __unused on function decls.
- fix a couple of unused bugs
- no more -Wno-unused for i386
 1.33 20-May-2006  elad branches: 1.33.6; 1.33.8;
Better implementation of PaX MPROTECT, after looking some more into the
code and not trying to use temporary solutions.

Lots of comments and help from YAMAMOTO Takashi, also thanks to the PaX
author for being quick to recognize that something fishy's going on. :)

Hook up in mmap/vmcmd rather than (ugh!) uvm_map_protect().

Next time I suggest committing a temporary solution, just revoke my
commit bit.
 1.32 11-Dec-2005  christos branches: 1.32.4; 1.32.6; 1.32.8; 1.32.12; 1.32.14;
merge ktrace-lwp.
 1.31 27-Jun-2005  thorpej branches: 1.31.2;
Use ANSI function decls.
 1.30 28-Aug-2004  jdolecek uvm_grow(): avoid needless arithmetic and make LP64 safe
 1.29 24-Aug-2003  chs add support for non-executable mappings (where the hardware allows this)
and make the stack and heap non-executable by default. the changes
fall into two basic categories:

- pmap and trap-handler changes. these are all MD:
= alpha: we already track per-page execute permission with the (software)
PG_EXEC bit, so just have the trap handler pay attention to it.
= i386: use a new GDT segment for %cs for processes that have no
executable mappings above a certain threshold (currently the
bottom of the stack). track per-page execute permission with
the last unused PTE bit.
= powerpc/ibm4xx: just use the hardware exec bit.
= powerpc/oea: we already track per-page exec bits, but the hardware only
implements non-exec mappings at the segment level. so track the
number of executable mappings in each segment and turn on the no-exec
segment bit iff the count is 0. adjust the trap handler to deal.
= sparc (sun4m): fix our use of the hardware protection bits.
fix the trap handler to recognize text faults.
= sparc64: split the existing unified TSB into data and instruction TSBs,
and only load TTEs into the appropriate TSB(s) for the permissions.
fix the trap handler to check for execute permission.
= not yet implemented: amd64, hppa, sh5

- changes in all the emulations that put a signal trampoline on the stack.
instead, we now put the trampoline into a uvm_aobj and map that into
the process separately.

originally from openbsd, adapted for netbsd by me.
 1.28 25-May-2003  simonb branches: 1.28.2;
Consistency nit: use parentheses around return argument.
 1.27 18-Jan-2003  thorpej Merge the nathanw_sa branch.
 1.26 08-Dec-2001  thorpej Make the coredump routine exec-format/emulation specific. Split
out traditional NetBSD coredump routines into core_netbsd.c and
netbsd32_core.c (for COMPAT_NETBSD32).
 1.25 10-Nov-2001  lukem add RCSIDs, and in some cases, slightly cleanup #include order
 1.24 06-Jun-2001  mrg branches: 1.24.2; 1.24.6;
uvm_coredump32() moved into compat/netbsd32.
 1.23 02-Jun-2001  chs replace vm_map{,_entry}_t with struct vm_map{,_entry} *.
 1.22 25-May-2001  chs remove trailing whitespace.
 1.21 06-May-2001  ross Fix overflow errors in brk(2).
 1.20 19-Mar-2001  simonb In sys_obreak(), the return value of atop() was being used to change
the process dsize for both positive and negative changes. Since atop()
casts its result to a paddr_t (which is unsigned), negative changes in
process data size resulted in unrealistic dsizes being set. Use
"dsize -= atop(-diff)" for negative diffs. Fixes the "Impossible
process sizes" mentioned on current-users.

Unsigned cast catch and much debugging help from Martin Laubach.
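The unsigned-cast problem described in this entry can be sketched as follows.
The page shift, the paddr_t typedef, and adjust_dsize() are stand-ins chosen
for illustration; they are not copied from sys_obreak().

```c
#define PAGE_SHIFT_ASSUMED	12		/* assumed: 4 KiB pages */
typedef unsigned long paddr_t;			/* assumed: an unsigned type */
#define atop(x)	((paddr_t)(x) >> PAGE_SHIFT_ASSUMED)

/*
 * Hypothetical helper: return the new data size (in pages) after the
 * break moves by diff bytes.  Calling atop(diff) with a negative diff
 * would first convert it to a huge unsigned value, so the negative
 * case negates diff before the conversion and subtracts instead.
 */
static long
adjust_dsize(long dsize, long diff)
{
	if (diff >= 0)
		dsize += (long)atop(diff);
	else
		dsize -= (long)atop(-diff);
	return dsize;
}
```

With these assumptions, shrinking by two pages (diff of -8192) lowers dsize by
2, where atop(diff) alone would have produced an unrealistically large page
count.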
 1.19 15-Mar-2001  chs eliminate the KERN_* error codes in favor of the traditional E* codes.
the mapping is:

KERN_SUCCESS 0
KERN_INVALID_ADDRESS EFAULT
KERN_PROTECTION_FAILURE EACCES
KERN_NO_SPACE ENOMEM
KERN_INVALID_ARGUMENT EINVAL
KERN_FAILURE various, mostly turn into KASSERTs
KERN_RESOURCE_SHORTAGE ENOMEM
KERN_NOT_RECEIVER <unused>
KERN_NO_ACCESS <unused>
KERN_PAGES_LOCKED <unused>
 1.18 13-Sep-2000  thorpej branches: 1.18.2;
Add an align argument to uvm_map() and some callers of that
routine. Works similarly to pmap_prefer(), but allows callers
to specify a minimum power-of-two alignment of the region.
How we ever got along without this for so long is beyond me.
 1.17 07-Sep-2000  chs fix uvm_coredump32() just like uvm_coredump().
 1.16 24-Aug-2000  chs in uvm_coredump(), avoid dumping parts of the stack multiple times
while skipping parts of the stack that haven't been used.
pointed out by SAITOH Masanobu <masanobu@iij.ad.jp>.
 1.15 10-Jul-2000  mrg fix a cast for sparc64.
 1.14 02-Jul-2000  thorpej - Avoid an integer overflow when checking if we have exceeded our
rlimit in sbrk. Slightly modified from a patch from Artur Grabowski.
- Rearrange code slightly, partially from Artur Grabowski.
- Only adjust vm_dsize if the grow or shrink actually succeeds.
 1.13 27-Jun-2000  mrg remove include of <vm/vm.h>
 1.12 30-Mar-2000  augustss branches: 1.12.4;
Remove more register declarations.
 1.11 26-Mar-2000  kleink Merge parts of chs-ubc2 into the trunk:
Add a new type voff_t (defined as a synonym for off_t) to describe offsets
into uvm objects, and update the appropriate interfaces to use it, the
most visible effect being the ability to mmap() file offsets beyond
the range of a vaddr_t.

Originally by Chuck Silvers; blame me for problems caused by merging this
into non-UBC.
 1.10 30-Dec-1999  eeh I should have made uvm_page_physload() take paddr_t's instead of vaddr_t's.
Also, add uvm_coredump32().
 1.9 04-Dec-1999  fvdl CL* clearout
 1.8 25-Mar-1999  mrg branches: 1.8.2; 1.8.4; 1.8.8; 1.8.14;
remove now >1 year old pre-release message.
 1.7 11-Oct-1998  chuck remove unused share map code from UVM:
- update calls to uvm_unmap_remove/uvm_unmap (mainonly boolean arg
has been removed)
- replace UVM_ET_ISMAP checks with UVM_ET_ISSUBMAP checks
 1.6 13-Aug-1998  eeh Merge paddr_t changes into the main branch.
 1.5 28-Jul-1998  thorpej branches: 1.5.2;
Don't cast the null residual pointer passed to vn_rdwr().
 1.4 09-Mar-1998  mrg KNF.
 1.3 07-Feb-1998  mrg restore rcsids
 1.2 06-Feb-1998  thorpej RCS ID police.
 1.1 05-Feb-1998  mrg branches: 1.1.1;
Initial revision
 1.1.1.1 05-Feb-1998  mrg initial import of the new virtual memory system, UVM, into -current.

UVM was written by chuck cranor <chuck@maria.wustl.edu>, with some
minor portions derived from the old Mach code. i provided some help
getting swap and paging working, and other bug fixes/ideas. chuck
silvers <chuq@chuq.com> also provided some other fixes.

this is the UVM kernel code portion.


this will be KNF'd shortly. :-)
 1.5.2.1 30-Jul-1998  eeh Split vm_offset_t and vm_size_t into paddr_t, psize_t, vaddr_t, and vsize_t.
 1.8.14.1 27-Dec-1999  wrstuden Pull up to last week's -current.
 1.8.8.2 27-Mar-2001  bouyer Sync with HEAD.
 1.8.8.1 20-Nov-2000  bouyer Update thorpej_scsipi to -current as of a month ago
 1.8.4.1 11-Aug-1999  chs add casts for trunc_page() and round_page() args.
 1.8.2.1 09-Sep-2000  he Pull up revision 1.16 (via patch, requested by chs):
In uvm_coredump(), avoid dumping parts of the stack multiple
times while skipping parts of the stack that haven't been used.
 1.12.4.2 07-Sep-2000  chs pull up revs 1.16 and 1.17, approved by thorpej:
> in uvm_coredump*(), avoid dumping parts of the stack multiple times
> while skipping parts of the stack that haven't been used.
> pointed out by SAITOH Masanobu <masanobu@iij.ad.jp>.
 1.12.4.1 02-Jul-2000  thorpej Pull up rev. 1.14:
- Avoid an integer overflow when checking if we have exceeded our
rlimit in sbrk. Slightly modified from a patch from Artur Grabowski.
- Rearrange code slightly, partially from Artur Grabowski.
- Only adjust vm_dsize if the grow or shrink actually succeeds.
 1.18.2.7 12-Jul-2002  nathanw No longer need to pull in lwp.h; proc.h pulls it in for us.
 1.18.2.6 29-May-2002  nathanw #include <sys/sa.h> before <sys/syscallargs.h>, to provide sa_upcall_t
now that <sys/param.h> doesn't include <sys/sa.h>.

(Behold the Power of Ed)
 1.18.2.5 08-Jan-2002  nathanw Catch up to -current.
 1.18.2.4 14-Nov-2001  nathanw Catch up to -current.
 1.18.2.3 21-Jun-2001  nathanw Catch up to -current.
 1.18.2.2 09-Apr-2001  nathanw Catch up with -current.
 1.18.2.1 05-Mar-2001  nathanw Initial commit of scheduler activations and lightweight process support.
 1.24.6.1 12-Nov-2001  thorpej Sync the thorpej-mips-cache branch with -current.
 1.24.2.1 10-Jan-2002  thorpej Sync kqueue branch with -current.
 1.28.2.5 10-Nov-2005  skrll Sync with HEAD. Here we go again...
 1.28.2.4 21-Sep-2004  skrll Fix the sync with head I botched.
 1.28.2.3 18-Sep-2004  skrll Sync with HEAD.
 1.28.2.2 03-Sep-2004  skrll Sync with HEAD
 1.28.2.1 03-Aug-2004  skrll Sync with HEAD
 1.31.2.4 21-Jan-2008  yamt sync with head
 1.31.2.3 26-Feb-2007  yamt sync with head.
 1.31.2.2 30-Dec-2006  yamt sync with head.
 1.31.2.1 21-Jun-2006  yamt sync with head.
 1.32.14.1 19-Jun-2006  chap Sync with head.
 1.32.12.1 24-May-2006  tron Merge 2006-05-24 NetBSD-current into the "peter-altq" branch.
 1.32.8.1 24-May-2006  yamt sync with head.
 1.32.6.1 01-Jun-2006  kardel Sync with head.
 1.32.4.1 09-Sep-2006  rpaulo sync with head
 1.33.8.3 18-Dec-2006  yamt sync with head.
 1.33.8.2 10-Dec-2006  yamt sync with head.
 1.33.8.1 22-Oct-2006  yamt sync with head
 1.33.6.3 30-Jan-2007  ad Remove support for SA. Ok core@.
 1.33.6.2 12-Jan-2007  ad Sync with head.
 1.33.6.1 18-Nov-2006  ad Sync with head.
 1.36.2.1 04-Jan-2007  bouyer Pull up following revision(s) (requested by skrll in ticket #323):
sys/uvm/uvm_unix.c: revision 1.37
sys/arch/hppa/hppa/trap.c: revision 1.39
Update uvm_grow to support stacks that grow upwards.
Use on hppa and fix a bug in the hppa trap handler.
 1.38.32.1 02-Jan-2008  bouyer Sync with HEAD
 1.38.28.2 26-Dec-2007  ad - Push kernel_lock back into exit, wait and sysctl system calls, mainly
for visibility.
- Serialize calls to brk() from within the same process.
- Mark more syscalls MPSAFE.
 1.38.28.1 26-Dec-2007  ad Sync with head.
 1.38.26.1 18-Feb-2008  mjf Sync with HEAD.
 1.38.20.1 09-Jan-2008  matt sync with HEAD
 1.40.28.2 29-Apr-2011  matt Fix PRIdVSIZE macro
 1.40.28.1 23-Aug-2009  matt PRIxVADDR, PRIdVSIZE, PRIxVSIZE, or PRIxPADDR as appropriate.
Use __intXX_t or __uintXX_t as appropriate in <mips/types.h>
 1.40.24.1 13-May-2009  jym Sync with HEAD.

Commit is split, to avoid a "too many arguments" protocol error.
 1.40.18.1 28-Apr-2009  skrll Sync with HEAD.
 1.40.12.2 14-May-2008  wrstuden Per discussion with ad, remove most of the #include <sys/sa.h> lines
as they were including sa.h just for the type(s) needed for syscallargs.h.

Instead, create a new file, sys/satypes.h, which contains just the
types needed for syscallargs.h. Yes, there's only one now, but that
may change and it's probably more likely to change if it'd be difficult
to handle. :-)

Per discussion with matt at n dot o, add an include of satypes.h to
sigtypes.h. Upcall handlers are kinda signal handlers, and signalling
is the header file that's already included for syscallargs.h that
most closely matches SA.

This shaves about 3000 lines off of the diff of the branch relative
to the base. That also represents about 18% of the total before this
checkin.

I think this reduction is a very good thing.
 1.40.12.1 10-May-2008  wrstuden Initial checkin of re-adding SA. Everything except kern_sa.c
compiles in GENERIC for i386. This is still a work-in-progress, but
this checkin covers most of the mechanical work (changing signalling
to be able to accommodate SA's process-wide signalling and re-adding
includes of sys/sa.h and savar.h). Subsequent changes will be much
more interesting.

Also, kern_sa.c has received partial cleanup. There's still more
to do, though.
 1.40.10.2 11-Mar-2010  yamt sync with head
 1.40.10.1 04-May-2009  yamt sync with head.
 1.43.8.1 08-Feb-2011  bouyer Sync with HEAD
 1.43.6.1 06-Jun-2011  jruoho Sync with HEAD.
 1.43.4.1 05-Mar-2011  rmind sync with head
 1.44.14.1 03-Dec-2017  jdolecek update from HEAD
 1.45.2.2 28-Aug-2017  skrll Sync with HEAD
 1.45.2.1 22-Apr-2016  skrll Sync with HEAD
 1.47.8.1 11-May-2017  pgoyette Sync with HEAD
 1.14 02-Feb-2011  chuck update license clauses on my code to match the new-style BSD licenses.
based on diff that rmind@ sent me.

no functional change with this commit.
 1.13 11-Dec-2005  christos branches: 1.13.98; 1.13.104; 1.13.106;
merge ktrace-lwp.
 1.12 27-Jun-2005  thorpej Use ANSI function decls.
 1.11 10-Nov-2001  lukem branches: 1.11.16;
add RCSIDs, and in some cases, slightly cleanup #include order
 1.10 02-Jun-2001  chs branches: 1.10.2; 1.10.6;
replace vm_map{,_entry}_t with struct vm_map{,_entry} *.
 1.9 15-Mar-2001  chs eliminate the KERN_* error codes in favor of the traditional E* codes.
the mapping is:

KERN_SUCCESS 0
KERN_INVALID_ADDRESS EFAULT
KERN_PROTECTION_FAILURE EACCES
KERN_NO_SPACE ENOMEM
KERN_INVALID_ARGUMENT EINVAL
KERN_FAILURE various, mostly turn into KASSERTs
KERN_RESOURCE_SHORTAGE ENOMEM
KERN_NOT_RECEIVER <unused>
KERN_NO_ACCESS <unused>
KERN_PAGES_LOCKED <unused>
 1.8 27-Jun-2000  mrg branches: 1.8.2;
remove include of <vm/vm.h>
 1.7 25-Mar-1999  mrg branches: 1.7.8;
remove now >1 year old pre-release message.
 1.6 11-Oct-1998  chuck remove unused share map code from UVM:
- update calls to uvm_unmap_remove/uvm_unmap (mainonly boolean arg
has been removed)
- replace UVM_ET_ISMAP checks with UVM_ET_ISSUBMAP checks
 1.5 13-Aug-1998  eeh Merge paddr_t changes into the main branch.
 1.4 09-Mar-1998  mrg branches: 1.4.2;
KNF.
 1.3 07-Feb-1998  mrg restore rcsids
 1.2 06-Feb-1998  thorpej RCS ID police.
 1.1 05-Feb-1998  mrg branches: 1.1.1;
Initial revision
 1.1.1.1 05-Feb-1998  mrg initial import of the new virtual memory system, UVM, into -current.

UVM was written by chuck cranor <chuck@maria.wustl.edu>, with some
minor portions derived from the old Mach code. i provided some help
getting swap and paging working, and other bug fixes/ideas. chuck
silvers <chuq@chuq.com> also provided some other fixes.

this is the UVM kernel code portion.


this will be KNF'd shortly. :-)
 1.4.2.1 30-Jul-1998  eeh Split vm_offset_t and vm_size_t into paddr_t, psize_t, vaddr_t, and vsize_t.
 1.7.8.2 27-Mar-2001  bouyer Sync with HEAD.
 1.7.8.1 20-Nov-2000  bouyer Update thorpej_scsipi to -current as of a month ago
 1.8.2.3 14-Nov-2001  nathanw Catch up to -current.
 1.8.2.2 21-Jun-2001  nathanw Catch up to -current.
 1.8.2.1 09-Apr-2001  nathanw Catch up with -current.
 1.10.6.1 12-Nov-2001  thorpej Sync the thorpej-mips-cache branch with -current.
 1.10.2.1 10-Jan-2002  thorpej Sync kqueue branch with -current.
 1.11.16.1 10-Nov-2005  skrll Sync with HEAD. Here we go again...
 1.13.106.1 08-Feb-2011  bouyer Sync with HEAD
 1.13.104.1 06-Jun-2011  jruoho Sync with HEAD.
 1.13.98.1 05-Mar-2011  rmind sync with head
 1.121 05-Apr-2024  riastradh uvm: Expand v_size <= v_writesize assertions to help diagnostics.

PR kern/58117
 1.120 09-Apr-2023  riastradh uvm: Simplify assertion in uvn_get.

No functional change intended.
 1.119 09-Apr-2023  riastradh uvm(9): KASSERT(A && B) -> KASSERT(A); KASSERT(B)
 1.118 13-Mar-2021  skrll Consistently use %#jx instead of 0x%jx or just %jx in UVMHIST_LOG formats
 1.117 16-Aug-2020  chs branches: 1.117.2;
in uvm_findpage(), when uvm_page_array_fill_and_peek() returns a page
that is not the one we want and we make an assertion about dirtiness,
check the dirty status of the page we wanted rather than the page we got.
 1.116 14-Aug-2020  chs centralize calls from UVM to radixtree into a few functions.
in those functions, assert that the object lock is held in
the correct mode.
 1.115 09-Jul-2020  skrll Consistently use UVMHIST(__func__)

Convert UVMHIST_{CALLED,LOG} into UVMHIST_CALLARGS
 1.114 25-May-2020  ad - Alter the convention for uvm_page_array slightly, so the basic search
parameters can't change part way through a search: move the "uobj" and
"flags" arguments over to uvm_page_array_init() and store those with the
array.

- With that, detect when it's not possible to find any more pages in the
tree with the given search parameters, and avoid repeated tree lookups if
the caller loops over uvm_page_array_fill_and_peek().
 1.113 19-May-2020  ad PR kern/32166: pgo_get protocol is ambiguous
Also problems with tmpfs+nfs noted by hannken@.

Don't pass PGO_ALLPAGES to pgo_get, and ignore PGO_DONTCARE in the
!PGO_LOCKED case. In uao_get() have uvm_pagealloc() take care of page
zeroing and release busy pages on error.
 1.112 19-May-2020  ad Don't try to do readahead on tmpfs.
 1.111 22-Mar-2020  ad Process concurrent page faults on individual uvm_objects / vm_amaps in
parallel, where the relevant pages are already in-core. Proposed on
tech-kern.

Temporarily disabled on MP architectures with __HAVE_UNLOCKED_PMAP until
adjustments are made to their pmaps.
 1.110 14-Mar-2020  ad Make uvm_pagemarkdirty() responsible for putting vnodes onto the syncer
work list. Proposed on tech-kern@.
 1.109 14-Mar-2020  ad Make page waits (WANTED vs BUSY) interlocked by pg->interlock. Gets RW
locks out of the equation for sleep/wakeup, and allows observing+waiting
for busy pages when holding only a read lock. Proposed on tech-kern.
 1.108 03-Mar-2020  rjs Make some wait channel names unique to six characters.
 1.107 27-Feb-2020  ad Tighten up the locking around vp->v_iflag a little more after the recent
split of vmobjlock & v_interlock.
 1.106 23-Feb-2020  ad UVM locking changes, proposed on tech-kern:

- Change the lock on uvm_object, vm_amap and vm_anon to be a RW lock.
- Break v_interlock and vmobjlock apart. v_interlock remains a mutex.
- Do partial PV list locking in the x86 pmap. Others to follow later.
 1.105 15-Jan-2020  ad Merge from yamt-pagecache (after much testing):

- Reduce unnecessary page scan in putpages esp. when an object has a ton of
pages cached but only a few of them are dirty.

- Reduce the number of pmap operations by tracking page dirtiness more
precisely in uvm layer.
 1.104 21-Dec-2019  ad branches: 1.104.2;
- Rename VM_PGCOLOR_BUCKET() to VM_PGCOLOR(). I want to reuse "bucket" for
something else soon and TBH it matches what this macro does better.

- Add inlines to set/get locator values in the unused lower bits of
pg->phys_addr. Begin by using it to cache the freelist index, because
computing it is expensive and that shows up during profiling. Discussed
on tech-kern.
 1.103 28-Oct-2017  pgoyette branches: 1.103.4;
Update the kernhist(9) kernel history code to address issues identified
in PR kern/52639, as well as some general cleaning-up...

(As proposed on tech-kern@ with additional changes and enhancements.)

Details of changes:

* All history arguments are now stored as uintmax_t values[1], both in
the kernel and in the structures used for exporting the history data
to userland via sysctl(9). This avoids problems on some architectures
where passing a 64-bit (or larger) value to printf(3) can cause it to
process the value as multiple arguments. (This can be particularly
problematic when printf()'s format string is not a literal, since in
that case the compiler cannot know how large each argument should be.)

* Update the data structures used for exporting kernel history data to
include a version number as well as the length of history arguments.

* All [2] existing users of kernhist(9) have had their format strings
updated. Each format specifier now includes an explicit length
modifier 'j' to refer to numeric values of the size of uintmax_t.

* All [2] existing users of kernhist(9) have had their format strings
updated to replace uses of "%p" with "%#jx", and the pointer
arguments are now cast to (uintptr_t) before being subsequently cast
to (uintmax_t). This is needed to avoid compiler warnings about
casting "pointer to integer of a different size."

* All [2] existing users of kernhist(9) have had instances of "%s" or
"%c" format strings replaced with numeric formats; several instances
of mis-match between format string and argument list have been fixed.

* vmstat(1) has been modified to handle the new size of arguments in the
history data as exported by sysctl(9).

* vmstat(1) now provides a warning message if the history requested with
the -u option does not exist (previously, this condition was silently
ignored, with only a single blank line being printed).

* vmstat(1) now checks the version and argument length included in the
data exported via sysctl(9) and exits if they do not match the values
with which vmstat was built.

* The kernhist(9) man-page has been updated to note the additional
requirements imposed on the format strings, along with several other
minor changes and enhancements.

[1] It would have been possible to use an explicit length (for example,
uint64_t) for the history arguments. But that would require another
"rototill" of all the users in the future when we add support for an
architecture that supports a larger size. Also, the printf(3) format
specifiers for explicitly-sized values, such as "%"PRIu64, are much
more verbose (and less aesthetically appealing, IMHO) than simply
using "%ju".

[2] I've tried very hard to find "all [the] existing users of kernhist(9)"
but it is possible that I've missed some of them. I would be glad to
update any stragglers that anyone identifies.
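The format-string convention described in the 1.103 log message can be sketched in userland C. This is only an illustration: format_hist_entry() is a hypothetical helper, not part of kernhist(9), and snprintf(3) stands in for the kernel history macros. It shows a pointer being cast to (uintptr_t) and then to (uintmax_t) and printed with "%#jx", and an integer being widened to uintmax_t and printed with "%ju".

```c
#include <stdint.h>
#include <stdio.h>

/*
 * Hypothetical userland stand-in for a history-log call.
 * Every argument is widened to uintmax_t so that each one occupies
 * exactly one variadic slot, whatever the native word size is.
 */
static int
format_hist_entry(char *buf, size_t len, const void *ptr, uint64_t count)
{
	/* Pointers go through uintptr_t first to avoid size-mismatch warnings. */
	uintmax_t a0 = (uintmax_t)(uintptr_t)ptr;
	uintmax_t a1 = (uintmax_t)count;

	/* The 'j' length modifier tells printf the argument is (u)intmax_t. */
	return snprintf(buf, len, "obj=%#jx count=%ju", a0, a1);
}
```

Because every argument has the same type, the format string never has to guess how wide an argument is, which is the problem the log message describes for non-literal format strings.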
 1.102 06-Dec-2015  wiz branches: 1.102.10;
Fix typo in comment.
 1.101 06-Dec-2015  mlelstv Clean up assertions and catch integer overflow.
 1.100 24-Aug-2015  pooka to garnish, dust with _KERNEL_OPT
 1.99 30-Jul-2012  matt branches: 1.99.2; 1.99.16;
-fno-common broke kernhist since it used commons.
Add KERNHIST_DEFINE, which defines the kernel history.
Change UVM to deal with the new usage.
 1.98 01-Jun-2012  martin Only use generic readahead on VREG vnodes, the space used to store the
context is not valid on other types.
Prevents the crash reported in PR kern/38889, but does not fix the
mmap of block devices, more work is needed (no size on VBLK vnodes).
 1.97 06-Sep-2011  matt branches: 1.97.2; 1.97.6; 1.97.8;
Allocate color appropriate pages.
 1.96 12-Jun-2011  rmind Welcome to 5.99.53! Merge rmind-uvmplock branch:

- Reorganize locking in UVM and provide extra serialisation for pmap(9).
New lock order: [vmpage-owner-lock] -> pmap-lock.

- Simplify locking in some pmap(9) modules by removing P->V locking.

- Use lock object on vmobjlock (and thus vnode_t::v_interlock) to share
the locks amongst UVM objects where necessary (tmpfs, layerfs, unionfs).

- Rewrite and optimise x86 TLB shootdown code, make it simpler and cleaner.
Add TLBSTATS option for x86 to collect statistics about TLB shootdowns.

- Unify /dev/mem et al in MI code and provide required locking (removes
kernel-lock on some ports). Also, avoid cache-aliasing issues.

Thanks to Andrew Doran and Joerg Sonnenberger, as their initial patches
formed the core changes of this branch.
 1.95 23-Apr-2011  rmind branches: 1.95.2;
Replace "malloc" in comments, remove unnecessary header inclusions.
 1.94 02-Feb-2011  chuck update license clauses on my code to match the new-style BSD licenses.
verified with Mike Hibler it is ok to remove clause 3 on utah copyright,
as per UCB.
based on diff that rmind@ sent me.

no functional change with this commit.
 1.93 08-Jan-2010  pooka branches: 1.93.2; 1.93.4; 1.93.6; 1.93.8;
The VATTR_NULL/VREF/VHOLD/HOLDRELE() macros lost their will to live
years ago when the kernel was modified to not alter ABI based on
DIAGNOSTIC, and now just call the respective function interfaces
(in lowercase). Plenty of mix'n match upper/lowercase has crept
into the tree since then. Nuke the macros and convert all callsites
to lowercase.

no functional change
 1.92 04-Aug-2009  pooka uvm_vnp_zerorange() logically and by implementation more a part of
ubc than uvm_vnode, so move it over.
 1.91 04-Aug-2009  pooka kernel opt polish: g/c unnecessary fs_nfs.h and opt_ddb.h
 1.90 02-Jan-2008  ad branches: 1.90.10; 1.90.28;
Merge vmlocking2 to head.
 1.89 01-Dec-2007  yamt branches: 1.89.2; 1.89.6;
constify pagerops.
 1.88 01-Dec-2007  yamt use designated initializers for uvm_pagerops.
 1.87 11-Oct-2007  ad branches: 1.87.4;
Remove LOCK_ASSERT(!simple_lock_held(&foo));
 1.86 10-Oct-2007  ad Merge from vmlocking:

- Split vnode::v_flag into three fields, depending on field locking.
- simple_lock -> kmutex in a few places.
- Fix some simple locking problems.
 1.85 04-Aug-2007  pooka branches: 1.85.2; 1.85.4; 1.85.6;
Use VSIZENOTSET only in KASSERTs
 1.84 22-Jul-2007  pooka branches: 1.84.4;
Retire uvn_attach() - it abuses VXLOCK and its functionality,
setting vnode sizes, is handled elsewhere: file system vnode creation
or spec_open() for regular files or block special files, respectively.

Add a call to VOP_MMAP() to the pagedvn exec path, since the vnode
is being memory mapped.

reviewed by tech-kern & wrstuden
 1.83 09-Jul-2007  ad branches: 1.83.2;
Merge some of the less invasive changes from the vmlocking branch:

- kthread, callout, devsw API changes
- select()/poll() improvements
- miscellaneous MT safety improvements
 1.82 05-Jun-2007  yamt improve post-ubc file overwrite performance in common cases.
ie. when it's safe, actually overwrite blocks rather than doing
read-modify-write.

also fixes PR/33152 and PR/36303.
 1.81 04-Mar-2007  christos branches: 1.81.2; 1.81.4;
Kill caddr_t; there will be some MI fallout, but it will be fixed shortly.
 1.80 22-Feb-2007  thorpej TRUE -> true, FALSE -> false
 1.79 21-Feb-2007  thorpej Replace the Mach-derived boolean_t type with the C99 bool type. A
future commit will replace use of TRUE and FALSE with true and false.
 1.78 09-Dec-2006  chs branches: 1.78.2;
a smorgasbord of improvements to vnode locking and path lookup:
- LOCKPARENT is no longer relevant for lookup(), relookup() or VOP_LOOKUP().
these now always return the parent vnode locked. namei() works as before.
lookup() and various other paths no longer acquire vnode locks in the
wrong order via vrele(). fixes PR 32535.
as a nice side effect, path lookup is also up to 25% faster.
- the above allows us to get rid of PDIRUNLOCK.
- also get rid of WANTPARENT (just use LOCKPARENT and unlock it).
- remove an assumption in layer_node_find() that all file systems implement
a recursive VOP_LOCK() (unionfs doesn't).
- require that all file systems supply vfs_vptofh and vfs_fhtovp routines.
fill in eopnotsupp() for file systems that don't support being exported
and remove the checks for NULL. (layerfs calls these without checking.)
- in union_lookup1(), don't change refcounts in the ISDOTDOT case, just
adjust which vnode is locked. fixes PR 33374.
- apply fixes for ufs_rename() from ufs_vnops.c rev. 1.61 to ext2fs_rename().
 1.77 01-Nov-2006  yamt branches: 1.77.2;
remove some __unused from function parameters.
 1.76 14-Oct-2006  yamt uvm_vnp_setsize: put back v_size assignment after uvn_put.
PR/34147 from Juergen Hannken-Illjes.
 1.75 12-Oct-2006  yamt move some knowledge about vnode into uvm_vnode.c.
 1.74 12-Oct-2006  christos - sprinkle __unused on function decls.
- fix a couple of unused bugs
- no more -Wno-unused for i386
 1.73 15-Sep-2006  yamt branches: 1.73.2;
merge yamt-pdpolicy branch.
- separate page replacement policy from the rest of kernel
- implement an alternative replacement policy
 1.72 22-Jul-2006  yamt branches: 1.72.4;
- in genfs_getpages, take g_glock earlier so that truncation can't
intervene.
it also fixes a deadlock. (g_glock vs pages locking order)
- uvm_vnp_setsize: modify v_size while holding v_interlock.

reviewed by Chuck Silvers.
 1.71 21-Jul-2006  ad - Use the LWP cached credentials where sane.
- Minor cosmetic changes.
 1.70 14-May-2006  elad integrate kauth.
 1.69 11-Dec-2005  christos branches: 1.69.4; 1.69.6; 1.69.8; 1.69.10; 1.69.12;
merge ktrace-lwp.
 1.68 29-Nov-2005  yamt merge yamt-readahead branch.
 1.67 29-Nov-2005  yamt read-ahead statistics.
 1.66 27-Jun-2005  thorpej branches: 1.66.2; 1.66.8;
Sprinkle some static.
 1.65 27-Jun-2005  thorpej Use ANSI function decls.
 1.64 09-Jan-2005  chs adjust the UBC mapping code to support non-vnode uvm_objects.
this means we can no longer look at the vnode size to determine how many
pages to request in a fault, which is good since for NFS the size can change
out from under us on the server anyway. there's also a new flag UBC_UNMAP
for ubc_release(), so that the file system code can make the decision about
whether to cache mappings for files being used as executables.
 1.63 24-Mar-2004  junyoung Nuke __P().
 1.62 29-Jun-2003  fvdl branches: 1.62.2;
Back out the lwp/ktrace changes. They contained a lot of collateral damage,
and need to be examined and discussed more.
 1.61 28-Jun-2003  darrenr Pass lwp pointers throughout the kernel, as required, so that the lwpid can
be inserted into ktrace records. The general change has been to replace
"struct proc *" with "struct lwp *" in various function prototypes, pass
the lwp through and use l_proc to get the process pointer when needed.

Bump the kernel rev up to 1.6V
 1.60 22-Apr-2003  yamt correct accounting of {exec,file}pages.
they are not updated correctly when breaking loan.
 1.59 06-Sep-2002  gehenna Merge the gehenna-devsw branch into the trunk.

This merge changes the device switch tables from static array to
dynamically generated by config(8).

- All device switches are defined as constant structures in device drivers.

- The new grammar ``device-major'' is introduced to ``files''.

device-major <prefix> char <num> [block <num>] [<rules>]

- All device major numbers must be listed in the port-dependent majors.<arch>
by using this grammar.

- Added the new naming convention.
The name of the device switch must be <prefix>_[bc]devsw for auto-generation
of device switch tables.

- The backward compatibility of loading block/character device
switch by LKM framework is broken. This is necessary to convert
from block/character device major to device name at run time and vice versa.

- The restriction to assign device major by LKM is completely removed.
We don't need to reserve LKM entries for dynamic loading of device switch.

- At compile time, the list of device major numbers is packed into the kernel
and the LKM framework refers to it to assign device major numbers dynamically.
 1.58 17-May-2002  enami Make uvn_findpages return the number of pages found so that the caller can
easily check whether all requested pages were found.
 1.57 31-Dec-2001  chs branches: 1.57.8;
in uvm_vnp_setsize(), wait for any i/o in progress on pages that we free.
 1.56 09-Dec-2001  chs replace "vnode" and "vtext" with "file" and "exec" in uvmexp field names.
 1.55 10-Nov-2001  lukem add RCSIDs, and in some cases, slightly cleanup #include order
 1.54 26-Sep-2001  chs branches: 1.54.2;
change the names of the arguments to uvn_put() to match their usage.
 1.53 22-Sep-2001  sommerfeld VOP_PUTPAGES must release the uobj's lock for us, so ensure it's locked
beforehand and unlocked afterwards using LOCK_ASSERT().
 1.52 15-Sep-2001  chs a whole bunch of changes to improve performance and robustness under load:

- remove special treatment of pager_map mappings in pmaps. this is
required now, since I've removed the globals that expose the address range.
pager_map now uses pmap_kenter_pa() instead of pmap_enter(), so there's
no longer any need to special-case it.
- eliminate struct uvm_vnode by moving its fields into struct vnode.
- rewrite the pageout path. the pager is now responsible for handling the
high-level requests instead of only getting control after a bunch of work
has already been done on its behalf. this will allow us to UBCify LFS,
which needs tighter control over its pages than other filesystems do.
writing a page to disk no longer requires making it read-only, which
allows us to write wired pages without causing all kinds of havoc.
- use a new PG_PAGEOUT flag to indicate that a page should be freed
on behalf of the pagedaemon when it's unlocked. this flag is very similar
to PG_RELEASED, but unlike PG_RELEASED, PG_PAGEOUT can be cleared if the
pageout fails due to eg. an indirect-block buffer being locked.
this allows us to remove the "version" field from struct vm_page,
and together with shrinking "loan_count" from 32 bits to 16,
struct vm_page is now 4 bytes smaller.
- no longer use PG_RELEASED for swap-backed pages. if the page is busy
because it's being paged out, we can't release the swap slot to be
reallocated until that write is complete, but unlike with vnodes we
don't keep a count of in-progress writes so there's no good way to
know when the write is done. instead, when we need to free a busy
swap-backed page, just sleep until we can get it busy ourselves.
- implement a fast-path for extending writes which allows us to avoid
zeroing new pages. this substantially reduces cpu usage.
- encapsulate the data used by the genfs code in a struct genfs_node,
which must be the first element of the filesystem-specific vnode data
for filesystems which use genfs_{get,put}pages().
- eliminate many of the UVM pagerops, since they aren't needed anymore
now that the pager "put" operation is a higher-level operation.
- enhance the genfs code to allow NFS to use the genfs_{get,put}pages
instead of a modified copy.
- clean up struct vnode by removing all the fields that used to be used by
the vfs_cluster.c code (which we don't use anymore with UBC).
- remove kmem_object and mb_object since they were useless.
instead of allocating pages to these objects, we now just allocate
pages with no object. such pages are mapped in the kernel until they
are freed, so we can use the mapping to find the page to free it.
this allows us to remove splvm() protection in several places.

The sum of all these changes improves write throughput on my
decstation 5000/200 to within 1% of the rate of NetBSD 1.5
and reduces the elapsed time for "make release" of a NetBSD 1.5
source tree on my 128MB pc to 10% less than a 1.5 kernel took.
 1.51 17-Aug-2001  chs branches: 1.51.2;
allow mappings of VBLK vnodes.
 1.50 26-May-2001  chs branches: 1.50.2;
replace vm_page_t with struct vm_page *.
 1.49 25-May-2001  chs remove trailing whitespace.
 1.48 10-Mar-2001  chs eliminate the VM_PAGER_* error codes in favor of the traditional E* codes.
the mapping is:

VM_PAGER_OK       0
VM_PAGER_BAD      <unused>
VM_PAGER_FAIL     <unused>
VM_PAGER_PEND     0 (see below)
VM_PAGER_ERROR    EIO
VM_PAGER_AGAIN    EAGAIN
VM_PAGER_UNLOCK   EBUSY
VM_PAGER_REFAULT  ERESTART

for async i/o requests, it used to be possible for the request to
be converted to sync, and the pager would return VM_PAGER_OK or VM_PAGER_PEND
to indicate whether the caller should perform post-i/o cleanup.
this is no longer allowed; pagers must now return 0 to indicate that
the async i/o was successfully started, and the caller never needs to
worry about doing the post-i/o cleanup.
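The mapping listed in the 1.48 log message can be written as a small translation function. This is a sketch under stated assumptions: the enum name, its OLD_ prefix and its numeric values are illustrative, old_pager_code_to_errno() is hypothetical, and the EINVAL returned for the two unused codes is this sketch's own choice, since the log message gives them no replacement. ERESTART gets a placeholder definition when <errno.h> does not provide it.

```c
#include <errno.h>

#ifndef ERESTART
#define ERESTART (-3)	/* assumption: placeholder where <errno.h> lacks it */
#endif

/* Illustrative names and values for the retired VM_PAGER_* codes. */
enum old_pager_code {
	OLD_VM_PAGER_OK,
	OLD_VM_PAGER_BAD,
	OLD_VM_PAGER_FAIL,
	OLD_VM_PAGER_PEND,
	OLD_VM_PAGER_ERROR,
	OLD_VM_PAGER_AGAIN,
	OLD_VM_PAGER_UNLOCK,
	OLD_VM_PAGER_REFAULT
};

/* Translate an old pager code into the E* code named in the log message. */
static int
old_pager_code_to_errno(enum old_pager_code code)
{
	switch (code) {
	case OLD_VM_PAGER_OK:
	case OLD_VM_PAGER_PEND:		/* async i/o started: now simply 0 */
		return 0;
	case OLD_VM_PAGER_ERROR:
		return EIO;
	case OLD_VM_PAGER_AGAIN:
		return EAGAIN;
	case OLD_VM_PAGER_UNLOCK:
		return EBUSY;
	case OLD_VM_PAGER_REFAULT:
		return ERESTART;
	case OLD_VM_PAGER_BAD:		/* listed as unused in the log */
	case OLD_VM_PAGER_FAIL:		/* listed as unused in the log */
	default:
		return EINVAL;		/* sketch's choice, not from the log */
	}
}
```

Note that OK and PEND both collapse to 0, which matches the statement that callers no longer distinguish the two for post-i/o cleanup.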
 1.47 09-Mar-2001  chs add UBC memory-usage balancing. we track the number of pages in use for
each of the basic types (anonymous data, executable image, cached files)
and prevent the pagedaemon from reusing a given page if that would reduce
the count of that type of page below a sysctl-settable minimum threshold.
the thresholds are controlled via three new sysctl tunables:
vm.anonmin, vm.vnodemin, and vm.vtextmin. these tunables are the
percentages of pageable memory reserved for each usage, and we do not allow
the sum of the minimums to be more than 95% so that there's always some
memory that can be reused.
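The balancing rule in the 1.47 log message can be sketched as two checks. Both function names are hypothetical and the arithmetic is an assumption about how such a policy could be expressed, not the pagedaemon's actual code: one check rejects settings whose minimums sum to more than 95%, the other refuses to reclaim a page if that would push its usage type below its minimum share of pageable memory.

```c
#include <stdbool.h>

#define UBC_MIN_SUM_LIMIT 95	/* from the log: the sum may not exceed 95% */

/* Validate the three percentage tunables (anon, file, exec minimums). */
static bool
ubc_minimums_valid(int anonmin, int vnodemin, int vtextmin)
{
	if (anonmin < 0 || vnodemin < 0 || vtextmin < 0)
		return false;
	return anonmin + vnodemin + vtextmin <= UBC_MIN_SUM_LIMIT;
}

/*
 * Would taking one page of this type leave the type at or above
 * min_pct percent of all pageable pages?
 */
static bool
ubc_may_reclaim(unsigned long type_pages, unsigned long pageable_pages,
    int min_pct)
{
	if (type_pages == 0 || min_pct < 0)
		return false;
	return (type_pages - 1) * 100 >= pageable_pages * (unsigned long)min_pct;
}
```

With 1000 pageable pages and a 10% minimum, a type holding 200 pages may give one up, while a type holding exactly 100 may not.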
 1.46 22-Feb-2001  enami branches: 1.46.2;
When shrinking file size, don't dispose of a page still in use.
 1.45 18-Feb-2001  chs in uvn_flush(), add a fast path for the case where the vnode has no pages.
update the comment above this function while I'm here.
 1.44 08-Feb-2001  chs remove a debug printf() that has outlived its usefulness.
 1.43 06-Feb-2001  chs in uvn_flush(), interpret a "stop" value of 0 as meaning all pages at
offsets equal to or higher than "start". use this in uvm_vnp_setsize()
instead of the vnode's size since there can be pages past EOF.
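The "stop of 0 means everything from start onward" convention in the 1.43 log message amounts to the following range test. offset_in_flush_range() is a hypothetical helper, and treating the range as half-open (start included, stop excluded) is an assumption of this sketch.

```c
#include <stdbool.h>
#include <stdint.h>

/* Is the page at byte offset 'off' covered by a flush of [start, stop)? */
static bool
offset_in_flush_range(int64_t off, int64_t start, int64_t stop)
{
	if (off < start)
		return false;
	/* stop == 0 is the "no upper bound" sentinel described in the log. */
	return stop == 0 || off < stop;
}
```

A caller truncating a file can pass the new size as start and 0 as stop, so pages past the old EOF are covered without knowing how far they extend.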
 1.42 28-Jan-2001  thorpej Page scanner improvements, behavior is actually a bit more like
Mach VM's now. Specific changes:
- Pages now need not have all of their mappings removed before being
put on the inactive list. They only need to have the "referenced"
attribute cleared. This makes putting pages onto the inactive list
much more efficient. In order to eliminate redundant clearings of
"referenced", callers of uvm_pagedeactivate() must now do this
themselves.
- When checking the "modified" attribute for a page (for clearing
PG_CLEAN), make sure to only do it if PG_CLEAN is currently set on
the page (saves a potentially expensive pmap operation).
- When scanning the inactive list, if a page is referenced, reactivate
it (this part was actually added in uvm_pdaemon.c,v 1.27). This
works properly now that pages on the inactive list are allowed to
have mappings.
- When scanning the inactive list and considering a page for freeing,
remove all mappings, and then check the "modified" attribute if the
page is marked PG_CLEAN.
- When scanning the active list, if the page was referenced since its
last sweep by the scanner, don't deactivate it. (This part was
actually added in uvm_pdaemon.c,v 1.28.)

These changes greatly improve interactive performance during
moderate to high memory and I/O load.
 1.41 08-Jan-2001  chs in uvn_flush(), when PGO_SYNCIO is specified then we should wait for
pending i/os to complete before returning even if PGO_CLEANIT is not
specified. this fixes two races:

(1) NFS write rpcs vs. setattr operations which truncate the file.
if the truncate doesn't wait for pending writes to complete then
a later write rpc completion can undo the effect of the truncate.
this problem has been reported by several people.

(2) write i/os in disk-based filesystem vs. the disk block being
freed by a truncation, allocated to a new file, and written
again with different data. if the disk driver reorders the requests
and does the second i/o first, the old data will clobber the new,
corrupting the new file. I haven't heard of anyone experiencing
this problem yet, but it's fixed now anyway.
 1.40 16-Dec-2000  chs in uvn_flush(), don't deactivate busy pages.
 1.39 06-Dec-2000  chs in uvn_findpage(), only increment the counter of vnode pages
if we succeed in allocating a page.

from Lars Heidieker <lars@heidieker.de> in PR 11636.
 1.38 30-Nov-2000  simonb Move uvm_pgcnt_vnode and uvm_pgcnt_anon into uvmexp (as vnodepages and
anonpages), and add vtextpages which is currently unused but will be
used to trace the number of pages used by vtext vnodes.
 1.37 27-Nov-2000  chs Initial integration of the Unified Buffer Cache project.
 1.36 24-Nov-2000  chs g/c unused pager ops "asyncget" and "aiodone".
 1.35 27-Jun-2000  mrg remove include of <vm/vm.h>
 1.34 26-Jun-2000  mrg remove/move more mach vm header files:

<vm/pglist.h> -> <uvm/uvm_pglist.h>
<vm/vm_inherit.h> -> <uvm/uvm_inherit.h>
<vm/vm_kern.h> -> into <uvm/uvm_extern.h>
<vm/vm_object.h> -> nothing
<vm/vm_pager.h> -> into <uvm/uvm_pager.h>

also includes a bunch of <vm/vm_page.h> include removals (due to redundancy
with <vm/vm.h>), and a scattering of other similar headers.
 1.33 19-May-2000  thorpej branches: 1.33.4;
Tell uvm_pagermapin() the direction of the I/O so that it can map
with only the protection that it needs.
 1.32 03-Apr-2000  chs remove the "shareprot" pagerop. it's not needed anymore since
share maps are long gone.
 1.31 27-Mar-2000  kleink Kill duplicate uvn_attach() prototype (public, already in uvm_vnode.h).
 1.30 26-Mar-2000  kleink Merge parts of chs-ubc2 into the trunk:
Add a new type voff_t (defined as a synonym for off_t) to describe offsets
into uvm objects, and update the appropriate interfaces to use it, the
most visible effect being the ability to mmap() file offsets beyond
the range of a vaddr_t.

Originally by Chuck Silvers; blame me for problems caused by merging this
into non-UBC.
 1.29 13-Mar-2000  soren Fix doubled 'the's in comments.
 1.28 28-Jan-2000  chs remove a debug printf that has outlived its usefulness.
 1.27 19-Oct-1999  chs put various debugging printfs under #ifdef DEBUG.
 1.26 12-Sep-1999  chs branches: 1.26.2; 1.26.4; 1.26.6;
eliminate the PMAP_NEW option by making it required for all ports.
ports which previously had no support for PMAP_NEW now implement
the pmap_k* interfaces as wrappers around the non-k versions.
 1.25 22-Jul-1999  thorpej Garbage collect thread_sleep()/thread_wakeup() left over from the old
Mach VM code. Also nuke iprintf(), which was no longer used anywhere.

Add proclist locking where appropriate.
 1.24 22-Jul-1999  thorpej 0 -> FALSE in a few places.
 1.23 11-Apr-1999  chs add a `flags' argument to uvm_pagealloc_strat().
define a flag UVM_PGA_USERESERVE to allow non-kernel object
allocations to use pages from the reserve.
use the new flag for allocations in pmap modules.
 1.22 25-Mar-1999  mrg branches: 1.22.2;
remove now >1 year old pre-release message.
 1.21 25-Mar-1999  sommerfe Prevent deadlock cited in PR4629 from crashing the system. (copyout
and system call now just return EFAULT). A complete fix will
presumably have to wait for UBC and/or for vnode locking protocols to
be revamped to allow use of shared locks.
 1.20 24-Mar-1999  cgd after discussion with chuck, nuke pgo_attach from uvm_pagerops
 1.19 04-Mar-1999  chs fix printf arg types.
 1.18 29-Jan-1999  bouyer A small typo fix, + enclose "used_vnode_size = %qu" debug printf inside
#ifdef DEBUG/#endif
 1.17 04-Nov-1998  chs branches: 1.17.2;
we must unlock a vp's object's lock before calling vrele().
 1.16 18-Oct-1998  chs shift by PAGE_SHIFT instead of multiplying or dividing by PAGE_SIZE.
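The change described in 1.16 relies on the page size being a power of two, so a shift gives the same result as a multiply or divide. A minimal sketch, assuming a 4 KB page; the EX_ and ex_ names are illustrative and are not the kernel's own macros.

```c
/* Assumed page geometry for illustration: 4096-byte pages. */
#define EX_PAGE_SHIFT 12
#define EX_PAGE_SIZE  (1UL << EX_PAGE_SHIFT)

/* Bytes to whole pages: same result as bytes / EX_PAGE_SIZE. */
static unsigned long
ex_bytes_to_pages(unsigned long bytes)
{
	return bytes >> EX_PAGE_SHIFT;
}

/* Pages to bytes: same result as pages * EX_PAGE_SIZE. */
static unsigned long
ex_pages_to_bytes(unsigned long pages)
{
	return pages << EX_PAGE_SHIFT;
}
```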
 1.15 13-Aug-1998  eeh Merge paddr_t changes into the main branch.
 1.14 09-Aug-1998  perry bzero->memset, bcopy->memcpy, bcmp->memcmp
 1.13 07-Jul-1998  thorpej branches: 1.13.2;
Add support for mmap'ing disk block devices.
 1.12 24-Jun-1998  sommerfe Always include fifos; "not an option any more".
 1.11 22-Jun-1998  sommerfe defopt for options FIFO
 1.10 05-May-1998  kleink Remove inclusions of syscall (and syscall argument) related header files;
we don't need them here.
 1.9 11-Mar-1998  chuck bug fix: when doing uvm_vnp_sync() actually skip over blocked uvn's so
that we don't try and sync them later. should get rid of the
"uvm_vnp_sync: dying vnode on sync list" related warnings that were
occurring during a "make install."
 1.8 09-Mar-1998  mrg KNF.
 1.7 01-Mar-1998  fvdl Merge with Lite2 + local changes
 1.6 19-Feb-1998  thorpej Include the NFS option header.
 1.5 18-Feb-1998  mrg bug fix from chuck: uvm_vnp_terminate panic when /sbin/init was unlinked
 1.4 10-Feb-1998  mrg - add defopt's for UVM, UVMHIST and PMAP_NEW.
- remove unnecessary UVMHIST_DECL's.
 1.3 07-Feb-1998  mrg restore rcsids
 1.2 06-Feb-1998  thorpej RCS ID police.
 1.1 05-Feb-1998  mrg branches: 1.1.1;
Initial revision
 1.1.1.1 05-Feb-1998  mrg initial import of the new virtual memory system, UVM, into -current.

UVM was written by chuck cranor <chuck@maria.wustl.edu>, with some
minor portions derived from the old Mach code. i provided some help
getting swap and paging working, and other bug fixes/ideas. chuck
silvers <chuq@chuq.com> also provided some other fixes.

this is the UVM kernel code portion.


this will be KNF'd shortly. :-)
 1.13.2.1 30-Jul-1998  eeh Split vm_offset_t and vm_size_t into paddr_t, psize_t, vaddr_t, and vsize_t.
 1.17.2.8 02-Jun-1999  chs implement UFP_NORDONLY.
 1.17.2.7 30-May-1999  chs add uvm_vnp_asyncget() and uvn_doasyncget() for doing readahead.
in uvm_vnp_sync(), use an async uvn_flush() again.
remove uvm_vnp_setpageblknos().
 1.17.2.6 30-Apr-1999  chs fix uvn_flush() to actually wait for sync i/os.
fix uvm_vnp_setpageblknos() to deal with big ranges.
fix uvm_vnp_zerorange() to not be just totally wrong.
also, use the new ubc_alloc() interface.
 1.17.2.5 29-Apr-1999  chs temporarily make uvm_vnp_sync() use sync io.
make uvm_vnp_zerorange() deal with ranges larger than 1 ubc window.
 1.17.2.4 09-Apr-1999  chs fix vnode reference-counting in uvm_vnp_sync().
 1.17.2.3 25-Feb-1999  chs delete non-UBC parts of uvn_attach(), uvn_reference(), uvn_detach(),
uvm_vnp_terminate(), uvm_vnp_uncache().
add uvn_findpages(), for looking-up/allocating multiple pages.
allow async vnode pageouts.
lock the writeable list when removing vnodes from it too.
rename uvm_vnp_relocate() to uvm_vnp_setpageblknos() and expand
its functionality to optionally zero the pages.
add uvm_vnp_zerorange(), incomplete but does enough for the moment.
use LIST_* macros and SLOCK_{,UN}LOCKED.
 1.17.2.2 16-Nov-1998  chs uvn_put() now unlocks the uobj before calling VOP_PUTPAGES().
move the important line of uvm_vnp_setsize() outside the debug ifdef.
adjust other debugging code.
 1.17.2.1 09-Nov-1998  chs initial snapshot. lots left to do.
 1.22.2.2 31-Jan-2000  he Pull up revision 1.28 (via patch, requested by chs):
Remove a debug printf that has outlived its usefulness.
 1.22.2.1 16-Apr-1999  chs branches: 1.22.2.1.2;
pull up 1.22 -> 1.23:
add a `flags' argument to uvm_pagealloc_strat().
define a flag UVM_PGA_USERESERVE to allow non-kernel object
allocations to use pages from the reserve.
use the new flag for allocations in pmap modules.
 1.22.2.1.2.9 31-Aug-1999  perseant Rudimentary support for LFS under UBC:

- LFS-specific VOP_BALLOC and VOP_PUTPAGES vnode ops.

- getblk VREG panic #ifdef'd out (can be reinstated when Ifile is
internalized and Ifile can be made another type from VREG)

- interface to VOP_PUTPAGES changed to pass all pager flags, not
just sync. FS putpages routines must know about the pager flags.

- new LFS magic disk address, -2 ("unwritten"), meaning accounted for
but not assigned to a fixed disk location (since LFS does these two
things separately, and the previous accounting method using buffer
headers no longer will work). Changed references to (foo == (daddr_t)-1)
to (foo < 0). Since disk drivers reject all addresses < 0, this should
not present a problem for other FSs.
 1.22.2.1.2.8 11-Aug-1999  chs fix uvn_flush() to work now that vnode offsets are signed.
 1.22.2.1.2.7 09-Aug-1999  chs create a new type "voff_t" for uvm_object offsets
and define it to be "off_t". also, remove pgo_asyncget().
 1.22.2.1.2.6 06-Aug-1999  chs clean up some leftovers.
 1.22.2.1.2.5 02-Aug-1999  thorpej Update from trunk.
 1.22.2.1.2.4 31-Jul-1999  chs in uvn_findpage(), ignore any offsets where the return page pointer
is non-NULL.
 1.22.2.1.2.3 11-Jul-1999  chs remove uvm_vnp_uncache(), it's not needed anymore.
use uvm_errno2vmerror().
put uvm_vnp_zerorange() back the way it was before,
it was right the first time.
 1.22.2.1.2.2 04-Jul-1999  chs remove UVM_VNODE_* flags in favor of V* vnode flags.
rewrite uvm_vnp_zerorange(). it's still wrong, but it's closer.
update stuff to use buf instead of uvm_aiobuf.
uvm_vnp_asyncget() can now determine the blocksize from the vnode
rather than needing it to be passed in.
 1.22.2.1.2.1 07-Jun-1999  chs merge everything from chs-ubc branch.
 1.26.6.1 27-Dec-1999  wrstuden Pull up to last week's -current.
 1.26.4.1 15-Nov-1999  fvdl Sync with -current
 1.26.2.8 23-Mar-2001  bouyer Make sure files that shouldn't change are identical to HEAD.
 1.26.2.7 12-Mar-2001  bouyer Sync with HEAD.
 1.26.2.6 11-Feb-2001  bouyer Sync with HEAD.
 1.26.2.5 18-Jan-2001  bouyer Sync with head (for UBC+NFS fixes, mostly).
 1.26.2.4 05-Jan-2001  bouyer Sync with HEAD
 1.26.2.3 08-Dec-2000  bouyer Sync with HEAD.
 1.26.2.2 20-Nov-2000  bouyer Update thorpej_scsipi to -current as of a month ago
 1.26.2.1 20-Oct-1999  thorpej Sync w/ trunk.
 1.33.4.1 15-Nov-2001  he Apply patch (requested by chs):
Make sure to initialize uio_procp in uvn_io(). Fixes kernel
crash problem, reported in PR#14185.
 1.46.2.13 17-Sep-2002  nathanw Catch up to -current.
 1.46.2.12 16-Jul-2002  nathanw Whitespace.
 1.46.2.11 12-Jul-2002  nathanw No longer need to pull in lwp.h; proc.h pulls it in for us.
 1.46.2.10 24-Jun-2002  nathanw Curproc->curlwp renaming.

Change uses of "curproc->l_proc" back to "curproc", which is more like the
original use. Bare uses of "curproc" are now "curlwp".

"curproc" is now #defined in proc.h as ((curlwp) ? (curlwp)->l_proc : NULL)
so that it is always safe to reference curproc (*de*referencing curproc
is another story, but that's always been true).
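
A minimal userland sketch of why such a macro is always safe to reference but not always safe to dereference. The toy_* types and the toy_curlwp variable are hypothetical; the real definitions live in the kernel headers.

```c
#include <assert.h>
#include <stddef.h>

struct toy_proc { int p_pid; };
struct toy_lwp { struct toy_proc *l_proc; };

/* NULL when no LWP is running, e.g. very early in boot. */
static struct toy_lwp *toy_curlwp;

/* Evaluates to NULL instead of faulting when there is no current LWP. */
#define toy_curproc ((toy_curlwp) ? (toy_curlwp)->l_proc : NULL)

/* Callers must still check the result before dereferencing it. */
static int
toy_current_pid(void)
{
	struct toy_proc *p = toy_curproc;

	return p != NULL ? p->p_pid : -1;
}
```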
 1.46.2.9 20-Jun-2002  nathanw Catch up to -current.
 1.46.2.8 08-Jan-2002  nathanw Catch up to -current.
 1.46.2.7 14-Nov-2001  nathanw Catch up to -current.
 1.46.2.6 26-Sep-2001  nathanw Catch up to -current.
Again.
 1.46.2.5 21-Sep-2001  nathanw Catch up to -current.
 1.46.2.4 24-Aug-2001  nathanw Catch up with -current.
 1.46.2.3 21-Jun-2001  nathanw Catch up to -current.
 1.46.2.2 09-Apr-2001  nathanw Catch up with -current.
 1.46.2.1 05-Mar-2001  nathanw Initial commit of scheduler activations and lightweight process support.
 1.50.2.4 10-Oct-2002  jdolecek sync kqueue with -current; this includes merge of gehenna-devsw branch,
merge of i386 MP branch, and part of autoconf rototil work
 1.50.2.3 23-Jun-2002  jdolecek catch up with -current on kqueue branch
 1.50.2.2 10-Jan-2002  thorpej Sync kqueue branch with -current.
 1.50.2.1 25-Aug-2001  thorpej Merge Aug 24 -current into the kqueue branch.
 1.51.2.2 01-Oct-2001  fvdl Catch up with -current.
 1.51.2.1 07-Sep-2001  thorpej Commit my "devvp" changes to the thorpej-devvp branch. This
replaces the use of dev_t in most places with a struct vnode *.

This will form the basic infrastructure for real cloning device
support (besides being architecturally cleaner -- it'll be good
to get away from using numbers to represent objects).
 1.54.2.1 12-Nov-2001  thorpej Sync the thorpej-mips-cache branch with -current.
 1.57.8.2 30-May-2002  gehenna Catch up with -current.
 1.57.8.1 16-May-2002  gehenna Replace the direct-access to devsw table with calling devsw APIs.
 1.62.2.7 11-Dec-2005  christos Sync with head.
 1.62.2.6 10-Nov-2005  skrll Sync with HEAD. Here we go again...
 1.62.2.5 17-Jan-2005  skrll Sync with HEAD.
 1.62.2.4 21-Sep-2004  skrll Fix the sync with head I botched.
 1.62.2.3 18-Sep-2004  skrll Sync with HEAD.
 1.62.2.2 03-Aug-2004  skrll Sync with HEAD
 1.62.2.1 02-Jul-2003  darrenr Apply the aborted ktrace-lwp changes to a specific branch. This is just for
others to review, I'm concerned that patch fuzziness may have resulted in some
errant code being generated but I'll look at that later by comparing the diff
from the base to the branch with the file I attempt to apply to it. This will,
at the very least, put the changes in a better context for others to review
them and attempt to tinker with removing passing of 'struct lwp' through
the kernel.
 1.66.8.2 29-Nov-2005  yamt sync with head.
 1.66.8.1 19-Nov-2005  yamt - as read-ahead context is per-vnode now,
there are less reasons to make VOP_READ call uvm_ra_request explicitly.
move it to pager (uvn_get) so that it can handle accesses via mmap as well.
- pass advice to pager via ubc.
- tweak DPRINTF.

XXX can be disturbed by PGO_LOCKED.

XXX it's controversial where it should be done.
(uvm_fault, uvn_get or genfs_getpages.)
 1.66.2.7 21-Jan-2008  yamt sync with head
 1.66.2.6 07-Dec-2007  yamt sync with head
 1.66.2.5 27-Oct-2007  yamt sync with head.
 1.66.2.4 03-Sep-2007  yamt sync with head.
 1.66.2.3 26-Feb-2007  yamt sync with head.
 1.66.2.2 30-Dec-2006  yamt sync with head.
 1.66.2.1 21-Jun-2006  yamt sync with head.
 1.69.12.1 24-May-2006  tron Merge 2006-05-24 NetBSD-current into the "peter-altq" branch.
 1.69.10.1 08-Mar-2006  elad Adapt to kernel authorization changes.
 1.69.8.3 11-Aug-2006  yamt sync with head
 1.69.8.2 24-May-2006  yamt sync with head.
 1.69.8.1 12-Mar-2006  yamt - change the way to account read-ahead stats.
- fix UVM_PQFLAGBITS.
 1.69.6.1 01-Jun-2006  kardel Sync with head.
 1.69.4.1 09-Sep-2006  rpaulo sync with head
 1.72.4.2 12-Jan-2007  ad Sync with head.
 1.72.4.1 18-Nov-2006  ad Sync with head.
 1.73.2.2 10-Dec-2006  yamt sync with head.
 1.73.2.1 22-Oct-2006  yamt sync with head
 1.77.2.1 17-Feb-2007  tron Apply patch (requested by chs in ticket #422):
- Fix various deadlock problems with nullfs and unionfs.
- Speed up path lookups by up to 25%.
 1.78.2.2 12-Mar-2007  rmind Sync with HEAD.
 1.78.2.1 27-Feb-2007  yamt - sync with head.
- move sched_changepri back to kern_synch.c as it doesn't know PPQ anymore.
 1.81.4.1 11-Jul-2007  mjf Sync with head.
 1.81.2.10 09-Oct-2007  ad Sync with head.
 1.81.2.9 20-Aug-2007  ad Sync with HEAD.
 1.81.2.8 15-Jul-2007  ad Sync with head.
 1.81.2.7 15-Jul-2007  ad Sync with head.
 1.81.2.6 17-Jun-2007  ad - Increase the number of thread priorities from 128 to 256. How the space
is set up is to be revisited.
- Implement soft interrupts as kernel threads. A generic implementation
is provided, with hooks for fast-path MD code that can run the interrupt
threads over the top of other threads executing in the kernel.
- Split vnode::v_flag into three fields, depending on how the flag is
locked (by the interlock, by the vnode lock, by the file system).
- Miscellaneous locking fixes and improvements.
 1.81.2.5 09-Jun-2007  ad Sync with head.
 1.81.2.4 13-Apr-2007  ad - Make the devsw interface MP safe, and add some comments.
- Allow individual block/character drivers to be marked MP safe.
- Provide wrappers around the device methods that look up the
device, returning ENXIO if it's not found, and acquire the
kernel lock if needed.
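
The wrapper behaviour described above (look up the device, return ENXIO if it is missing, take the kernel lock only for drivers not marked MP safe) might look roughly like this. Everything prefixed toy_ is hypothetical; this is a sketch, not the actual devsw code, and the "kernel lock" is just a counter.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>

/* Hypothetical, simplified driver switch entry. */
struct toy_devsw {
	int (*d_open)(int minor);
	int mpsafe;		/* nonzero if the driver does its own locking */
};

#define TOY_NDEVSW 4
static const struct toy_devsw *toy_devsw_table[TOY_NDEVSW];
static int toy_kernel_lock_depth;	/* stands in for the kernel lock */

/* Wrapper: look the device up, fail with ENXIO, lock only if needed. */
static int
toy_dev_open(int major, int minor)
{
	const struct toy_devsw *d;
	int error;

	if (major < 0 || major >= TOY_NDEVSW ||
	    (d = toy_devsw_table[major]) == NULL)
		return ENXIO;
	assert(d->d_open != NULL);

	if (!d->mpsafe)
		toy_kernel_lock_depth++;	/* acquire */
	error = d->d_open(minor);
	if (!d->mpsafe)
		toy_kernel_lock_depth--;	/* release */
	return error;
}

/* Sample driver that is not MP safe: succeeds only if the lock is held. */
static int
toy_open(int minor)
{
	(void)minor;
	return toy_kernel_lock_depth == 1 ? 0 : EBUSY;
}

static const struct toy_devsw toy_driver = { toy_open, 0 };
```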
 1.81.2.3 13-Apr-2007  ad - Fix a (new) bug where vget tries to acquire freed vnodes' interlocks.
- Minor locking fixes.
 1.81.2.2 21-Mar-2007  ad Acquire the kernel lock in the VOP_* wrappers and the socket ops.
 1.81.2.1 13-Mar-2007  ad Pull in the initial set of changes for the vmlocking branch.
 1.83.2.1 15-Aug-2007  skrll Sync with HEAD.
 1.84.4.3 03-Dec-2007  joerg Sync with HEAD.
 1.84.4.2 26-Oct-2007  joerg Sync with HEAD.

Follow the merge of pmap.c on i386 and amd64 and move
pmap_init_tmp_pgtbl into arch/x86/x86/pmap.c. Modify the ACPI wakeup
code to restore CR4 before jumping back into kernel space as the large
page option might cover that.
 1.84.4.1 04-Aug-2007  jmcneill Sync with HEAD.
 1.85.6.2 04-Aug-2007  pooka Use VSIZENOTSET only in KASSERTs
 1.85.6.1 04-Aug-2007  pooka file uvm_vnode.c was added on branch matt-mips64 on 2007-08-04 09:42:59 +0000
 1.85.4.1 14-Oct-2007  yamt sync with head.
 1.85.2.2 09-Jan-2008  matt sync with HEAD
 1.85.2.1 06-Nov-2007  matt sync with HEAD
 1.87.4.2 18-Feb-2008  mjf Sync with HEAD.
 1.87.4.1 08-Dec-2007  mjf Sync with HEAD.
 1.89.6.1 02-Jan-2008  bouyer Sync with HEAD
 1.89.2.2 18-Dec-2007  ad Lock readahead context using the associated object's lock.
 1.89.2.1 04-Dec-2007  ad Pull the vmlocking changes into a new branch.
 1.90.28.4 29-Feb-2012  matt Improve UVM_PAGE_TRKOWN.
Add more asserts to uvm_page.
 1.90.28.3 03-Jun-2011  matt Restore $NetBSD$
 1.90.28.2 03-Jun-2011  matt Rework page free lists to be sorted by color first rather than free_list.
Kept per color PGFL_* counter in each page free list.
Minor cleanups.
 1.90.28.1 25-May-2011  matt Make uvm_map recognize UVM_FLAG_COLORMATCH which tells uvm_map that the
'align' argument specifies the starting color of the KVA range to be returned.

When calling uvm_km_alloc with UVM_KMF_VAONLY, also specify the starting
color of the kva range returned (UVM_KMF_COLORMATCH) and pass those to
uvm_map.

In uvm_pglistalloc, make sure the pages being returned have sequentially
advancing colors (so they can be mapped in a contiguous address range).
Add a few missing UVM_FLAG_COLORMATCH flags to uvm_pagealloc calls.

Make the socket and pipe loan color-safe.

Make the mips pmap enforce strict page color (color(VA) == color(PA)).
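
To make the color rules above concrete, here is a small sketch. It assumes 4 KiB pages and 8 cache colors; both constants, and all toy_* names, are illustrative and not taken from the branch.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define TOY_PAGE_SHIFT	12	/* assumed 4 KiB pages */
#define TOY_NCOLORS	8	/* assumed color count, a power of two */

/* Color of an address: its page number modulo the number of colors. */
static unsigned
toy_color(uint64_t addr)
{
	return (unsigned)((addr >> TOY_PAGE_SHIFT) & (TOY_NCOLORS - 1));
}

/* Strict coloring: a mapping is allowed only if color(VA) == color(PA). */
static bool
toy_color_match(uint64_t va, uint64_t pa)
{
	return toy_color(va) == toy_color(pa);
}

/*
 * True if each page's color is one more (modulo TOY_NCOLORS) than the
 * previous page's, so the list can be mapped at consecutive addresses.
 */
static bool
toy_colors_sequential(const uint64_t *pa, size_t n)
{
	assert(pa != NULL || n == 0);
	for (size_t i = 1; i < n; i++) {
		if (toy_color(pa[i]) !=
		    (toy_color(pa[i - 1]) + 1) % TOY_NCOLORS)
			return false;
	}
	return true;
}
```

Note that the pages in a sequentially colored list need not be physically contiguous; only their colors have to advance in step with the virtual range.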
 1.90.10.2 11-Mar-2010  yamt sync with head
 1.90.10.1 19-Aug-2009  yamt sync with head.
 1.93.8.1 08-Feb-2011  bouyer Sync with HEAD
 1.93.6.1 06-Jun-2011  jruoho Sync with HEAD.
 1.93.4.4 31-May-2011  rmind sync with head
 1.93.4.3 05-Mar-2011  rmind sync with head
 1.93.4.2 17-Mar-2010  rmind Reorganise UVM locking to protect P->V state and serialise pmap(9)
operations on the same page(s) by always locking their owner. Hence
lock order: "vmpage"-lock -> pmap-lock.

Patch, proposed on tech-kern@, from Andrew Doran.
 1.93.4.1 16-Mar-2010  rmind Change struct uvm_object::vmobjlock to be dynamically allocated with
mutex_obj_alloc(). It allows us to share the locks among UVM objects.
 1.93.2.6 21-Nov-2010  uebayasi uvn_findpage_xip: A few UVMHIST logs.
 1.93.2.5 20-Nov-2010  uebayasi uvn_findpage_xip: This is responsible for returning a page marked
as "busy".
 1.93.2.4 18-Nov-2010  uebayasi Make XIP pager use cdev_mmap() instead of struct vm_physseg.
 1.93.2.3 16-Nov-2010  uebayasi Factor out the part which looks up physical page "identity" from
UVM object, into sys/uvm/uvm_vnode.c:uvn_findpage_xip(). Eventually
this will become a call to cdev UVM object pager.
 1.93.2.2 25-Aug-2010  uebayasi Actually make this build with options XIP.
 1.93.2.1 11-Feb-2010  uebayasi uvn_get: For XIP vnodes, skip read-ahead, because it's pointless.
 1.95.2.1 23-Jun-2011  cherry Catchup with rmind-uvmplock merge.
 1.97.8.1 11-Jun-2012  riz Pull up following revision(s) (requested by martin in ticket #301):
sys/uvm/uvm_vnode.c: revision 1.98
tests/lib/libc/sys/t_mmap.c: revision 1.3
tests/lib/libc/sys/t_mmap.c: revision 1.4
tests/lib/libc/sys/t_mmap.c: revision 1.5
tests/lib/libc/sys/t_mmap.c: revision 1.6
Only use generic readahead on VREG vnodes, the space used to store the
context is not valid on other types.
Prevents the crash reported in PR kern/38889, but does not fix the
mmap of block devices, more work is needed (no size on VBLK vnodes).
Do not skip the block device mmap test, as it does not crash
the kernel any more. Mark it as expected failure instead.
mmap_block:
do not use a hardcoded block device list, but query the kernel for attached
disks instead, then try to mmap the raw partition.
Use atf_tc_skip().
A test case for serious PR kern/38889: crash on open/mmap/close of block
device. The test case is skipped for the time being as it replicates the
panic described in the PR (tested on NetBSD/amd64 6.0 BETA).
 1.97.6.1 02-Jun-2012  mrg sync to latest -current.
 1.97.2.9 02-Nov-2012  yamt uvn_findpage: fix dense case. add comments.
 1.97.2.8 30-Oct-2012  yamt sync with head
 1.97.2.7 01-Aug-2012  yamt - fix integrity sync.
putpages for integrity sync (fsync, msync with MS_SYNC, etc) should not
skip pages being written back by other threads.

- adapt to radix tree tag api changes.
 1.97.2.6 01-Aug-2012  yamt fix a typo in a comment.
 1.97.2.5 17-Feb-2012  yamt byebye PG_HOLE as it turned out to be unnecessary.
 1.97.2.4 18-Jan-2012  yamt - bug fixes
- minor optimizations
- assertions
- comments
 1.97.2.3 20-Dec-2011  yamt don't inline uvn_findpages in genfs_io.
 1.97.2.2 26-Nov-2011  yamt - uvm_page_array_fill: add some more parameters
- uvn_findpages: use gang-lookup
- genfs_putpages: re-enable backward clustering
- mechanical changes after the recent radixtree.h api changes
 1.97.2.1 02-Nov-2011  yamt page cache related changes

- maintain object pages in radix tree rather than rb tree.
- reduce unnecessary page scan in putpages. esp. when an object has a ton of
pages cached but only a few of them are dirty.
- reduce the number of pmap operations by tracking page dirtiness more
precisely in uvm layer.
- fix nfs commit range tracking.
- fix nfs write clustering. XXX hack
 1.99.16.2 27-Dec-2015  skrll Sync with HEAD (as of 26th Dec)
 1.99.16.1 22-Sep-2015  skrll Sync with HEAD
 1.99.2.1 03-Dec-2017  jdolecek update from HEAD
 1.102.10.1 02-Nov-2017  snj Pull up following revision(s) (requested by pgoyette in ticket #335):
share/man/man9/kernhist.9: 1.5-1.8
sys/arch/acorn26/acorn26/pmap.c: 1.39
sys/arch/arm/arm32/fault.c: 1.105 via patch
sys/arch/arm/arm32/pmap.c: 1.350, 1.359
sys/arch/arm/broadcom/bcm2835_bsc.c: 1.7
sys/arch/arm/omap/if_cpsw.c: 1.20
sys/arch/arm/omap/tiotg.c: 1.7
sys/arch/evbarm/conf/RPI2_INSTALL: 1.3
sys/dev/ic/sl811hs.c: 1.98
sys/dev/usb/ehci.c: 1.256
sys/dev/usb/if_axe.c: 1.83
sys/dev/usb/motg.c: 1.18
sys/dev/usb/ohci.c: 1.274
sys/dev/usb/ucom.c: 1.119
sys/dev/usb/uhci.c: 1.277
sys/dev/usb/uhub.c: 1.137
sys/dev/usb/umass.c: 1.160-1.162
sys/dev/usb/umass_quirks.c: 1.100
sys/dev/usb/umass_scsipi.c: 1.55
sys/dev/usb/usb.c: 1.168
sys/dev/usb/usb_mem.c: 1.70
sys/dev/usb/usb_subr.c: 1.221
sys/dev/usb/usbdi.c: 1.175
sys/dev/usb/usbdi_util.c: 1.67-1.70
sys/dev/usb/usbroothub.c: 1.3
sys/dev/usb/xhci.c: 1.75
sys/external/bsd/drm2/dist/drm/i915/i915_gem.c: 1.34
sys/kern/kern_history.c: 1.15
sys/kern/kern_xxx.c: 1.74
sys/kern/vfs_bio.c: 1.275-1.276
sys/miscfs/genfs/genfs_io.c: 1.71
sys/sys/kernhist.h: 1.21
sys/ufs/ffs/ffs_balloc.c: 1.63
sys/ufs/lfs/lfs_vfsops.c: 1.361
sys/ufs/lfs/ulfs_inode.c: 1.21
sys/ufs/lfs/ulfs_vnops.c: 1.52
sys/ufs/ufs/ufs_inode.c: 1.102
sys/ufs/ufs/ufs_vnops.c: 1.239
sys/uvm/pmap/pmap.c: 1.37-1.39
sys/uvm/pmap/pmap_tlb.c: 1.22
sys/uvm/uvm_amap.c: 1.108
sys/uvm/uvm_anon.c: 1.64
sys/uvm/uvm_aobj.c: 1.126
sys/uvm/uvm_bio.c: 1.91
sys/uvm/uvm_device.c: 1.66
sys/uvm/uvm_fault.c: 1.201
sys/uvm/uvm_km.c: 1.144
sys/uvm/uvm_loan.c: 1.85
sys/uvm/uvm_map.c: 1.353
sys/uvm/uvm_page.c: 1.194
sys/uvm/uvm_pager.c: 1.111
sys/uvm/uvm_pdaemon.c: 1.109
sys/uvm/uvm_swap.c: 1.175
sys/uvm/uvm_vnode.c: 1.103
usr.bin/vmstat/vmstat.c: 1.219
Reorder to test for null before null deref in debug code
--
Reorder to test for null before null deref in debug code
--
KNF
--
No need for '\n' in UVMHIST_LOG
--
normalise a BIOHIST log message
--
Update the kernhist(9) kernel history code to address issues identified
in PR kern/52639, as well as some general cleaning-up...
(As proposed on tech-kern@ with additional changes and enhancements.)
Details of changes:
* All history arguments are now stored as uintmax_t values[1], both in
the kernel and in the structures used for exporting the history data
to userland via sysctl(9). This avoids problems on some architectures
where passing a 64-bit (or larger) value to printf(3) can cause it to
process the value as multiple arguments. (This can be particularly
problematic when printf()'s format string is not a literal, since in
that case the compiler cannot know how large each argument should be.)
* Update the data structures used for exporting kernel history data to
include a version number as well as the length of history arguments.
* All [2] existing users of kernhist(9) have had their format strings
updated. Each format specifier now includes an explicit length
modifier 'j' to refer to numeric values of the size of uintmax_t.
* All [2] existing users of kernhist(9) have had their format strings
updated to replace uses of "%p" with "%#jx", and the pointer
arguments are now cast to (uintptr_t) before being subsequently cast
to (uintmax_t). This is needed to avoid compiler warnings about
casting "pointer to integer of a different size."
* All [2] existing users of kernhist(9) have had instances of "%s" or
"%c" format strings replaced with numeric formats; several instances
of mis-match between format string and argument list have been fixed.
* vmstat(1) has been modified to handle the new size of arguments in the
history data as exported by sysctl(9).
* vmstat(1) now provides a warning message if the history requested with
the -u option does not exist (previously, this condition was silently
ignored, with only a single blank line being printed).
* vmstat(1) now checks the version and argument length included in the
data exported via sysctl(9) and exits if they do not match the values
with which vmstat was built.
* The kernhist(9) man-page has been updated to note the additional
requirements imposed on the format strings, along with several other
minor changes and enhancements.
[1] It would have been possible to use an explicit length (for example,
uint64_t) for the history arguments. But that would require another
"rototill" of all the users in the future when we add support for an
architecture that supports a larger size. Also, the printf(3) format
specifiers for explicitly-sized values, such as "%"PRIu64, are much
more verbose (and less aesthetically appealing, IMHO) than simply
using "%ju".
[2] I've tried very hard to find "all [the] existing users of kernhist(9)"
but it is possible that I've missed some of them. I would be glad to
update any stragglers that anyone identifies.
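
The format-string conventions listed above can be illustrated in userland with snprintf(3) standing in for the history logging macros. fmt_hist_entry() is a hypothetical helper, not part of kernhist(9); only the casts and the 'j' length modifier are the point.

```c
#include <assert.h>
#include <inttypes.h>
#include <stdint.h>
#include <stdio.h>
#include <string.h>

/*
 * Format a pointer and a counter the way the updated format strings do:
 * the pointer goes through uintptr_t before being widened to uintmax_t
 * (avoiding "pointer to integer of a different size" warnings), and
 * every conversion uses the 'j' length modifier.
 */
static int
fmt_hist_entry(char *buf, size_t len, const void *ptr, unsigned count)
{
	assert(buf != NULL && len > 0);
	return snprintf(buf, len, "ptr=%#jx count=%ju",
	    (uintmax_t)(uintptr_t)ptr, (uintmax_t)count);
}
```

Because every argument has the same width, a consumer can walk the stored arguments without knowing the format string in advance.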
--
For some reason this single kernel seems to have outgrown its declared
size as a result of the kernhist(9) changes. Bump the size.
XXX The amount of increase may be excessive - anyone with more detailed
XXX knowledge please feel free to further adjust the value
appropriately.
--
Missed one cast of pointer --> uintptr_t in previous kernhist(9) commit
--
And yet another one. :(
--
Use correct mark-up for NetBSD version.
--
More improvements in grammar and readability.
--
Remove a stray '"' (obvious typo) and add a couple of casts that are
probably needed.
--
And replace an instance of "%p" conversion with "%#jx"
--
Whitespace fix. Give Bl tag table a width. Fix Xr.
 1.103.4.1 08-Apr-2020  martin Merge changes from current as of 20200406
 1.104.2.2 29-Feb-2020  ad Sync with head.
 1.104.2.1 17-Jan-2020  ad Sync with head.
 1.117.2.1 03-Apr-2021  thorpej Sync with HEAD.
 1.11 15-Sep-2001  chs a whole bunch of changes to improve performance and robustness under load:

- remove special treatment of pager_map mappings in pmaps. this is
required now, since I've removed the globals that expose the address range.
pager_map now uses pmap_kenter_pa() instead of pmap_enter(), so there's
no longer any need to special-case it.
- eliminate struct uvm_vnode by moving its fields into struct vnode.
- rewrite the pageout path. the pager is now responsible for handling the
high-level requests instead of only getting control after a bunch of work
has already been done on its behalf. this will allow us to UBCify LFS,
which needs tighter control over its pages than other filesystems do.
writing a page to disk no longer requires making it read-only, which
allows us to write wired pages without causing all kinds of havoc.
- use a new PG_PAGEOUT flag to indicate that a page should be freed
on behalf of the pagedaemon when it's unlocked. this flag is very similar
to PG_RELEASED, but unlike PG_RELEASED, PG_PAGEOUT can be cleared if the
pageout fails due to eg. an indirect-block buffer being locked.
this allows us to remove the "version" field from struct vm_page,
and together with shrinking "loan_count" from 32 bits to 16,
struct vm_page is now 4 bytes smaller.
- no longer use PG_RELEASED for swap-backed pages. if the page is busy
because it's being paged out, we can't release the swap slot to be
reallocated until that write is complete, but unlike with vnodes we
don't keep a count of in-progress writes so there's no good way to
know when the write is done. instead, when we need to free a busy
swap-backed page, just sleep until we can get it busy ourselves.
- implement a fast-path for extending writes which allows us to avoid
zeroing new pages. this substantially reduces cpu usage.
- encapsulate the data used by the genfs code in a struct genfs_node,
which must be the first element of the filesystem-specific vnode data
for filesystems which use genfs_{get,put}pages().
- eliminate many of the UVM pagerops, since they aren't needed anymore
now that the pager "put" operation is a higher-level operation.
- enhance the genfs code to allow NFS to use the genfs_{get,put}pages
instead of a modified copy.
- clean up struct vnode by removing all the fields that used to be used by
the vfs_cluster.c code (which we don't use anymore with UBC).
- remove kmem_object and mb_object since they were useless.
instead of allocating pages to these objects, we now just allocate
pages with no object. such pages are mapped in the kernel until they
are freed, so we can use the mapping to find the page to free it.
this allows us to remove splvm() protection in several places.

The sum of all these changes improves write throughput on my
decstation 5000/200 to within 1% of the rate of NetBSD 1.5
and reduces the elapsed time for "make release" of a NetBSD 1.5
source tree on my 128MB pc to 10% less than a 1.5 kernel took.
 1.10 27-Nov-2000  chs branches: 1.10.2; 1.10.4; 1.10.6;
Initial integration of the Unified Buffer Cache project.
 1.9 26-Mar-2000  kleink Merge parts of chs-ubc2 into the trunk:
Add a new type voff_t (defined as a synonym for off_t) to describe offsets
into uvm objects, and update the appropriate interfaces to use it, the
most visible effect being the ability to mmap() file offsets beyond
the range of a vaddr_t.

Originally by Chuck Silvers; blame me for problems caused by merging this
into non-UBC.
 1.8 21-Jun-1999  thorpej branches: 1.8.2;
Protect prototypes, certain macros, and inlines from userland.
 1.7 25-Mar-1999  mrg branches: 1.7.4;
remove now >1 year old pre-release message.
 1.6 13-Aug-1998  eeh branches: 1.6.2;
Merge paddr_t changes into the main branch.
 1.5 09-Mar-1998  mrg branches: 1.5.2;
KNF.
 1.4 10-Feb-1998  perry add/cleanup multiple inclusion protection.
 1.3 07-Feb-1998  mrg restore rcsids
 1.2 06-Feb-1998  thorpej RCS ID police.
 1.1 05-Feb-1998  mrg branches: 1.1.1;
Initial revision
 1.1.1.1 05-Feb-1998  mrg initial import of the new virtual memory system, UVM, into -current.

UVM was written by chuck cranor <chuck@maria.wustl.edu>, with some
minor portions derived from the old Mach code. i provided some help
getting swap and paging working, and other bug fixes/ideas. chuck
silvers <chuq@chuq.com> also provided some other fixes.

this is the UVM kernel code portion.


this will be KNF'd shortly. :-)
 1.5.2.1 30-Jul-1998  eeh Split vm_offset_t and vm_size_t into paddr_t, psize_t, vaddr_t, and vsize_t.
 1.6.2.1 09-Nov-1998  chs initial snapshot. lots left to do.
 1.7.4.7 09-Aug-1999  chs create a new type "voff_t" for uvm_object offsets
and define it to be "off_t". also, remove pgo_asyncget().
 1.7.4.6 04-Jul-1999  chs remove UVM_VNODE_* flags, use the V* vnode flags instead.
 1.7.4.5 02-Jul-1999  thorpej Remove an #ifdef UBC for the new uvm_vnode/vnode flags; just always use them
instead, and g/c the old ones. (Boy, this really confused me at first! :-)
 1.7.4.4 02-Jul-1999  thorpej Oops, fix botch in previous.
 1.7.4.3 02-Jul-1999  thorpej Fix merge botch.
 1.7.4.2 01-Jul-1999  thorpej Sync w/ -current.
 1.7.4.1 07-Jun-1999  chs merge everything from chs-ubc branch.
 1.8.2.2 08-Dec-2000  bouyer Sync with HEAD.
 1.8.2.1 20-Nov-2000  bouyer Update thorpej_scsipi to -current as of a month ago
 1.10.6.1 01-Oct-2001  fvdl Catch up with -current.
 1.10.4.1 10-Jan-2002  thorpej Sync kqueue branch with -current.
 1.10.2.1 28-Feb-2002  nathanw Catch up to -current.
 1.80 06-May-2024  skrll Fix 32bit UVMHIST builds
 1.79 03-May-2024  skrll More debug.
 1.78 18-Apr-2024  skrll Fix types in pmap_page_clear_attributes so that the top bits of
the u_long mdpg_attrs aren't dropped giving atomic_cas_ulong no
chance of completing if any of the top bits is set.

Update pmap_page_set_attributes for consistency.

An ATF test run completed for me with this fix.

port-riscv/58006: ATF tests no longer complete on riscv-riscv64
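
The failure mode described in 1.78 can be reproduced in a userland sketch. It assumes C11 atomics in place of atomic_cas_ulong and fixed-width types in place of u_long and the narrower local; the function names are hypothetical and the code is not taken from pmap.c.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stdint.h>

/*
 * One compare-and-swap attempt to clear 'clear_bits', with the expected
 * value held in a 32-bit variable.  If any of the top 32 bits of *attrs
 * is set, the expected value can never match, so a retry loop built on
 * this would never finish.
 */
static bool
clear_attrs_narrow(_Atomic uint64_t *attrs, uint32_t clear_bits)
{
	uint32_t old32 = (uint32_t)atomic_load(attrs);	/* top bits lost */
	uint64_t expected = old32;

	return atomic_compare_exchange_strong(attrs, &expected,
	    (uint64_t)(old32 & ~clear_bits));
}

/* The same attempt with the expected value kept at full width. */
static bool
clear_attrs_wide(_Atomic uint64_t *attrs, uint64_t clear_bits)
{
	uint64_t old = atomic_load(attrs);
	uint64_t expected = old;

	return atomic_compare_exchange_strong(attrs, &expected,
	    old & ~clear_bits);
}
```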
 1.77 23-Mar-2024  skrll Default pmap_stealdebug to false
 1.76 05-Mar-2024  skrll Change the PMAP_STEAL_MEMORY debug output from aprint_debug.

The new printfs are conditional on pmap_stealdebug and the DEBUG compile
option. The former defaults to true, but can be changed at a boot -d ddb
prompt.
 1.75 26-Feb-2023  skrll ci_data.cpu_kcpuset -> ci_kcpuset

NFCI.
 1.74 03-Nov-2022  skrll branches: 1.74.2;
Provide MI PMAP support on AARCH64
 1.73 02-Nov-2022  skrll KNF
 1.72 28-Oct-2022  skrll MI PMAP EFI_RUNTIME support
 1.71 27-Oct-2022  skrll No need to hold the pmap_tlb_miss_lock when calling pmap_segtab_destroy
 1.70 27-Oct-2022  skrll Rename pm_count to pm_refcnt
 1.69 26-Oct-2022  skrll MI PMAP hardware page table walker support.

This is based on code given to me by Matt Thomas a long time ago with
many updates and bug fixes from me.
 1.68 23-Oct-2022  skrll Correct the pmap_kstart_segtab entry in pmap_kern_segtab
 1.67 15-Sep-2022  skrll whitespace - remove spaces before tabs
 1.66 12-Sep-2022  skrll A simplification and some minor whitespace
 1.65 07-May-2022  rin Introduce PMAP_PV_TRACK_ONLY_STUBS option, by which only empty stubs for
global functions in pmap_pvt.h are provided, instead of real support for
PV tracking.

Necessary for powerpc: Only one sub-arch (oea) has PV tracking support.
Others (booke/ibm4xx) do not at the moment (probably never for ibm4xx),
but __HAVE_PMAP_PV_TRACK is necessary, so that modules can be shared by
all of sub-archs.
 1.64 09-Apr-2022  riastradh sys: Use membar_release/acquire around reference drop.

This just goes through my recent reference count membar audit and
changes membar_exit to membar_release and membar_enter to
membar_acquire -- this should make everything cheaper on most CPUs
without hurting correctness, because membar_acquire is generally
cheaper than membar_enter.
 1.63 12-Mar-2022  riastradh sys: Membar audit around reference count releases.

If two threads are using an object that is freed when the reference
count goes to zero, we need to ensure that all memory operations
related to the object happen before freeing the object.

Using an atomic_dec_uint_nv(&refcnt) == 0 ensures that only one
thread takes responsibility for freeing, but it's not enough to
ensure that the other thread's memory operations happen before the
freeing.

Consider:

Thread A                        Thread B
obj->foo = 42;                  obj->baz = 73;
mumble(&obj->bar);              grumble(&obj->quux);
/* membar_exit(); */            /* membar_exit(); */
atomic_dec -- not last          atomic_dec -- last
                                /* membar_enter(); */
                                KASSERT(invariant(obj->foo,
                                    obj->bar));
                                free_stuff(obj);

The memory barriers ensure that

obj->foo = 42;
mumble(&obj->bar);

in thread A happens before

KASSERT(invariant(obj->foo, obj->bar));
free_stuff(obj);

in thread B. Without them, this ordering is not guaranteed.

So in general it is necessary to do

membar_exit();
if (atomic_dec_uint_nv(&obj->refcnt) != 0)
	return;
membar_enter();

to release a reference, for the `last one out hit the lights' style
of reference counting. (This is in contrast to the style where one
thread blocks new references and then waits under a lock for existing
ones to drain with a condvar -- no membar needed thanks to mutex(9).)

I searched for atomic_dec to find all these. Obviously we ought to
have a better abstraction for this because there's so much copypasta.
This is a stop-gap measure to fix actual bugs until we have that. It
would be nice if an abstraction could gracefully handle the different
styles of reference counting in use -- some years ago I drafted an
API for this, but making it cover everything got a little out of hand
(particularly with struct vnode::v_usecount) and I ended up setting
it aside to work on psref/localcount instead for better scalability.

I got bored of adding #ifdef __HAVE_ATOMIC_AS_MEMBAR everywhere, so I
only put it on things that look performance-critical on 5sec review.
We should really adopt membar_enter_preatomic/membar_exit_postatomic
or something (except they are applicable only to atomic r/m/w, not to
atomic_load/store_*, making the naming annoying) and get rid of all
the ifdefs.
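
The `last one out hit the lights' release pattern from 1.63 can be sketched in portable C11. This assumes <stdatomic.h> in place of atomic_dec_uint_nv(), membar_exit() and membar_enter(); struct obj, obj_release() and the freed flag are hypothetical names for illustration.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>

/* Hypothetical reference-counted object; not a NetBSD structure. */
struct obj {
	atomic_uint refcnt;
	int foo;
	bool freed;	/* stands in for actually freeing the object */
};

/*
 * Drop one reference.  Returns true if this caller dropped the last
 * reference and therefore "freed" the object.
 */
static bool
obj_release(struct obj *o)
{
	/* Release ordering: publish our earlier stores with the decrement. */
	if (atomic_fetch_sub_explicit(&o->refcnt, 1,
	    memory_order_release) != 1)
		return false;	/* not the last reference */

	/* Acquire fence: see every other thread's stores before freeing. */
	atomic_thread_fence(memory_order_acquire);
	assert(!o->freed);
	o->freed = true;
	return true;
}
```

atomic_fetch_sub_explicit() returns the value before the decrement, so a result of 1 means the count has just reached zero. The release/acquire pairing mirrors the membar_release/membar_acquire naming adopted in 1.64.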
 1.62 17-Apr-2021  mrg remove KERNHIST_INIT_STATIC(). it straddles the line between usable
early in boot and broken early in boot by requiring a partly static
structure with another structure that must be present by the time
any uses are performed. theoretically platform code could allocate
a chunk while setting up memory and assign it here, giving a dynamic
sizing for the entry list, but the reality is that all users have
a statically allocated entry list as well.

the existing KERNHIST_LINK_STATIC() is used in conjunction with
KERNHIST_INITIALIZER() instead.

this stops a NULL pointer deref when the _LOG() macro is called
before the storage is linked in, which happens with GCC 10 on OCTEON
with UVMHIST enabled, crashing in very early kernel init.
 1.61 19-Mar-2021  skrll branches: 1.61.2;
Support pmap_growkernel and KASAN shadow mapping of the new KVA.

Neither mips nor ppc booke actually use pmap_growkernel (at present).

Thanks to rin@ for testing a similar patch on ppc booke.
 1.60 13-Mar-2021  skrll s/pfi_faultpte/&p/ for consistency with arm / other uses of ptep
 1.59 13-Mar-2021  skrll Don't use %jx for 0 or 1 - just use %jd in UVMHIST_LOG format.
 1.58 20-Dec-2020  skrll Support __HAVE_PMAP_PV_TRACK in sys/uvm/pmap based pmaps (aka common pmap)
 1.57 08-Oct-2020  skrll branches: 1.57.2;
%#jx vs %jx consistency in UVMHIST_LOG
 1.56 24-Sep-2020  skrll Whitespace
 1.55 20-Aug-2020  mrg move pmap segtab history into a new history of only 1000 entries,
but will overflow much slower than the main pmap history.

move various debug info into kernhist. make pte array checker
into an array and use it in pmap_segtab_release() and
pmap_pte_reserve(). move check before MD callback(), incase it
wants to change ptes for some reason (they're passed in, but
this callback is currently always NULL.)

clean up some history logs to reduce the number of lines required.
 1.54 19-Aug-2020  simonb Remove trailing \n from UVMHIST_LOG() format strings.
 1.53 11-Aug-2020  skrll More UVMHIST_LOG. Remove some commented output printfs.
 1.52 11-Aug-2020  skrll Fix a comment
 1.51 07-Aug-2020  skrll Provide a pmap_segtab_deactivate for symmetry with pmap_segtab_activate
and use it in pmap_deactivate

Call pmap_md_xtab_{,de}activate from pmap_segtab_{,de}activate to be used
for PMAP_HWPAGEWALKER and any caches ops that might be required.

Provide empty (for now) pmap_md_xtab_{,de}activate functions on the
platforms that use sys/uvm/pmap
 1.50 18-Jul-2020  skrll Always call pmap_segtab_activate in pmap_activate. pmap_segtab_activate
does the right thing if called with non-curlwp.
 1.49 12-Apr-2020  skrll Use UVMHIST_CALLARGS
 1.48 14-Mar-2020  ad branches: 1.48.2;
pmap_remove_all(): Return a boolean value to indicate the behaviour. If
true, all mappings have been removed, the pmap is totally cleared out, and
UVM can then avoid doing the work to call pmap_remove() for each map entry.
If false, either nothing has been done, or some helpful arch-specific voodoo
has taken place.
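
How a caller might use the boolean result is sketched below with a toy pmap. All toy_* names and the structure layout are hypothetical; the real pmap is machine-dependent and the real caller is UVM's map teardown code.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define TOY_NENTRIES 4

/* Hypothetical toy pmap. */
struct toy_pmap {
	int nmapped[TOY_NENTRIES];	/* mappings per map entry */
	bool bulk_clear;		/* can everything be cleared at once? */
	int remove_calls;		/* how often the per-entry path ran */
};

/* Returns true only if every mapping has been removed. */
static bool
toy_pmap_remove_all(struct toy_pmap *pm)
{
	if (!pm->bulk_clear)
		return false;	/* nothing done; caller must do the work */
	for (size_t i = 0; i < TOY_NENTRIES; i++)
		pm->nmapped[i] = 0;
	return true;
}

static void
toy_pmap_remove(struct toy_pmap *pm, size_t entry)
{
	assert(entry < TOY_NENTRIES);
	pm->nmapped[entry] = 0;
	pm->remove_calls++;
}

/* Caller side: skip the per-entry loop when the pmap is already empty. */
static void
toy_unmap_everything(struct toy_pmap *pm)
{
	if (toy_pmap_remove_all(pm))
		return;
	for (size_t i = 0; i < TOY_NENTRIES; i++)
		toy_pmap_remove(pm, i);
}
```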
 1.47 12-Mar-2020  thorpej pmap_tlb_miss_lock needs to be globally visible.
 1.46 11-Mar-2020  thorpej With DEBUG defined, it's possible to execute a TLB-vs-segmap consistency
check from a (soft) interrupt handler. But if a platform does not otherwise
require the pmap_tlb_miss_lock, then there will be a brief window of
inconsistency that, while harmless, will still fire an assertion in the
consistency check.

Fix this with the following changes:
1- Refactor the pmap_tlb_miss_lock into MI code and rename it from
pmap_md_tlb_miss_lock_{enter,exit}() to pmap_tlb_miss_lock_{enter,exit}().
MD code can still define the "md" hooks as necessary, and if so, will
override the common implementation.
2- Provide a pmap_bootstrap_common() function to perform common pmap bootstrap
operations, namely initializing the pmap_tlb_miss_lock if it's needed.
If MD code overrides the implementation, it's responsible for initializing
its own lock.
3- Call pmap_bootstrap_common() from the mips, powerpc booke, and riscv
pmap_bootstrap() routines. (This required adding one for riscv.)
4- Switch powerpc booke to the common pmap_tlb_miss_lock.
5- Enable pmap_tlb_miss_lock if DEBUG is defined, even if it's not otherwise
required.

PR port-mips/55062 (Failed assertion in pmap_md_tlb_check_entry())
 1.45 18-Dec-2019  skrll Remove duplicate #includes
 1.44 20-Oct-2019  skrll Define and use VM_PAGEMD_PVLIST_EMPTY_P
 1.43 20-Oct-2019  skrll Remove KASSERT(!VM_PAGEMD_PVLIST_LOCKED_P(mdpg)) - can only assert that it
is owned
 1.42 12-Jul-2019  skrll Provide and use PV_ISKENTER_P. NFCI.
 1.41 19-Jun-2019  skrll Make a comment generic and not MIPS specific
 1.40 30-Oct-2017  pgoyette branches: 1.40.2; 1.40.6;
Remove unneeded casts to (uintptr_t). This is already taken care of in
the xxxHIST_LOG() macros.

No need to pull-up to -8 - the extra cast really won't hurt anything.
 1.39 30-Oct-2017  pgoyette And replace an instance of "%p" conversion with "%#jx"
 1.38 30-Oct-2017  kre Remove a stray '"' (obvious typo) and add a couple of casts that are
probably needed.
 1.37 28-Oct-2017  pgoyette Update the kernhist(9) kernel history code to address issues identified
in PR kern/52639, as well as some general cleaning-up...

(As proposed on tech-kern@ with additional changes and enhancements.)

Details of changes:

* All history arguments are now stored as uintmax_t values[1], both in
the kernel and in the structures used for exporting the history data
to userland via sysctl(9). This avoids problems on some architectures
where passing a 64-bit (or larger) value to printf(3) can cause it to
process the value as multiple arguments. (This can be particularly
problematic when printf()'s format string is not a literal, since in
that case the compiler cannot know how large each argument should be.)

* Update the data structures used for exporting kernel history data to
include a version number as well as the length of history arguments.

* All [2] existing users of kernhist(9) have had their format strings
updated. Each format specifier now includes an explicit length
modifier 'j' to refer to numeric values of the size of uintmax_t.

* All [2] existing users of kernhist(9) have had their format strings
updated to replace uses of "%p" with "%#jx", and the pointer
arguments are now cast to (uintptr_t) before being subsequently cast
to (uintmax_t). This is needed to avoid compiler warnings about
casting "pointer to integer of a different size."

* All [2] existing users of kernhist(9) have had instances of "%s" or
"%c" format strings replaced with numeric formats; several instances
of mis-match between format string and argument list have been fixed.

* vmstat(1) has been modified to handle the new size of arguments in the
history data as exported by sysctl(9).

* vmstat(1) now provides a warning message if the history requested with
the -u option does not exist (previously, this condition was silently
ignored, with only a single blank line being printed).

* vmstat(1) now checks the version and argument length included in the
data exported via sysctl(9) and exits if they do not match the values
with which vmstat was built.

* The kernhist(9) man-page has been updated to note the additional
requirements imposed on the format strings, along with several other
minor changes and enhancements.

[1] It would have been possible to use an explicit length (for example,
uint64_t) for the history arguments. But that would require another
"rototill" of all the users in the future when we add support for an
architecture that supports a larger size. Also, the printf(3) format
specifiers for explicitly-sized values, such as "%"PRIu64, are much
more verbose (and less aesthetically appealing, IMHO) than simply
using "%ju".

[2] I've tried very hard to find "all [the] existing users of kernhist(9)"
but it is possible that I've missed some of them. I would be glad to
update any stragglers that anyone identifies.
 1.36 07-Sep-2017  skrll There's no need to call pmap_tlb_invalidate_addr if pmap_remove_all was
called and PMAP_DEFERRED_ACTIVATE is set.
 1.35 24-Jun-2017  skrll Use pte_set
 1.34 12-May-2017  skrll branches: 1.34.2;
Sprinkle some KASSERTs
 1.33 07-May-2017  skrll Fix a comment
 1.32 28-Apr-2017  skrll Remove unused LNAME macro
 1.31 28-Apr-2017  skrll Fix a UVMHIST_LOG after the "%s" removal
 1.30 22-Apr-2017  skrll branches: 1.30.2;
Improve a comment
 1.29 22-Apr-2017  skrll Trailing whitespace
 1.28 02-Mar-2017  mrg avoid using %s in UVMHIST.
 1.27 23-Dec-2016  skrll branches: 1.27.2;
PHYSMEM -> PHYSSEG to fix build
 1.26 23-Dec-2016  cherry "Make NetBSD great again!"

Introduce uvm_hotplug(9) to the kernel.

Many thanks, in no particular order to:

TNF, for funding the project.

Chuck Silvers - for multiple API reviews and feedback.
Nick Hudson - for testing on multiple architectures and bugfix patches.
Everyone who helped with boot testing.

KeK (http://www.kek.org.in) for hosting the primary developers.
 1.25 01-Dec-2016  mrg extend the pmap_activate/pmap_deactivate UVMHIST logs to include the
pid, lid, and either l_name or p_comm.
 1.24 05-Oct-2016  skrll Move some code before pmap_enter_pv in pmap_enter so that when we are
re-mapping a VA to a new PA the old mapping is removed first. This means
the cache alias code needs to do less work and works better with the last
va tracking.
 1.23 30-Sep-2016  skrll Increment resident_count if we're remapping onto new PA as
pmap_remove -> pmap_pte_remove will decrement it
 1.22 16-Sep-2016  matt When removing a page, make sure to clear its execness regardless of whether
the page is clean or dirty. This fixes the problem of execpages leaking
into the freepage lists.
 1.21 20-Aug-2016  mrg put a variable under the #ifdef it's only used in.
 1.20 18-Aug-2016  matt Don't track kenter_pa/kremove PVs unless we are worrying about cache aliasing.
 1.19 05-Aug-2016  jakllsch Only include `static inline pmap_asid_check()` if it might be used.

Should fix HEAD-llvm evbppc autobuild.
 1.18 14-Jul-2016  skrll branches: 1.18.2;
Spell PMAP_TLB_NEED_SHOOTDOWN correctly
 1.17 14-Jul-2016  skrll Trailing whitespace
 1.16 11-Jul-2016  maya Fix build by removing accidental duplicate line.
 1.15 11-Jul-2016  matt Changes so that MIPS can use the common pmap.
Change/augment the virtual cache alias callbacks.
 1.14 07-Jul-2016  msaitoh KNF. Remove extra spaces. No functional change.
 1.13 05-Nov-2015  pgoyette Remove unnecessary #include for sys/shm.h - there's nothing here that needs
anything from there.
 1.12 11-Jun-2015  matt Add virtual_start to pmap_limits. This allows MD to steal address space
before pmap_bootstrap.
 1.11 03-Feb-2015  nonaka Disable pmap_md_tlb_check_entry, when MP.
 1.10 26-Jan-2015  nonaka Avoid race condition between PTE update and TLB miss walk.
 1.9 05-Jan-2015  nonaka Use PMAP_TLB_MAX instead of MAXCPUS.
 1.8 25-Dec-2014  nonaka fix build failure when UVMHIST is defined.
 1.7 24-Dec-2014  nonaka fix compile failure.
 1.6 22-Dec-2014  nonaka pmap->pm_active and pmap->pm_onproc must be destroyed.
 1.5 19-Dec-2014  nonaka Initialize pmap->pm_active and pmap->pm_onproc.
Avoid "panic: kernel diagnostic assertion "!pmap_tlb_intersecting_onproc_p(pm, ti)" failed: file "/usr/src/sys/uvm/pmap/pmap_tlb.c", line 762".
 1.4 25-Feb-2014  martin branches: 1.4.6;
Mark a potentially unused variable
 1.3 22-Jul-2013  matt In the non-MP case, just initialize onproc to NULL.
 1.2 17-Jul-2013  matt Make this kcpuset_t instead of the private __cpuset_t
Add improvements for single TLB implementation (PPC, ARM).
 1.1 03-Oct-2012  christos branches: 1.1.2; 1.1.4; 1.1.10;
move from common/pmap/tlb -> uvm/pmap
 1.1.10.1 23-Jul-2013  riastradh sync with HEAD
 1.1.4.2 18-May-2014  rmind sync with head
 1.1.4.1 28-Aug-2013  rmind sync with head
 1.1.2.3 22-May-2014  yamt sync with head.

for reference, the tree before this commit was tagged
as yamt-pagecache-tag8.

this commit was split into small chunks to avoid
a limitation of cvs. ("Protocol error: too many arguments")
 1.1.2.2 30-Oct-2012  yamt sync with head
 1.1.2.1 03-Oct-2012  yamt file pmap.c was added on branch yamt-pagecache on 2012-10-30 17:23:03 +0000
 1.4.6.7 28-Aug-2017  skrll Sync with HEAD
 1.4.6.6 05-Feb-2017  skrll Sync with HEAD
 1.4.6.5 05-Dec-2016  skrll Sync with HEAD
 1.4.6.4 05-Oct-2016  skrll Sync with HEAD
 1.4.6.3 27-Dec-2015  skrll Sync with HEAD (as of 26th Dec)
 1.4.6.2 22-Sep-2015  skrll Sync with HEAD
 1.4.6.1 06-Apr-2015  skrll Sync with HEAD
 1.18.2.5 26-Apr-2017  pgoyette Sync with HEAD
 1.18.2.4 20-Mar-2017  pgoyette Sync with HEAD
 1.18.2.3 07-Jan-2017  pgoyette Sync with HEAD. (Note that most of these changes are simply $NetBSD$
tag issues.)
 1.18.2.2 04-Nov-2016  pgoyette Sync with HEAD
 1.18.2.1 06-Aug-2016  pgoyette Sync with HEAD
 1.27.2.1 21-Apr-2017  bouyer Sync with HEAD
 1.30.2.3 19-May-2017  pgoyette Resolve conflicts from previous merge (all resulting from $NetBSD$
keyword expansion)
 1.30.2.2 11-May-2017  pgoyette Sync with HEAD
 1.30.2.1 02-May-2017  pgoyette Sync with HEAD - tag prg-localcount2-base1
 1.34.2.1 02-Nov-2017  snj Pull up following revision(s) (requested by pgoyette in ticket #335):
share/man/man9/kernhist.9: 1.5-1.8
sys/arch/acorn26/acorn26/pmap.c: 1.39
sys/arch/arm/arm32/fault.c: 1.105 via patch
sys/arch/arm/arm32/pmap.c: 1.350, 1.359
sys/arch/arm/broadcom/bcm2835_bsc.c: 1.7
sys/arch/arm/omap/if_cpsw.c: 1.20
sys/arch/arm/omap/tiotg.c: 1.7
sys/arch/evbarm/conf/RPI2_INSTALL: 1.3
sys/dev/ic/sl811hs.c: 1.98
sys/dev/usb/ehci.c: 1.256
sys/dev/usb/if_axe.c: 1.83
sys/dev/usb/motg.c: 1.18
sys/dev/usb/ohci.c: 1.274
sys/dev/usb/ucom.c: 1.119
sys/dev/usb/uhci.c: 1.277
sys/dev/usb/uhub.c: 1.137
sys/dev/usb/umass.c: 1.160-1.162
sys/dev/usb/umass_quirks.c: 1.100
sys/dev/usb/umass_scsipi.c: 1.55
sys/dev/usb/usb.c: 1.168
sys/dev/usb/usb_mem.c: 1.70
sys/dev/usb/usb_subr.c: 1.221
sys/dev/usb/usbdi.c: 1.175
sys/dev/usb/usbdi_util.c: 1.67-1.70
sys/dev/usb/usbroothub.c: 1.3
sys/dev/usb/xhci.c: 1.75
sys/external/bsd/drm2/dist/drm/i915/i915_gem.c: 1.34
sys/kern/kern_history.c: 1.15
sys/kern/kern_xxx.c: 1.74
sys/kern/vfs_bio.c: 1.275-1.276
sys/miscfs/genfs/genfs_io.c: 1.71
sys/sys/kernhist.h: 1.21
sys/ufs/ffs/ffs_balloc.c: 1.63
sys/ufs/lfs/lfs_vfsops.c: 1.361
sys/ufs/lfs/ulfs_inode.c: 1.21
sys/ufs/lfs/ulfs_vnops.c: 1.52
sys/ufs/ufs/ufs_inode.c: 1.102
sys/ufs/ufs/ufs_vnops.c: 1.239
sys/uvm/pmap/pmap.c: 1.37-1.39
sys/uvm/pmap/pmap_tlb.c: 1.22
sys/uvm/uvm_amap.c: 1.108
sys/uvm/uvm_anon.c: 1.64
sys/uvm/uvm_aobj.c: 1.126
sys/uvm/uvm_bio.c: 1.91
sys/uvm/uvm_device.c: 1.66
sys/uvm/uvm_fault.c: 1.201
sys/uvm/uvm_km.c: 1.144
sys/uvm/uvm_loan.c: 1.85
sys/uvm/uvm_map.c: 1.353
sys/uvm/uvm_page.c: 1.194
sys/uvm/uvm_pager.c: 1.111
sys/uvm/uvm_pdaemon.c: 1.109
sys/uvm/uvm_swap.c: 1.175
sys/uvm/uvm_vnode.c: 1.103
usr.bin/vmstat/vmstat.c: 1.219
Reorder to test for null before null deref in debug code
--
Reorder to test for null before null deref in debug code
--
KNF
--
No need for '\n' in UVMHIST_LOG
--
normalise a BIOHIST log message
--
Update the kernhist(9) kernel history code to address issues identified
in PR kern/52639, as well as some general cleaning-up...
(As proposed on tech-kern@ with additional changes and enhancements.)
Details of changes:
* All history arguments are now stored as uintmax_t values[1], both in
the kernel and in the structures used for exporting the history data
to userland via sysctl(9). This avoids problems on some architectures
where passing a 64-bit (or larger) value to printf(3) can cause it to
process the value as multiple arguments. (This can be particularly
problematic when printf()'s format string is not a literal, since in
that case the compiler cannot know how large each argument should be.)
* Update the data structures used for exporting kernel history data to
include a version number as well as the length of history arguments.
* All [2] existing users of kernhist(9) have had their format strings
updated. Each format specifier now includes an explicit length
modifier 'j' to refer to numeric values of the size of uintmax_t.
* All [2] existing users of kernhist(9) have had their format strings
updated to replace uses of "%p" with "%#jx", and the pointer
arguments are now cast to (uintptr_t) before being subsequently cast
to (uintmax_t). This is needed to avoid compiler warnings about
casting "pointer to integer of a different size."
* All [2] existing users of kernhist(9) have had instances of "%s" or
"%c" format strings replaced with numeric formats; several instances
of mis-match between format string and argument list have been fixed.
* vmstat(1) has been modified to handle the new size of arguments in the
history data as exported by sysctl(9).
* vmstat(1) now provides a warning message if the history requested with
the -u option does not exist (previously, this condition was silently
ignored, with only a single blank line being printed).
* vmstat(1) now checks the version and argument length included in the
data exported via sysctl(9) and exits if they do not match the values
with which vmstat was built.
* The kernhist(9) man-page has been updated to note the additional
requirements imposed on the format strings, along with several other
minor changes and enhancements.
[1] It would have been possible to use an explicit length (for example,
uint64_t) for the history arguments. But that would require another
"rototill" of all the users in the future when we add support for an
architecture that supports a larger size. Also, the printf(3) format
specifiers for explicitly-sized values, such as "%"PRIu64, are much
more verbose (and less aesthetically appealing, IMHO) than simply
using "%ju".
[2] I've tried very hard to find "all [the] existing users of kernhist(9)"
but it is possible that I've missed some of them. I would be glad to
update any stragglers that anyone identifies.
--
For some reason this single kernel seems to have outgrown its declared
size as a result of the kernhist(9) changes. Bump the size.
XXX The amount of increase may be excessive - anyone with more detailed
XXX knowledge please feel free to further adjust the value appropriately.
--
Missed one cast of pointer --> uintptr_t in previous kernhist(9) commit
--
And yet another one. :(
--
Use correct mark-up for NetBSD version.
--
More improvements in grammar and readability.
--
Remove a stray '"' (obvious typo) and add a couple of casts that are
probably needed.
--
And replace an instance of "%p" conversion with "%#jx"
--
Whitespace fix. Give Bl tag table a width. Fix Xr.
 1.40.6.3 21-Apr-2020  martin Sync with HEAD
 1.40.6.2 13-Apr-2020  martin Mostly merge changes from HEAD up to 20200411
 1.40.6.1 08-Apr-2020  martin Merge changes from current as of 20200406
 1.40.2.2 03-Dec-2017  jdolecek update from HEAD
 1.40.2.1 30-Oct-2017  jdolecek file pmap.c was added on branch tls-maxphys on 2017-12-03 11:39:23 +0000
 1.48.2.1 20-Apr-2020  bouyer Sync with HEAD
 1.57.2.2 03-Apr-2021  thorpej Sync with HEAD.
 1.57.2.1 03-Jan-2021  thorpej Sync w/ HEAD.
 1.61.2.1 17-Apr-2021  thorpej Sync with HEAD.
 1.74.2.1 19-Apr-2024  martin Pull up following revision(s) (requested by skrll in ticket #671):

sys/uvm/pmap/pmap.c: revision 1.78
sys/uvm/pmap/pmap.h: revision 1.27

Fix types in pmap_page_clear_attributes so that the top bits of
the u_long mdpg_attrs aren't dropped giving atomic_cas_ulong no
chance of completing if any of the top bits is set.

Update pmap_page_set_attributes for consistency.

An ATF test run completed for me with this fix.

port-riscv/58006: ATF tests no longer complete on riscv-riscv64
 1.28 25-Nov-2024  skrll Sprinkle #ifdef _KERNEL
 1.27 18-Apr-2024  skrll branches: 1.27.2;
Fix types in pmap_page_clear_attributes so that the top bits of
the u_long mdpg_attrs aren't dropped giving atomic_cas_ulong no
chance of completing if any of the top bits is set.

Update pmap_page_set_attributes for consistency.

An ATF test run completed for me with this fix.

port-riscv/58006: ATF tests no longer complete on riscv-riscv64
 1.26 03-Nov-2022  skrll branches: 1.26.2;
_KERNEL_OPT protection
 1.25 03-Nov-2022  skrll Provide MI PMAP support on AARCH64
 1.24 27-Oct-2022  skrll Rename pm_count to pm_refcnt
 1.23 27-Oct-2022  skrll Fix the crash(1) build for mips platforms
 1.22 26-Oct-2022  skrll MI PMAP hardware page table walker support.

This is based on code given to me by Matt Thomas a long time ago with
many updates and bug fixes from me.
 1.21 07-May-2022  rin Introduce PMAP_PV_TRACK_ONLY_STUBS option, by which only empty stubs for
global functions in pmap_pvt.h are provided, instead of real support for
PV tracking.

Necessary for powerpc: Only one sub-arch (oea) has PV tracking support.
Others (booke/ibm4xx) do not at the moment (probably never for ibm4xx),
but __HAVE_PMAP_PV_TRACK is necessary, so that modules can be shared by
all of the sub-archs.
 1.20 19-Mar-2021  skrll Support pmap_growkernel and KASAN shadow mapping of the new KVA.

Neither mips nor ppc booke actually use pmap_growkernel (at present).

Thanks to rin@ for testing a similar patch on ppc booke.
 1.19 21-Dec-2020  skrll Remove variable in function declaration argument
 1.18 20-Dec-2020  skrll Support __HAVE_PMAP_PV_TRACK in sys/uvm/pmap based pmaps (aka common pmap)
 1.17 20-Aug-2020  mrg branches: 1.17.2;
move pmap segtab history into a new history of only 1000 entries;
it will still overflow much more slowly than the main pmap history.

move various debug info into kernhist. make pte array checker
into an array and use it in pmap_segtab_release() and
pmap_pte_reserve(). move check before MD callback(), in case it
wants to change ptes for some reason (they're passed in, but
this callback is currently always NULL.)

clean up some history logs to reduce the number of lines required.
 1.16 07-Aug-2020  skrll Provide a pmap_segtab_deactivate for symmetry with pmap_segtab_activate
and use it in pmap_deactivate

Call pmap_md_xtab_{,de}activate from pmap_segtab_{,de}activate to be used
for PMAP_HWPAGEWALKER and any caches ops that might be required.

Provide empty (for now) pmap_md_xtab_{,de}activate functions on the
platforms that use sys/uvm/pmap
 1.15 08-Jul-2020  skrll Comment updates
 1.14 15-Mar-2020  rin Fix build for ports using uvm/pmap; pmap_remove_all() returns a boolean.
 1.13 11-Mar-2020  thorpej With DEBUG defined, it's possible to execute a TLB-vs-segmap consistency
check from a (soft) interrupt handler. But if a platform does not otherwise
require the pmap_tlb_miss_lock, then there will be a brief window of
inconsistency that, while harmless, will still fire an assertion in the
consistency check.

Fix this with the following changes:
1- Refactor the pmap_tlb_miss_lock into MI code and rename it from
pmap_md_tlb_miss_lock_{enter,exit}() to pmap_tlb_miss_lock_{enter,exit}().
MD code can still define the "md" hooks as necessary, and if so, will
override the common implementation.
2- Provide a pmap_bootstrap_common() function to perform common pmap bootstrap
operations, namely initializing the pmap_tlb_miss_lock if it's needed.
If MD code overrides the implementation, it's responsible for initializing
its own lock.
3- Call pmap_bootstrap_common() from the mips, powerpc booke, and riscv
pmap_bootstrap() routines. (This required adding one for riscv.)
4- Switch powerpc booke to the common pmap_tlb_miss_lock.
5- Enable pmap_tlb_miss_lock if DEBUG is defined, even if it's not otherwise
required.

PR port-mips/55062 (Failed assertion in pmap_md_tlb_check_entry())
 1.12 01-Jun-2019  maxv Misc changes in RISC-V. Start changing the memory layout, too.
 1.11 20-May-2019  skrll Use __BIT()
 1.10 20-May-2019  skrll Trailing whitespace
 1.9 24-Jun-2017  skrll branches: 1.9.4; 1.9.8;
Trailing whitespace
 1.8 24-Jun-2017  skrll Multiple inclusion protection define consistency
 1.7 11-Jul-2016  matt Changes so that MIPS can use the common pmap.
Change/augment the virtual cache alias callbacks.
 1.6 07-Jul-2016  msaitoh KNF. Remove extra spaces. No functional change.
 1.5 11-Jun-2015  matt Add virtual_start to pmap_limits. This allows MD to steal address space
before pmap_bootstrap.
 1.4 18-Mar-2014  riastradh branches: 1.4.6;
Merge riastradh-drm2 to HEAD.
 1.3 17-Jul-2013  matt Make this kcpuset_t instead of the private __cpuset_t
Add improvements for single TLB implementation (PPC, ARM).
 1.2 02-Jul-2013  matt branches: 1.2.2;
Split tlb related stuff into pmap_tlb.h so that it can be used for ASID mgmt
for non-soft TLB pmaps.
 1.1 03-Oct-2012  christos branches: 1.1.2; 1.1.4;
move from common/pmap/tlb -> uvm/pmap
 1.1.4.1 28-Aug-2013  rmind sync with head
 1.1.2.3 22-May-2014  yamt sync with head.

for reference, the tree before this commit was tagged
as yamt-pagecache-tag8.

this commit was split into small chunks to avoid
a limitation of cvs. ("Protocol error: too many arguments")
 1.1.2.2 30-Oct-2012  yamt sync with head
 1.1.2.1 03-Oct-2012  yamt file pmap.h was added on branch yamt-pagecache on 2012-10-30 17:23:03 +0000
 1.2.2.1 23-Jul-2013  riastradh sync with HEAD
 1.4.6.4 28-Aug-2017  skrll Sync with HEAD
 1.4.6.3 05-Oct-2016  skrll Sync with HEAD
 1.4.6.2 09-Jul-2016  skrll Sync with HEAD
 1.4.6.1 22-Sep-2015  skrll Sync with HEAD
 1.9.8.2 08-Apr-2020  martin Merge changes from current as of 20200406
 1.9.8.1 10-Jun-2019  christos Sync with HEAD
 1.9.4.2 03-Dec-2017  jdolecek update from HEAD
 1.9.4.1 24-Jun-2017  jdolecek file pmap.h was added on branch tls-maxphys on 2017-12-03 11:39:23 +0000
 1.17.2.2 03-Apr-2021  thorpej Sync with HEAD.
 1.17.2.1 03-Jan-2021  thorpej Sync w/ HEAD.
 1.26.2.1 19-Apr-2024  martin Pull up following revision(s) (requested by skrll in ticket #671):

sys/uvm/pmap/pmap.c: revision 1.78
sys/uvm/pmap/pmap.h: revision 1.27

Fix types in pmap_page_clear_attributes so that the top bits of
the u_long mdpg_attrs aren't dropped giving atomic_cas_ulong no
chance of completing if any of the top bits is set.

Update pmap_page_set_attributes for consistency.

An ATF test run completed for me with this fix.

port-riscv/58006: ATF tests no longer complete on riscv-riscv64
 1.27.2.1 02-Aug-2025  perseant Sync with HEAD
 1.2 27-Apr-2023  skrll Correct a type.
 1.1 20-Apr-2023  skrll Provide a shared pmap_devmap implementation and convert all pmap_devmap
arrays to use DEVMAP_ENTRY{,_END}
 1.2 25-Nov-2024  skrll Sprinkle #ifdef _KERNEL
 1.1 20-Apr-2023  skrll branches: 1.1.6;
Provide a shared pmap_devmap implementation and convert all pmap_devmap
arrays to use DEVMAP_ENTRY{,_END}
 1.1.6.1 02-Aug-2025  perseant Sync with HEAD
 1.15 08-May-2022  rin Oops, correct misleading #endif comment.

It seems I need a cup of coffee...
 1.14 08-May-2022  rin Improve wording a bit in a comment for the previous.
 1.13 08-May-2022  rin For PMAP_PV_TRACK_ONLY_STUBS, comment out pmap_pv_{,un}track().

If modules call these functions, the result should be an
inconsistent state.

Such modules require real PV-tracking support, anyway.

The best we can do should be to make two symbols undefined, and
prevent these modules from being loaded.
 1.12 07-May-2022  rin Introduce PMAP_PV_TRACK_ONLY_STUBS option, by which only empty stubs for
global functions in pmap_pvt.h are provided, instead of real support for
PV tracking.

Necessary for powerpc: Only one sub-arch (oea) has PV tracking support.
Others (booke/ibm4xx) do not at the moment (probably never for ibm4xx),
but __HAVE_PMAP_PV_TRACK is necessary, so that modules can be shared by
all of the sub-archs.
 1.11 21-Jul-2021  skrll need <sys/param.h> for COHERENCY_UNIT

Minor KNF along the way.
 1.10 16-Mar-2020  ad branches: 1.10.8;
Use C99-ism to reduce ifdefs. Pointed out by christos@.
 1.9 16-Mar-2020  ad pmap_pv_track(): use PMAP_PAGE_INIT() otherwise the x86 pmap pukes.
 1.8 01-Jan-2020  martin Revert previous (include of sys/param.h) - the headers requiring this
have been fixed.
 1.7 28-Dec-2019  martin Add missing sys/param.h include (for COHERENCY_UNIT, now needed in uvm headers)
 1.6 18-Dec-2019  skrll KNF
 1.5 09-Dec-2019  riastradh Convert pmap_pvt to atomic_load/store.
 1.4 07-Dec-2019  jmcneill sys/atomic.h for membar_*
 1.3 07-Feb-2016  riastradh branches: 1.3.16; 1.3.20;
Use IPL_NONE for pserialized lock. Assert sleepable. (OOPS.)
 1.2 11-Nov-2015  skrll branches: 1.2.2;
Remove #if 0 / #endif includes
 1.1 11-Nov-2015  skrll Split out the pmap_pv_track stuff for use by others.

Discussed with riastradh@
 1.2.2.3 19-Mar-2016  skrll Sync with HEAD
 1.2.2.2 27-Dec-2015  skrll Sync with HEAD (as of 26th Dec)
 1.2.2.1 11-Nov-2015  skrll file pmap_pvt.c was added on branch nick-nhusb on 2015-12-27 12:10:19 +0000
 1.3.20.1 08-Apr-2020  martin Merge changes from current as of 20200406
 1.3.16.2 03-Dec-2017  jdolecek update from HEAD
 1.3.16.1 07-Feb-2016  jdolecek file pmap_pvt.c was added on branch tls-maxphys on 2017-12-03 11:39:23 +0000
 1.10.8.1 01-Aug-2021  thorpej Sync with HEAD.
 1.3 16-Feb-2022  riastradh pmap_pvt.h: Fix bogus include.
 1.2 24-Jun-2017  skrll branches: 1.2.4;
Multiple inclusion protection define consistency
 1.1 11-Nov-2015  skrll branches: 1.1.2;
Split out the pmap_pv_track stuff for use by others.

Discussed with riastradh@
 1.1.2.3 28-Aug-2017  skrll Sync with HEAD
 1.1.2.2 27-Dec-2015  skrll Sync with HEAD (as of 26th Dec)
 1.1.2.1 11-Nov-2015  skrll file pmap_pvt.h was added on branch nick-nhusb on 2015-12-27 12:10:19 +0000
 1.2.4.2 03-Dec-2017  jdolecek update from HEAD
 1.2.4.1 24-Jun-2017  jdolecek file pmap_pvt.h was added on branch tls-maxphys on 2017-12-03 11:39:23 +0000
 1.33 23-Jul-2023  skrll KASSERT -> KASSERTMSG
 1.32 01-Jul-2023  skrll Fix build when KERNHIST defined, but not UVMHIST
 1.31 21-Dec-2022  skrll Rename pmap_md_pdetab_destroy to pmap_md_pdetab_fini to match
pmap_md_pdetab_init.

Call pmap_md_pdetab_fini from pmap_segtab_destroy.
 1.30 27-Oct-2022  skrll In pmap_pte_reserve ensure we're atomically swapping out an invalid entry
otherwise concurrent updates might both think they've updated the entry.
 1.29 26-Oct-2022  skrll MI PMAP hardware page table walker support.

This is based on code given to me by Matt Thomas a long time ago with
many updates and bug fixes from me.
 1.28 25-Sep-2022  skrll Rename pmap_segtab_t *stp to stb for consistency with a future
pmap_pdetab_t *ptb. pmap_pdetab_t *ptp would be far too confusing.

NFC. Same code before and after.
 1.27 13-Mar-2021  skrll Consistently use %#jx instead of 0x%jx or just %jx in UVMHIST_LOG formats
 1.26 08-Oct-2020  skrll branches: 1.26.2;
%#jx vs %jx consistency in UVMHIST_LOG
 1.25 24-Sep-2020  skrll Whitespace
 1.24 10-Sep-2020  rin Cast pointer arguments of UVMHIST_CALLARGS() into uintptr_t.

Appease GCC9 -Wpointer-to-int-cast on ILP32 environments.
 1.23 22-Aug-2020  skrll Remove pte_zero_p and simply check against 0.
 1.22 22-Aug-2020  skrll Remove the #if defined(__mips_n64) && PAGE_SIZE == 8192 and make the
check MI - all PTs are PAGE_SIZE aligned
 1.21 22-Aug-2020  skrll Trailing whitespace
 1.20 20-Aug-2020  mrg fix hpcmips and evbppc builds (wrong type in panic()).
 1.19 20-Aug-2020  mrg move pmap segtab history into a new history of only 1000 entries;
it will still overflow much more slowly than the main pmap history.

move various debug info into kernhist. make pte array checker
into an array and use it in pmap_segtab_release() and
pmap_pte_reserve(). move check before MD callback(), in case it
wants to change ptes for some reason (they're passed in, but
this callback is currently always NULL.)

clean up some history logs to reduce the number of lines required.
 1.18 18-Aug-2020  simonb Fix small tyop in a comment.
 1.17 18-Aug-2020  skrll Improve a panic message ever so slightly
 1.16 17-Aug-2020  mrg add pmaphist calls around seg_tab[] manipulation. hopefully will
help find what causes this:

panic: pmap_segtab_alloc: pm_segtab.seg_tab[1010] != 0 (0x980000004eeb6068): from free list
 1.15 07-Aug-2020  skrll Provide a pmap_segtab_deactivate for symmetry with pmap_segtab_activate
and use it in pmap_deactivate

Call pmap_md_xtab_{,de}activate from pmap_segtab_{,de}activate to be used
for PMAP_HWPAGEWALKER and any caches ops that might be required.

Provide empty (for now) pmap_md_xtab_{,de}activate functions on the
platforms that use sys/uvm/pmap
 1.14 24-Feb-2020  rin 0x%p --> %p for non-external codes.
 1.13 18-Dec-2019  skrll branches: 1.13.2;
KNF
 1.12 14-Dec-2019  ad Use pageq.list instead of listq.list.
 1.11 20-Oct-2019  skrll Whitespace
 1.10 23-Sep-2019  skrll Use "segmap" for uvm_wait message in pmap_segtab_alloc
 1.9 18-Sep-2019  skrll s/pte/ptep/ in pmap_pte_process for consistency with other code. NFCI.
 1.8 18-Sep-2019  skrll Whitespace
 1.7 08-Mar-2019  msaitoh s/ the the / the /
 1.6 12-May-2017  skrll branches: 1.6.8; 1.6.12;
KASSERT -> KASSERTMSG
 1.5 12-May-2017  skrll Trailing whitespace
 1.4 23-Nov-2016  mrg branches: 1.4.6;
fix the start index generation in pmap_segtab_release() to
ensure it fits in the actual array. fixes N64 binaries from
triggering later panic. move the panic check itself into a
common function that is called from a couple of new places too.
 1.3 11-Jul-2016  matt branches: 1.3.2;
Changes so that MIPS can use the common pmap.
Change/augment the virtual cache alias callbacks.
 1.2 11-Jun-2015  matt Use PMAP_MAP_POOLPAGE instead of POOL_PHYSTOV since we use PMAP_UNMAP_POOLPAGE.
Use PMAP_ALLOC_POOLPAGE instead of pmap_md_alloc_poolpage.
Cleanup some panic messages.
 1.1 03-Oct-2012  christos branches: 1.1.2; 1.1.14; 1.1.16; 1.1.18; 1.1.20;
move from common/pmap/tlb -> uvm/pmap
 1.1.20.1 18-Jan-2017  skrll Sync with netbsd-5
 1.1.18.1 03-Dec-2016  martin Pull up following revision(s) (requested by mrg in ticket #1275):
sys/arch/mips/include/vmparam.h: revision 1.57
sys/uvm/pmap/pmap_segtab.c: revision 1.4
1TB is enough UVA for anyone... plus not all cpus can support more.
fix the start index generation in pmap_segtab_release() to
ensure it fits in the actual array. fixes N64 binaries from
triggering later panic. move the panic check itself into a
common function that is called from a couple of new places too.
 1.1.16.4 28-Aug-2017  skrll Sync with HEAD
 1.1.16.3 05-Dec-2016  skrll Sync with HEAD
 1.1.16.2 05-Oct-2016  skrll Sync with HEAD
 1.1.16.1 22-Sep-2015  skrll Sync with HEAD
 1.1.14.1 03-Dec-2016  martin Pull up following revision(s) (requested by mrg in ticket #1275):
sys/arch/mips/include/vmparam.h: revision 1.57
sys/uvm/pmap/pmap_segtab.c: revision 1.4
1TB is enough UVA for anyone... plus not all cpus can support more.
fix the start index generation in pmap_segtab_release() to
ensure it fits in the actual array. fixes N64 binaries from
triggering later panic. move the panic check itself into a
common function that is called from a couple of new places too.
 1.1.2.2 30-Oct-2012  yamt sync with head
 1.1.2.1 03-Oct-2012  yamt file pmap_segtab.c was added on branch yamt-pagecache on 2012-10-30 17:23:03 +0000
 1.3.2.1 07-Jan-2017  pgoyette Sync with HEAD. (Note that most of these changes are simply $NetBSD$
tag issues.)
 1.4.6.1 19-May-2017  pgoyette Resolve conflicts from previous merge (all resulting from $NetBSD$
keyword expansion)
 1.6.12.2 13-Apr-2020  martin Mostly merge changes from HEAD up to 20200411
 1.6.12.1 10-Jun-2019  christos Sync with HEAD
 1.6.8.2 03-Dec-2017  jdolecek update from HEAD
 1.6.8.1 12-May-2017  jdolecek file pmap_segtab.c was added on branch tls-maxphys on 2017-12-03 11:39:23 +0000
 1.13.2.1 29-Feb-2020  ad Sync with head.
 1.26.2.1 03-Apr-2021  thorpej Sync with HEAD.
 1.5 13-Apr-2020  skrll Trailing whitespace
 1.4 18-Dec-2019  skrll branches: 1.4.6;
KNF
 1.3 11-Jul-2016  matt branches: 1.3.16; 1.3.20;
Changes so that MIPS can use the common pmap.
Change/augment the virtual cache alias callbacks.
 1.2 02-Jul-2013  matt branches: 1.2.8;
Split tlb related stuff into pmap_tlb.h so that it can be used for ASID mgmt
for non-soft TLB pmaps.
 1.1 03-Oct-2012  christos branches: 1.1.2; 1.1.4;
move from common/pmap/tlb -> uvm/pmap
 1.1.4.1 28-Aug-2013  rmind sync with head
 1.1.2.3 22-May-2014  yamt sync with head.

for a reference, the tree before this commit was tagged
as yamt-pagecache-tag8.

this commit was split into small chunks to avoid
a limitation of cvs. ("Protocol error: too many arguments")
 1.1.2.2 30-Oct-2012  yamt sync with head
 1.1.2.1 03-Oct-2012  yamt file pmap_synci.c was added on branch yamt-pagecache on 2012-10-30 17:23:03 +0000
 1.2.8.1 05-Oct-2016  skrll Sync with HEAD
 1.3.20.2 21-Apr-2020  martin Sync with HEAD
 1.3.20.1 08-Apr-2020  martin Merge changes from current as of 20200406
 1.3.16.2 03-Dec-2017  jdolecek update from HEAD
 1.3.16.1 11-Jul-2016  jdolecek file pmap_synci.c was added on branch tls-maxphys on 2017-12-03 11:39:23 +0000
 1.4.6.1 20-Apr-2020  bouyer Sync with HEAD
 1.1 11-Jul-2016  matt branches: 1.1.4; 1.1.18;
Changes so that MIPS can use the common pmap.
Change/augment the virtual cache alias callbacks.
 1.1.18.2 03-Dec-2017  jdolecek update from HEAD
 1.1.18.1 11-Jul-2016  jdolecek file pmap_synci.h was added on branch tls-maxphys on 2017-12-03 11:39:23 +0000
 1.1.4.2 05-Oct-2016  skrll Sync with HEAD
 1.1.4.1 11-Jul-2016  skrll file pmap_synci.h was added on branch nick-nhusb on 2016-10-05 20:56:12 +0000
 1.63 02-Sep-2025  skrll Don't log anything when !DIAGNOSTIC in pmap_tlb_pai_check as it's a no-op.
 1.62 01-Jan-2024  skrll Appease KASSERTs for zero ASID CPUs (I mean harts)
 1.61 06-Oct-2023  skrll Support CPUs that might not have ASIDs in the common pmap.
 1.60 01-Aug-2023  skrll Improve debug
 1.59 12-Jun-2023  skrll Fix compile for non-MULTIPROCESSOR and PMAP_TLB_MAX > 1 builds
 1.58 12-Jun-2023  skrll Fixup UVMHIST builds
 1.57 22-Apr-2023  skrll KASSERT(kpreempt_disabled()) before accessing curcpu()
 1.56 19-Feb-2023  skrll Spaces to TABs. NFCI.
 1.55 07-Nov-2022  skrll Fix UVMHIST build
 1.54 26-Oct-2022  skrll MI PMAP hardware page table walker support.

This is based on code given to me by Matt Thomas a long time ago with
many updates and bug fixes from me.
 1.53 20-Oct-2022  skrll Add a KASSERT to check that tlb_asid_t is a large enough type.
 1.52 04-Mar-2022  skrll Remove an incorrect KASSERT.
 1.51 02-Jan-2022  christos fix KASSERTMSG issue
 1.50 29-Dec-2021  skrll Remove duplicate KASSERT
 1.49 27-Oct-2021  simonb TAB police.
 1.48 27-Oct-2021  simonb When adjusting the max ASID count, check if ti->ti_asid_max == 0 as
well. This defaults to 0 for the non-PMAP_TLB_NUM_PIDS case, so would
skip the updated test.

Fix for port-pmax/56466 (which affects all MIPS).

ok skrll@
 1.47 08-Oct-2021  skrll Fix a logic botch to actually apply the ASID limit returned by
pmap_md_tlb_asid_max.
 1.46 02-Oct-2021  skrll Pass the pmap in tlb_set_asid for the benefit of aarch64.
 1.45 12-Sep-2021  skrll comment whitespace
 1.44 04-May-2021  skrll Always expose pmap_tlb_update_addr now that all current PMAP_HWPAGEWALKERs
(arm) users provide the required functions.
 1.43 01-May-2021  skrll Revert previous
 1.42 01-May-2021  skrll Expose pmap_tlb_update_addr to the PMAP_HWPAGEWALKER platforms
 1.41 24-Sep-2020  skrll branches: 1.41.6;
Whitespace
 1.40 22-Aug-2020  skrll Whitespace - line continuation alignment
 1.39 19-Aug-2020  skrll KNF. Add some whitespace to the TLBINV_MAP macro and tlb_invalidate_op
enum.
 1.38 19-Aug-2020  skrll Unwrap short line KASSERT
 1.37 19-Aug-2020  skrll Fix inverted logic test in pmap_tlb_shootdown_process for if the victim
is onproc.
 1.36 11-Aug-2020  skrll s/pmaphist/maphist/ for now
 1.35 11-Aug-2020  skrll More UVMHIST_LOG. Remove some commented output printfs.
 1.34 09-Aug-2020  skrll Don't kcpuset_clone every pmap_tlb_shootdown_bystanders. Instead allocate
a kcpuset_t per cpu_info and use that.
 1.33 14-Apr-2020  skrll Fix UVMHIST build
 1.32 12-Apr-2020  skrll Use UVMHIST_CALLARGS
 1.31 09-Apr-2020  skrll Make a comment less MIPS specific
 1.30 18-Dec-2019  skrll branches: 1.30.6;
KNF
 1.29 17-Dec-2019  skrll Fix a UVMHIST_LOG format
 1.28 25-Feb-2018  jdolecek branches: 1.28.4;
fix the DIAGNOSTIC function pmap_tlb_asid_count() to not expect
that TLBINFO_ASID_INUSE_P() returns just 0 or 1; the underlying
__BITMAP_ISSET() actually returns the matching bit nowadays, which
caused miscounting

fixes PR kern/53054 by Sevan Janiyan
 1.27 25-Feb-2018  jdolecek adjust KASSERT() triggered in PR port-cobalt/53054 to provide more info
 1.26 21-Feb-2018  jdolecek KERNEL_PID is > 0 on powerpc/ibm4xx, need to mask all bits <0,
KERNEL_PID> to avoid triggering KASSERT() checking allocated asid
is bigger than KERNEL_PID; adjust also TLBINFO_ASID_INITIAL_FREE()
accordingly

discussed with Nick
 1.25 19-Feb-2018  jdolecek convert to use actual __BITMAP_*() macros from <sys/bitops.h>, and make
it possible to override the ASID bitmap length; default to 256 ASIDs as before

XXX NFCI; compile tested only on evbppc and evbmips, unfortunately didn't
find any combination of port using the MI pmap_tlb.c and working in QEMU
 1.24 19-Feb-2018  jdolecek a bit of DRY - add macro for initial free ASID count
 1.23 19-Feb-2018  jdolecek make it possible to not use the icache evcnts
 1.22 28-Oct-2017  pgoyette branches: 1.22.2;
Update the kernhist(9) kernel history code to address issues identified
in PR kern/52639, as well as some general cleaning-up...

(As proposed on tech-kern@ with additional changes and enhancements.)

Details of changes:

* All history arguments are now stored as uintmax_t values[1], both in
the kernel and in the structures used for exporting the history data
to userland via sysctl(9). This avoids problems on some architectures
where passing a 64-bit (or larger) value to printf(3) can cause it to
process the value as multiple arguments. (This can be particularly
problematic when printf()'s format string is not a literal, since in
that case the compiler cannot know how large each argument should be.)

* Update the data structures used for exporting kernel history data to
include a version number as well as the length of history arguments.

* All [2] existing users of kernhist(9) have had their format strings
updated. Each format specifier now includes an explicit length
modifier 'j' to refer to numeric values of the size of uintmax_t.

* All [2] existing users of kernhist(9) have had their format strings
updated to replace uses of "%p" with "%#jx", and the pointer
arguments are now cast to (uintptr_t) before being subsequently cast
to (uintmax_t). This is needed to avoid compiler warnings about
casting "pointer to integer of a different size."

* All [2] existing users of kernhist(9) have had instances of "%s" or
"%c" format strings replaced with numeric formats; several instances
of mis-match between format string and argument list have been fixed.

* vmstat(1) has been modified to handle the new size of arguments in the
history data as exported by sysctl(9).

* vmstat(1) now provides a warning message if the history requested with
the -u option does not exist (previously, this condition was silently
ignored, with only a single blank line being printed).

* vmstat(1) now checks the version and argument length included in the
data exported via sysctl(9) and exits if they do not match the values
with which vmstat was built.

* The kernhist(9) man-page has been updated to note the additional
requirements imposed on the format strings, along with several other
minor changes and enhancements.

[1] It would have been possible to use an explicit length (for example,
uint64_t) for the history arguments. But that would require another
"rototill" of all the users in the future when we add support for an
architecture that supports a larger size. Also, the printf(3) format
specifiers for explicitly-sized values, such as "%"PRIu64, are much
more verbose (and less aesthetically appealing, IMHO) than simply
using "%ju".

[2] I've tried very hard to find "all [the] existing users of kernhist(9)"
but it is possible that I've missed some of them. I would be glad to
update any stragglers that anyone identifies.
 1.21 26-May-2017  skrll branches: 1.21.2;
Remove incorrect __diagused
 1.20 26-May-2017  skrll Use the define name PMAP_HWPAGEWALKER and not PMAP_TLB_HWPAGEWALKER
 1.19 09-Oct-2016  christos PR/51540: Henning Petersen: replace , with ;
 1.18 23-Jul-2016  matt Lock the tlbinfo if it wasn't when doing a pmap_tlb_pai_check
 1.17 14-Jul-2016  skrll branches: 1.17.2;
Use KERNEL_PID instead of 0
 1.16 14-Jul-2016  skrll Fix some comments.
 1.15 14-Jul-2016  skrll Trailing whitespace
 1.14 12-Jul-2016  skrll Fix typo for build check
 1.13 11-Jul-2016  matt Changes so that MIPS can use the common pmap.
Change/augment the virtual cache alias callbacks.
 1.12 11-Jun-2015  matt Don't call kcpuset_intersecting_p and then kcpuset_ffs_intersecting since
the latter will tell us what we need to know.
 1.11 18-Apr-2015  joerg pmap_tlb_intersecting_active_p is not used in some combinations of
platform options as seen by recent ARM changes.
 1.10 29-Oct-2014  skrll branches: 1.10.2;
s/0/KERNEL_PID/ for correctness
 1.9 18-Oct-2014  skrll Minor comment update.
 1.8 03-Apr-2014  matt branches: 1.8.4;
Change cpu_tlb_info definition based on PMAP_TLB_MAX instead of MULTIPROCESSOR
 1.7 03-Apr-2014  matt Compare ASIDs, not pmaps.
 1.6 03-Apr-2014  matt Make this compile on booke again.
 1.5 30-Mar-2014  matt Allow this to handle H/W tlbs. Some ARM allow for a cheap way to flush all
entries using an ASID from the TLB. Add support for taking advantage of it.
Most ARMs don't have an easy way to find out what's in the TLB so make
record_asids just say all ASIDs are in use. Fix some off by 1 errors.
 1.4 18-Mar-2014  riastradh Merge riastradh-drm2 to HEAD.
 1.3 17-Jul-2013  matt Make this kcpuset_t instead of the private __cpuset_t
Add improvements for single TLB implementation (PPC, ARM).
 1.2 02-Jul-2013  matt branches: 1.2.2;
Split tlb related stuff into pmap_tlb.h so that it can be used for ASID mgmt
for non-soft TLB pmaps.
 1.1 03-Oct-2012  christos branches: 1.1.2; 1.1.4;
move from common/pmap/tlb -> uvm/pmap
 1.1.4.2 18-May-2014  rmind sync with head
 1.1.4.1 28-Aug-2013  rmind sync with head
 1.1.2.3 22-May-2014  yamt sync with head.

for a reference, the tree before this commit was tagged
as yamt-pagecache-tag8.

this commit was split into small chunks to avoid
a limitation of cvs. ("Protocol error: too many arguments")
 1.1.2.2 30-Oct-2012  yamt sync with head
 1.1.2.1 03-Oct-2012  yamt file pmap_tlb.c was added on branch yamt-pagecache on 2012-10-30 17:23:03 +0000
 1.2.2.1 23-Jul-2013  riastradh sync with HEAD
 1.8.4.1 09-Nov-2014  martin Pull up following revision(s) (requested by skrll in ticket #188):
sys/arch/arm/include/arm32/pmap.h: revision 1.136
sys/arch/arm/include/armreg.h: revision 1.100
sys/arch/arm/cortex/gic.c: revision 1.11
sys/arch/arm/arm32/db_interface.c: revision 1.54
sys/arch/arm/include/armreg.h: revision 1.101
sys/arch/arm/cortex/gic.c: revision 1.12
sys/arch/arm/arm32/arm32_machdep.c: revision 1.107
sys/arch/arm/arm/cpufunc_asm_armv7.S: revision 1.19
sys/arch/arm/cortex/a9_mpsubr.S: revision 1.20
sys/arch/evbarm/conf/BPI: revision 1.5
sys/arch/arm/cortex/a9_mpsubr.S: revision 1.21
sys/arch/arm/arm32/pmap.c: revision 1.306
sys/arch/arm/arm32/db_machdep.c: revision 1.22
sys/arch/arm/arm32/arm32_tlb.c: revision 1.3
sys/arch/arm/arm/undefined.c: revision 1.55
sys/arch/arm/cortex/a9_mpsubr.S: revision 1.22
sys/arch/arm/arm32/pmap.c: revision 1.307
sys/arch/arm/arm32/arm32_tlb.c: revision 1.4
sys/arch/arm/cortex/a9_mpsubr.S: revision 1.23
sys/arch/arm/arm32/arm32_tlb.c: revision 1.5
sys/arch/evbarm/conf/BPI: revision 1.8
sys/arch/arm/cortex/a9_mpsubr.S: revision 1.24
sys/arch/arm/arm32/arm32_tlb.c: revision 1.6
sys/arch/arm/arm32/arm32_tlb.c: revision 1.7
sys/arch/evbarm/conf/CUBIETRUCK: revision 1.5
sys/arch/arm/pic/pic.c: revision 1.23
sys/arch/arm/pic/pic.c: revision 1.24
sys/arch/arm/pic/picvar.h: revision 1.11
sys/arch/arm/arm/cpufunc_asm_armv7.S: revision 1.20
sys/arch/arm/mainbus/cpu_mainbus.c: revision 1.16
sys/arch/arm/arm32/pmap.c: revision 1.298
sys/arch/arm/arm/cpufunc_asm_arm11.S: revision 1.17
sys/arch/arm/arm/cpufunc_asm_pj4b.S: revision 1.5
sys/arch/arm/arm32/pmap.c: revision 1.310
sys/arch/arm/arm32/pmap.c: revision 1.311
sys/arch/arm/arm32/arm32_kvminit.c: revision 1.32
sys/arch/arm/cortex/a9_mpsubr.S: revision 1.19
sys/arch/arm/arm32/arm32_boot.c: revision 1.10
sys/arch/arm/arm/ast.c: revision 1.25
sys/arch/arm/include/armreg.h: revision 1.98
sys/uvm/pmap/pmap_tlb.c: revision 1.10
sys/arch/arm/arm32/arm32_boot.c: revision 1.8
sys/arch/arm/arm32/arm32_boot.c: revision 1.9
sys/arch/arm/arm/arm_machdep.c: revision 1.43
Various ARM MP fixes.
 1.10.2.5 28-Aug-2017  skrll Sync with HEAD
 1.10.2.4 05-Dec-2016  skrll Sync with HEAD
 1.10.2.3 05-Oct-2016  skrll Sync with HEAD
 1.10.2.2 22-Sep-2015  skrll Sync with HEAD
 1.10.2.1 06-Jun-2015  skrll Sync with HEAD
 1.17.2.2 04-Nov-2016  pgoyette Sync with HEAD
 1.17.2.1 26-Jul-2016  pgoyette Sync with HEAD
 1.21.2.1 02-Nov-2017  snj Pull up following revision(s) (requested by pgoyette in ticket #335):
share/man/man9/kernhist.9: 1.5-1.8
sys/arch/acorn26/acorn26/pmap.c: 1.39
sys/arch/arm/arm32/fault.c: 1.105 via patch
sys/arch/arm/arm32/pmap.c: 1.350, 1.359
sys/arch/arm/broadcom/bcm2835_bsc.c: 1.7
sys/arch/arm/omap/if_cpsw.c: 1.20
sys/arch/arm/omap/tiotg.c: 1.7
sys/arch/evbarm/conf/RPI2_INSTALL: 1.3
sys/dev/ic/sl811hs.c: 1.98
sys/dev/usb/ehci.c: 1.256
sys/dev/usb/if_axe.c: 1.83
sys/dev/usb/motg.c: 1.18
sys/dev/usb/ohci.c: 1.274
sys/dev/usb/ucom.c: 1.119
sys/dev/usb/uhci.c: 1.277
sys/dev/usb/uhub.c: 1.137
sys/dev/usb/umass.c: 1.160-1.162
sys/dev/usb/umass_quirks.c: 1.100
sys/dev/usb/umass_scsipi.c: 1.55
sys/dev/usb/usb.c: 1.168
sys/dev/usb/usb_mem.c: 1.70
sys/dev/usb/usb_subr.c: 1.221
sys/dev/usb/usbdi.c: 1.175
sys/dev/usb/usbdi_util.c: 1.67-1.70
sys/dev/usb/usbroothub.c: 1.3
sys/dev/usb/xhci.c: 1.75
sys/external/bsd/drm2/dist/drm/i915/i915_gem.c: 1.34
sys/kern/kern_history.c: 1.15
sys/kern/kern_xxx.c: 1.74
sys/kern/vfs_bio.c: 1.275-1.276
sys/miscfs/genfs/genfs_io.c: 1.71
sys/sys/kernhist.h: 1.21
sys/ufs/ffs/ffs_balloc.c: 1.63
sys/ufs/lfs/lfs_vfsops.c: 1.361
sys/ufs/lfs/ulfs_inode.c: 1.21
sys/ufs/lfs/ulfs_vnops.c: 1.52
sys/ufs/ufs/ufs_inode.c: 1.102
sys/ufs/ufs/ufs_vnops.c: 1.239
sys/uvm/pmap/pmap.c: 1.37-1.39
sys/uvm/pmap/pmap_tlb.c: 1.22
sys/uvm/uvm_amap.c: 1.108
sys/uvm/uvm_anon.c: 1.64
sys/uvm/uvm_aobj.c: 1.126
sys/uvm/uvm_bio.c: 1.91
sys/uvm/uvm_device.c: 1.66
sys/uvm/uvm_fault.c: 1.201
sys/uvm/uvm_km.c: 1.144
sys/uvm/uvm_loan.c: 1.85
sys/uvm/uvm_map.c: 1.353
sys/uvm/uvm_page.c: 1.194
sys/uvm/uvm_pager.c: 1.111
sys/uvm/uvm_pdaemon.c: 1.109
sys/uvm/uvm_swap.c: 1.175
sys/uvm/uvm_vnode.c: 1.103
usr.bin/vmstat/vmstat.c: 1.219
Reorder to test for null before null deref in debug code
--
Reorder to test for null before null deref in debug code
--
KNF
--
No need for '\n' in UVMHIST_LOG
--
normalise a BIOHIST log message
--
Update the kernhist(9) kernel history code to address issues identified
in PR kern/52639, as well as some general cleaning-up...
(As proposed on tech-kern@ with additional changes and enhancements.)
Details of changes:
* All history arguments are now stored as uintmax_t values[1], both in
the kernel and in the structures used for exporting the history data
to userland via sysctl(9). This avoids problems on some architectures
where passing a 64-bit (or larger) value to printf(3) can cause it to
process the value as multiple arguments. (This can be particularly
problematic when printf()'s format string is not a literal, since in
that case the compiler cannot know how large each argument should be.)
* Update the data structures used for exporting kernel history data to
include a version number as well as the length of history arguments.
* All [2] existing users of kernhist(9) have had their format strings
updated. Each format specifier now includes an explicit length
modifier 'j' to refer to numeric values of the size of uintmax_t.
* All [2] existing users of kernhist(9) have had their format strings
updated to replace uses of "%p" with "%#jx", and the pointer
arguments are now cast to (uintptr_t) before being subsequently cast
to (uintmax_t). This is needed to avoid compiler warnings about
casting "pointer to integer of a different size."
* All [2] existing users of kernhist(9) have had instances of "%s" or
"%c" format strings replaced with numeric formats; several instances
of mis-match between format string and argument list have been fixed.
* vmstat(1) has been modified to handle the new size of arguments in the
history data as exported by sysctl(9).
* vmstat(1) now provides a warning message if the history requested with
the -u option does not exist (previously, this condition was silently
ignored, with only a single blank line being printed).
* vmstat(1) now checks the version and argument length included in the
data exported via sysctl(9) and exits if they do not match the values
with which vmstat was built.
* The kernhist(9) man-page has been updated to note the additional
requirements imposed on the format strings, along with several other
minor changes and enhancements.
[1] It would have been possible to use an explicit length (for example,
uint64_t) for the history arguments. But that would require another
"rototill" of all the users in the future when we add support for an
architecture that supports a larger size. Also, the printf(3) format
specifiers for explicitly-sized values, such as "%"PRIu64, are much
more verbose (and less aesthetically appealing, IMHO) than simply
using "%ju".
[2] I've tried very hard to find "all [the] existing users of kernhist(9)"
but it is possible that I've missed some of them. I would be glad to
update any stragglers that anyone identifies.
--
For some reason this single kernel seems to have outgrown its declared
size as a result of the kernhist(9) changes. Bump the size.
XXX The amount of increase may be excessive - anyone with more detailed
XXX knowledge please feel free to further adjust the value
appropriately.
--
Missed one cast of pointer --> uintptr_t in previous kernhist(9) commit
--
And yet another one. :(
--
Use correct mark-up for NetBSD version.
--
More improvements in grammar and readability.
--
Remove a stray '"' (obvious typo) and add a couple of casts that are
probably needed.
--
And replace an instance of "%p" conversion with "%#jx"
--
Whitespace fix. Give Bl tag table a width. Fix Xr.
 1.22.2.2 03-Dec-2017  jdolecek update from HEAD
 1.22.2.1 28-Oct-2017  jdolecek file pmap_tlb.c was added on branch tls-maxphys on 2017-12-03 11:39:23 +0000
 1.28.4.3 21-Apr-2020  martin Sync with HEAD
 1.28.4.2 13-Apr-2020  martin Mostly merge changes from HEAD up to 20200411
 1.28.4.1 08-Apr-2020  martin Merge changes from current as of 20200406
 1.30.6.1 20-Apr-2020  bouyer Sync with HEAD
 1.41.6.1 13-May-2021  thorpej Sync with HEAD.
 1.17 06-Oct-2023  skrll Support CPUs that might not have ASIDs in the common pmap.
 1.16 26-Oct-2022  skrll MI PMAP hardware page table walker support.

This is based on code given to me by Matt Thomas a long time ago with
many updates and bug fixes from me.
 1.15 19-Aug-2020  skrll KNF. Add some whitespace to the TLBINV_MAP macro and tlb_invalidate_op
enum.
 1.14 01-Aug-2020  skrll Provide a TLBINFO_OWNED
 1.13 19-Feb-2018  jdolecek convert to use actual __BITMAP_*() macros from <sys/bitops.h>, and make
it possible to override the ASID bitmap length; default to 256 ASIDs as before

XXX NFCI; compile tested only on evbppc and evbmips, unfortunately didn't
find any combination of port using the MI pmap_tlb.c and working in QEMU
 1.12 19-Feb-2018  jdolecek make it possible to not use the icache evcnts
 1.11 24-Jun-2017  skrll Multiple inclusion protection define consistency
 1.10 26-May-2017  skrll Whitespace
 1.9 11-Jul-2016  matt Changes so that MIPS can use the common pmap.
Change/augment the virtual cache alias callbacks.
 1.8 02-Apr-2015  matt include <sys/evcnt.h>
 1.7 05-Jan-2015  nonaka Use PMAP_TLB_MAX instead of MAXCPUS.
 1.6 03-Apr-2014  matt branches: 1.6.4; 1.6.8; 1.6.10;
Change cpu_tlb_info definition based on PMAP_TLB_MAX instead of MULTIPROCESSOR
 1.5 30-Mar-2014  matt Allow this to handle H/W tlbs. Some ARM allow for a cheap way to flush all
entries using an ASID from the TLB. Add support for taking advantage of it.
Most ARMs don't have an easy way to find out what's in the TLB so make
record_asids just say all ASIDs are in use. Fix some off by 1 errors.
 1.4 18-Mar-2014  riastradh Merge riastradh-drm2 to HEAD.
 1.3 22-Jul-2013  matt branches: 1.3.2;
If not MULTIPROCESSOR, just make cpu_tlb_info(ci) return &pmap_tlb0_info
 1.2 17-Jul-2013  matt Make this kcpuset_t instead of the private __cpuset_t
Add improvements for single TLB implementation (PPC, ARM).
 1.1 02-Jul-2013  matt branches: 1.1.2;
Split tlb related stuff into pmap_tlb.h so that it can be used for ASID mgmt
for non-soft TLB pmaps.
 1.1.2.1 23-Jul-2013  riastradh sync with HEAD
 1.3.2.3 18-May-2014  rmind sync with head
 1.3.2.2 28-Aug-2013  rmind sync with head
 1.3.2.1 22-Jul-2013  rmind file pmap_tlb.h was added on branch rmind-smpnet on 2013-08-28 23:59:38 +0000
 1.6.10.3 28-Aug-2017  skrll Sync with HEAD
 1.6.10.2 05-Oct-2016  skrll Sync with HEAD
 1.6.10.1 06-Apr-2015  skrll Sync with HEAD
 1.6.8.3 03-Dec-2017  jdolecek update from HEAD
 1.6.8.2 20-Aug-2014  tls Rebase to HEAD as of a few days ago.
 1.6.8.1 03-Apr-2014  tls file pmap_tlb.h was added on branch tls-maxphys on 2014-08-20 00:04:45 +0000
 1.6.4.2 22-May-2014  yamt sync with head.

for a reference, the tree before this commit was tagged
as yamt-pagecache-tag8.

this commit was split into small chunks to avoid
a limitation of cvs. ("Protocol error: too many arguments")
 1.6.4.1 03-Apr-2014  yamt file pmap_tlb.h was added on branch yamt-pagecache on 2014-05-22 11:41:19 +0000
 1.5 02-Oct-2021  skrll Pass the pmap in tlb_set_asid for the benefit of aarch64.
 1.4 24-Jun-2017  skrll branches: 1.4.4;
Multiple inclusion protection define consistency
 1.3 11-Jul-2016  matt Changes so that MIPS can use the common pmap.
Change/augment the virtual cache alias callbacks.
 1.2 21-Sep-2015  matt Update multiple inclusion macro
 1.1 03-Oct-2012  christos branches: 1.1.2; 1.1.16;
move from common/pmap/tlb -> uvm/pmap
 1.1.16.3 28-Aug-2017  skrll Sync with HEAD
 1.1.16.2 05-Oct-2016  skrll Sync with HEAD
 1.1.16.1 27-Dec-2015  skrll Sync with HEAD (as of 26th Dec)
 1.1.2.2 30-Oct-2012  yamt sync with head
 1.1.2.1 03-Oct-2012  yamt file tlb.h was added on branch yamt-pagecache on 2012-10-30 17:23:03 +0000
 1.4.4.2 03-Dec-2017  jdolecek update from HEAD
 1.4.4.1 24-Jun-2017  jdolecek file tlb.h was added on branch tls-maxphys on 2017-12-03 11:39:23 +0000
 1.17 20-Dec-2020  skrll Support __HAVE_PMAP_PV_TRACK in sys/uvm/pmap based pmaps (aka common pmap)
 1.16 30-Dec-2019  ad branches: 1.16.8;
pg->phys_addr -> VM_PAGE_TO_PHYS().
 1.15 20-Oct-2019  skrll Define and use VM_PAGEMD_PVLIST_EMPTY_P
 1.14 20-Oct-2019  skrll Whitespace
 1.13 20-Oct-2019  skrll Re-order _P() macros to match bit definitions. NFCI
 1.12 12-Jul-2019  skrll Provide and use PV_ISKENTER_P. NFCI.
 1.11 19-Jun-2019  christos use __nothing
 1.10 19-Jun-2019  skrll One more short line to unwrap
 1.9 19-Jun-2019  skrll Unwrap short lines. NFCI.
 1.8 19-Apr-2018  christos branches: 1.8.2;
s/static inline/static __inline/g for consistency.
 1.7 24-Jun-2017  skrll branches: 1.7.4; 1.7.6;
Use __BIT(0) for PV_KENTER. NFC.
 1.6 24-Jun-2017  skrll Whitespace - comment alignment.
 1.5 24-Jun-2017  skrll Multiple inclusion protection define consistency
 1.4 07-Jun-2017  skrll Use __BIT(). No functional change.
 1.3 11-Jul-2016  matt Changes so that MIPS can use the common pmap.
Change/augment the virtual cache alias callbacks.
 1.2 04-Mar-2014  matt branches: 1.2.6;
use _KERNEL_OPT around #include
 1.1 03-Oct-2012  christos branches: 1.1.2; 1.1.4;
move from common/pmap/tlb -> uvm/pmap
 1.1.4.1 18-May-2014  rmind sync with head
 1.1.2.3 22-May-2014  yamt sync with head.

for a reference, the tree before this commit was tagged
as yamt-pagecache-tag8.

this commit was split into small chunks to avoid
a limitation of cvs. ("Protocol error: too many arguments")
 1.1.2.2 30-Oct-2012  yamt sync with head
 1.1.2.1 03-Oct-2012  yamt file vmpagemd.h was added on branch yamt-pagecache on 2012-10-30 17:23:03 +0000
 1.2.6.2 28-Aug-2017  skrll Sync with HEAD
 1.2.6.1 05-Oct-2016  skrll Sync with HEAD
 1.7.6.1 22-Apr-2018  pgoyette Sync with HEAD
 1.7.4.2 03-Dec-2017  jdolecek update from HEAD
 1.7.4.1 24-Jun-2017  jdolecek file vmpagemd.h was added on branch tls-maxphys on 2017-12-03 11:39:23 +0000
 1.8.2.2 13-Apr-2020  martin Mostly merge changes from HEAD up to 20200411
 1.8.2.1 08-Apr-2020  martin Merge changes from current as of 20200406
 1.16.8.1 03-Jan-2021  thorpej Sync w/ HEAD.
