History log of /src/sys/uvm/uvm_fault.c
Revision  Date  Author  Comments
 1.237  15-Mar-2024  andvar Fix !VMSWAP build:
Added __unused for a few local variables that are used only in the VMSWAP block.
Adjust the !VMSWAP uvm_swap_stats() definition so that it builds with compat code.
Copied the "int (*uvm_swap_stats50)(...)" definition from uvm_swap to uvm_swapstub
to avoid a missing uvm_swap_stats50 reference at link time.

Fixes INSTALL_CPMBR1400, INSTALL_ZYXELKX evbmips kernel configs as a result.

Reviewed by simon and phone in IRC (thanks).
 1.236  19-Sep-2023  ad Don't needlessly bump a couple of fault counters if upgrading the rwlock
failed.
 1.235  01-Sep-2023  andvar s/unnmapped/unmapped/ in comment.
 1.234  13-Aug-2023  chs uvm: prevent TLB invalidation races during COW resolution

When a thread takes a page fault which results in COW resolution,
other threads in the same process can be concurrently accessing that
same mapping on other CPUs. When the faulting thread updates the pmap
entry at the end of COW processing, the resulting TLB invalidations to
other CPUs are not done atomically, so another thread can write to the
new writable page and then a third thread might still read from the
old read-only page, resulting in inconsistent views of the page by the
latter two threads. Fix this by removing the pmap entry entirely for
the original page before we install the new pmap entry for the new
page, so that the new page can only be modified after the old page is
no longer accessible.

This fixes PR 56535 as well as the NetBSD versions of problems
described in various bug trackers:

https://bugs.freebsd.org/bugzilla/show_bug.cgi?id=225584
https://reviews.freebsd.org/D14347
https://github.com/golang/go/issues/34988
 1.233  17-Jul-2023  riastradh uvm(9): One rndsource for faults -- not one per CPU.

All relevant state is per-CPU anyway; the only substantive difference
this makes is how many entries appear in `rndctl -l' output and what
they are called -- formerly the somewhat confusing `cpuN', meaning
`page faults on cpuN', and now just `uvmfault'. I don't think
there's any real value in being able to enable or disable measurement
or counting of page faults on one CPU vs others, so although this
could be a minor compatibility change, it's hard to imagine it
matters much.

XXX kernel ABI change in struct cpu_info
 1.232  09-Apr-2023  riastradh uvm(9): KASSERT(A && B) -> KASSERT(A); KASSERT(B)
 1.231  26-Oct-2022  riastradh branches: 1.231.2;
sys/kernel.h: New home for extern start_init_exec.
 1.230  03-Jun-2022  dholland typo in comment
 1.229  05-Dec-2021  msaitoh s/recusive/recursive/ in comment.
 1.228  09-Jul-2020  skrll Consistently use UVMHIST(__func__)

Convert UVMHIST_{CALLED,LOG} into UVMHIST_CALLARGS
 1.227  17-May-2020  ad Start trying to reduce cache misses on vm_page during fault processing.

- Make PGO_LOCKED getpages imply PGO_NOBUSY and remove the latter. Mark
pages busy only when there's actually I/O to do.

- When doing COW on a uvm_object, don't mess with neighbouring pages. In
all likelihood they're already entered.

- Don't mess with neighbouring VAs that have existing mappings as replacing
those mappings with same can be quite costly.

- Only enqueue pages for neighbour faults if they are not already enqueued, and
don't activate centre pages unless uvmpdpol says it's useful.

Also:

- Make PGO_LOCKED getpages on UAOs work more like vnodes: do gang lookup in
the radix tree, and don't allocate new pages.

- Fix many assertion failures around faults/loans with tmpfs.
 1.226  15-May-2020  ad Reported-by: syzbot+3e3c7cfa8093f8de047e@syzkaller.appspotmail.com

Comment out an assertion that's now bogus and add a comment.
 1.225  13-Apr-2020  ad uvm_fault_check(): if MADV_SEQUENTIAL, change lower lock type to RW_WRITER
in case many threads are concurrently doing "sequential" access, to avoid
excessive mixing of read/write lock holds.
 1.224  23-Mar-2020  skrll branches: 1.224.2;
Fix UVMHIST build
 1.223  23-Mar-2020  skrll Trailing whitespace
 1.222  22-Mar-2020  ad Process concurrent page faults on individual uvm_objects / vm_amaps in
parallel, where the relevant pages are already in-core. Proposed on
tech-kern.

Temporarily disabled on MP architectures with __HAVE_UNLOCKED_PMAP until
adjustments are made to their pmaps.
 1.221  20-Mar-2020  ad Go back to freeing struct vm_anon one by one. There may have been an
advantage circa 2008 but there isn't now.
 1.220  20-Mar-2020  ad uvm_fault_upper_lookup(): don't call pmap_extract() and pmap_update() more
often than needed.
 1.219  17-Mar-2020  ad Tweak the March 14th change to make page waits interlocked by pg->interlock.
Remove unneeded changes and only deal with the PQ_WANTED flag, to exclude
possible bugs.
 1.218  14-Mar-2020  ad Make page waits (WANTED vs BUSY) interlocked by pg->interlock. Gets RW
locks out of the equation for sleep/wakeup, and allows observing+waiting
for busy pages when holding only a read lock. Proposed on tech-kern.
 1.217  24-Feb-2020  rin 0x%#x --> %#x for non-external codes.
Also, stop mixing up 0x%x and %#x in single files as far as possible.
 1.216  23-Feb-2020  ad UVM locking changes, proposed on tech-kern:

- Change the lock on uvm_object, vm_amap and vm_anon to be a RW lock.
- Break v_interlock and vmobjlock apart. v_interlock remains a mutex.
- Do partial PV list locking in the x86 pmap. Others to follow later.
 1.215  15-Jan-2020  ad Merge from yamt-pagecache (after much testing):

- Reduce unnecessary page scan in putpages esp. when an object has a ton of
pages cached but only a few of them are dirty.

- Reduce the number of pmap operations by tracking page dirtiness more
precisely in uvm layer.
 1.214  31-Dec-2019  ad branches: 1.214.2;
- Add and use wrapper functions that take and acquire page interlocks, and pairs
of page interlocks. Require that the page interlock be held over calls to
uvm_pageactivate(), uvm_pagewire() and similar.

- Solve the concurrency problem with page replacement state. Rather than
updating the global state synchronously, set an intended state on
individual pages (active, inactive, enqueued, dequeued) while holding the
page interlock. After the interlock is released put the pages on a 128
entry per-CPU queue for their state changes to be made real in batch.
This results in a ~400-fold decrease in contention on my test system.
Proposed on tech-kern but modified to use the page interlock rather than
atomics to synchronise as it's much easier to maintain that way, and
cheaper.
 1.213  16-Dec-2019  ad - Extend the per-CPU counters matt@ did to include all of the hot counters
in UVM, excluding uvmexp.free, which needs special treatment and will be
done with a separate commit. Cuts system time for a build by 20-25% on
a 48 CPU machine w/DIAGNOSTIC.

- Avoid 64-bit integer divide on every fault (for rnd_add_uint32).
 1.212  13-Dec-2019  ad Break the global uvm_pageqlock into a per-page identity lock and a private
lock for use of the pagedaemon policy code. Discussed on tech-kern.

PR kern/54209: NetBSD 8 large memory performance extremely low
PR kern/54210: NetBSD-8 processes presumably not exiting
PR kern/54727: writing a large file causes unreasonable system behaviour
 1.211  01-Dec-2019  ad Deactivate pages in batch instead of acquiring uvm_pageqlock repeatedly.
 1.210  01-Dec-2019  martin Add missing <sys/atomic.h> include
 1.209  01-Dec-2019  maxv Use atomic_{load,store}_relaxed() on global counters.
 1.208  10-Nov-2019  chs in uvm_fault_lower_io(), fetch all the map entry values that we need
before we unlock everything.

Reported-by: syzbot+bb6f0092562222b489a3@syzkaller.appspotmail.com
 1.207  05-Aug-2019  chs fix two bugs reported in
https://syzkaller.appspot.com/bug?id=8840dce484094a926e1ec388ffb83acb2fa291c9

- in uvm_fault_check(), if the map entry is wired, handle the fault the same way
that we would handle UVM_FAULT_WIRE. faulting on wired mappings is valid
if the mapped object was truncated and then later grown again.

- in uvm_fault_unwire_locked(), we must hold the locks for the vm_map_entry
while calling pmap_extract() in order to avoid races with the mapped object
being truncated while we are unwiring it.

Reported-by: syzbot+2e0ae2fc35ab7301c7b8@syzkaller.appspotmail.com
 1.206  28-May-2019  msaitoh branches: 1.206.2;
s/recieve/receive/
 1.205  21-Apr-2019  chs If a pager fault method returns ENOMEM but some memory appears to be reclaimable,
wake up the pagedaemon and retry the fault. This fixes the problems with Xorg
being killed with an "out of swap" message due to a transient memory shortage.
 1.204  08-May-2018  christos branches: 1.204.2;
don't store the rssmax in the lwp rusage, it is a per proc property. Instead
utilize an unused field in the vmspace struct to store it. Also conditionalize
on platforms that have pmap statistics available.
 1.203  07-May-2018  christos update maxrss (used to always be 0). Patterned after the OpenBSD changes.
 1.202  20-Nov-2017  chs branches: 1.202.2;
In uvm_fault_upper_enter(), if pmap_enter(PMAP_CANFAIL) fails, assert that
the pmap did not leave around a now-stale pmap mapping for an old page.
If such a pmap mapping still existed after we unlocked the vm_map,
the UVM code would not know later that it would need to lock the
lower layer object while calling the pmap to remove or replace that
stale pmap mapping. See PR 52706 for further details.
 1.201  28-Oct-2017  pgoyette Update the kernhist(9) kernel history code to address issues identified
in PR kern/52639, as well as some general cleaning-up...

(As proposed on tech-kern@ with additional changes and enhancements.)

Details of changes:

* All history arguments are now stored as uintmax_t values[1], both in
the kernel and in the structures used for exporting the history data
to userland via sysctl(9). This avoids problems on some architectures
where passing a 64-bit (or larger) value to printf(3) can cause it to
process the value as multiple arguments. (This can be particularly
problematic when printf()'s format string is not a literal, since in
that case the compiler cannot know how large each argument should be.)

* Update the data structures used for exporting kernel history data to
include a version number as well as the length of history arguments.

* All [2] existing users of kernhist(9) have had their format strings
updated. Each format specifier now includes an explicit length
modifier 'j' to refer to numeric values of the size of uintmax_t.

* All [2] existing users of kernhist(9) have had their format strings
updated to replace uses of "%p" with "%#jx", and the pointer
arguments are now cast to (uintptr_t) before being subsequently cast
to (uintmax_t). This is needed to avoid compiler warnings about
casting "pointer to integer of a different size."

* All [2] existing users of kernhist(9) have had instances of "%s" or
"%c" format strings replaced with numeric formats; several instances
of mis-match between format string and argument list have been fixed.

* vmstat(1) has been modified to handle the new size of arguments in the
history data as exported by sysctl(9).

* vmstat(1) now provides a warning message if the history requested with
the -u option does not exist (previously, this condition was silently
ignored, with only a single blank line being printed).

* vmstat(1) now checks the version and argument length included in the
data exported via sysctl(9) and exits if they do not match the values
with which vmstat was built.

* The kernhist(9) man-page has been updated to note the additional
requirements imposed on the format strings, along with several other
minor changes and enhancements.

[1] It would have been possible to use an explicit length (for example,
uint64_t) for the history arguments. But that would require another
"rototill" of all the users in the future when we add support for an
architecture that supports a larger size. Also, the printf(3) format
specifiers for explicitly-sized values, such as "%"PRIu64, are much
more verbose (and less aesthetically appealing, IMHO) than simply
using "%ju".

[2] I've tried very hard to find "all [the] existing users of kernhist(9)"
but it is possible that I've missed some of them. I would be glad to
update any stragglers that anyone identifies.
 1.200  09-Jul-2017  christos PR/52384: make uvm_fault_check() return EFAULT not EACCES, like our man pages
(but not OpenGroup which does not document EFAULT for read/write, and only
documents EACCES for sockets) say for read/write.
 1.199  20-Mar-2017  skrll branches: 1.199.6;
Ensure we pass the prot in flags to pmap_enter when creating a wired
mapping
 1.198  19-Mar-2017  riastradh __diagused police
 1.197  22-Jun-2015  matt branches: 1.197.2; 1.197.4;
Use %p, %#lx etc. for pointers and addresses.
 1.196  10-Aug-2014  tls branches: 1.196.4;
Merge tls-earlyentropy branch into HEAD.
 1.195  15-Sep-2013  martin branches: 1.195.2;
Mark a variable as potentially unused
 1.194  19-Feb-2012  rmind branches: 1.194.2; 1.194.4;
Remove VM_MAP_INTRSAFE and related code. Not used since the "kmem changes".
 1.193  02-Feb-2012  tls Entropy-pool implementation move and cleanup.

1) Move core entropy-pool code and source/sink/sample management code
to sys/kern from sys/dev.

2) Remove use of NRND as test for presence of entropy-pool code throughout
source tree.

3) Remove use of RND_ENABLED in device drivers as microoptimization to
avoid expensive operations on disabled entropy sources; make the
rnd_add calls do this directly so all callers benefit.

4) Fix bug in recent rnd_add_data()/rnd_add_uint32() changes that might
have led to slight entropy overestimation for some sources.

5) Add new source types for environmental sensors, power sensors, VM
system events, and skew between clocks, with a sample implementation
for each.

ok releng to go in before the branch due to the difficulty of later
pullup (widespread #ifdef removal and moved files). Tested with release
builds on amd64 and evbarm and live testing on amd64.
 1.192  27-Jan-2012  para extending vmem(9) to be able to allocated resources for it's own needs.
simplifying uvm_map handling (no special kernel entries anymore no relocking)
make malloc(9) a thin wrapper around kmem(9)
(with private interface for interrupt safety reasons)

releng@ acknowledged
 1.191  28-Nov-2011  yamt branches: 1.191.2;
comments
 1.190  06-Aug-2011  rmind branches: 1.190.2;
- Rework uvm_anfree() into uvm_anon_freelst(), which always drops the lock.
- Free anons in uvm_anon_freelst() without lock held.
- Mechanical sync to the unused loaning code.
 1.189  05-Jul-2011  yamt reduce the number of atomic ops in common cases. it's exceptional for
anons to remain longer than amap.
 1.188  24-Jun-2011  rmind Fix uvmplock regression - a lock against oneself case in amap_swap_off().
Happens since amap is NULL in uvmfault_anonget(), so uvmfault_unlockall()
keeps anon locked, when it should unlock it.
 1.187  23-Jun-2011  rmind uvmfault_anonget: clean-up, improve some comments, misc.
 1.186  12-Jun-2011  rmind Welcome to 5.99.53! Merge rmind-uvmplock branch:

- Reorganize locking in UVM and provide extra serialisation for pmap(9).
New lock order: [vmpage-owner-lock] -> pmap-lock.

- Simplify locking in some pmap(9) modules by removing P->V locking.

- Use lock object on vmobjlock (and thus vnode_t::v_interlock) to share
the locks amongst UVM objects where necessary (tmpfs, layerfs, unionfs).

- Rewrite and optimise x86 TLB shootdown code, make it simpler and cleaner.
Add TLBSTATS option for x86 to collect statistics about TLB shootdowns.

- Unify /dev/mem et al in MI code and provide required locking (removes
kernel-lock on some ports). Also, avoid cache-aliasing issues.

Thanks to Andrew Doran and Joerg Sonnenberger, as their initial patches
formed the core changes of this branch.
 1.185  21-May-2011  tsutsui branches: 1.185.2;
No need to pass UVM_FLAG_COLORMATCH to uvm_pagealloc()
if no valid vaddr is specified.
 1.184  23-Apr-2011  rmind Replace "malloc" in comments, remove unnecessary header inclusions.
 1.183  08-Apr-2011  yamt - ensure that the promoted page is on the queue even when later pmap_enter
failed.
- don't activate a page twice.
- remove an argument which is used only for an assertion.
- assertions and comments.
 1.182  10-Feb-2011  skrll Spell uvm_fault_lower_neighbor correctly in UVMHIST_FUNC by using
__func__
 1.181  02-Feb-2011  chuck update license clauses on my code to match the new-style BSD licenses.
based on diff that rmind@ sent me.

no functional change with this commit.
 1.180  06-Jan-2011  enami branches: 1.180.2; 1.180.4;
Fix bugs introduced by previous commit; allocated page needs to be bound
with the anon, and uvmfault_anonget may be called with ufi NULL.
 1.179  04-Jan-2011  matt Add better color matching when selecting free pages. KM pages will now be allocated
so that VA and PA have the same color. On a page fault, choose a physical
page that has the same color as the virtual address.

When allocating kernel memory pages, allow the MD to specify a preferred
VM_FREELIST from which to choose pages. For machines with large amounts
of memory (> 4GB), allow all kernel memory to come from below 4GB to reduce the amount
of bounce buffering needed with 32-bit DMA devices.
 1.178  20-Dec-2010  matt Move counting of faults, traps, intrs, soft[intr]s, syscalls, and nswtch
from uvmexp to per-cpu cpu_data and move them to 64bits. Remove unneeded
includes of <uvm/uvm_extern.h> and/or <uvm/uvm.h>.
 1.177  17-Dec-2010  yamt cosmetics. no functional changes.
- constify
- wrap long lines
- assertions
- comments
 1.176  15-Dec-2010  pooka Remove duplicate asserts from when uvm_fault_lower1() was merged
into uvm_fault_lower() (the duplicates were there already before,
just in different functions).

reported by Alexander Nasonov on tech-kern
 1.175  22-Jun-2010  rmind Keep the lock around pmap_update() where required. While fixing this
in ubc_fault(), rework logic to "remember" the last object of page and
reduce locking overhead, since in common case pages belong to one and
the same UVM object (but not always, therefore add a comment).

Unlocks before pmap_update(), on removal of mappings, might cause TLB
coherency issues, since on architectures like x86 and mips64 invalidation
IPIs are deferred to pmap_update(). Hence, VA space might be globally
visible before IPIs are sent or while they are still in-flight.

OK ad@.
 1.174  28-May-2010  rmind uvm_fault_{upper,lower}_done: move drop-swap outside the page-queues lock.
Assert for object lock being held (or ref count 0) in uao_set_swslot().
 1.173  24-Feb-2010  uebayasi branches: 1.173.2;
Merge more indirect functions. Some comments.
 1.172  24-Feb-2010  uebayasi uvm_fault_upper_lookup, uvm_fault_upper_neighbor: There is no point in calling
pmap_update() without calling pmap_enter().

(Probably calling it only once after the loop (as done in uvm_fault_lower_lookup())
is enough. If so, other threads would see the entered neighbor pages reflected
a little later.)
 1.171  24-Feb-2010  uebayasi Minor clean up.
 1.170  24-Feb-2010  uebayasi Revert a thinko.
 1.169  24-Feb-2010  uebayasi Slightly clean up uvm_fault() code path after pmap_enter(). Now tasks
needed for page cache are concentrated in their own functions (uvm_fault_*_done()).
 1.168  24-Feb-2010  uebayasi Record if "promote" is done in UVMHIST. Do it for "upper" fault too.
 1.167  24-Feb-2010  uebayasi Merge some indirect "lower" fault handlers back. Prompted by rmind@.
 1.166  08-Feb-2010  mlelstv branches: 1.166.2;
pgo_get needs the page array to be initialized.
 1.165  08-Feb-2010  mlelstv Move assertion to make check more clear.
 1.164  07-Feb-2010  mlelstv Make UVMHIST build again.
 1.163  05-Feb-2010  uebayasi Cosmetic. Shorten some long names.
 1.162  05-Feb-2010  uebayasi Fix !DIAGNOSTIC build. Reported by Geoff Wing.
 1.161  04-Feb-2010  uebayasi Reduce diff between upper/lower neighbor handlers.
 1.160  04-Feb-2010  uebayasi Merge "obfuscating layers" for readability. Inline some functions.
Requested by rmind@.
 1.159  04-Feb-2010  uebayasi Move uvm_fault_* static func decls in one place.
 1.158  03-Feb-2010  uebayasi uvm_fault_lower_generic_io: Reduce diff from uvm_loanuobj().
 1.157  03-Feb-2010  uebayasi uvm_fault_lower_generic_io: One missing mutex_exit(vmobjlock). Found while
comparing this function with uvm_loanuobj(). (Part of) these should be
merged.
 1.156  02-Feb-2010  uebayasi uobj->pgops->pgo_get doing PGO_SYNCIO returns a uobjpage whose uobj backpointer
refers to another "uobj" used to call pgo_get. Revert the wrong assertion
I made. My bad.

(This and pgo_get's possible ERESTART return value check are the only two behavioral
changes I made.)

Reported by drochner@, thanks.
 1.155  02-Feb-2010  uebayasi Don't pass an unnecessary reference to uvm_loanbreak_anon().

Requested by rmind@.
 1.154  02-Feb-2010  uebayasi Be consistent in deciding whether to use PMAP_WIRED.
 1.153  02-Feb-2010  uebayasi Move A->K loan break code to uvm_loan.c.
 1.152  02-Feb-2010  uebayasi Indent.
 1.151  02-Feb-2010  uebayasi uvm_fault: Split "neighbor" fault and loan handling into functions.
 1.150  02-Feb-2010  uebayasi Sort struct uvm_faultctx members for better alignment.
 1.149  01-Feb-2010  uebayasi Indent.
 1.148  01-Feb-2010  uebayasi More split.
 1.147  01-Feb-2010  uebayasi Fix build without DIAGNOSTIC.
 1.146  01-Feb-2010  uebayasi uvm_fault: Clarify when to wire what.
 1.145  01-Feb-2010  uebayasi uvm_fault_upper_lookup: This is totally my personal preference, but can't help
adding one goto to reduce one indent.
 1.144  01-Feb-2010  uebayasi uvm_fault:
- Lower fault routines don't care about the vm_anon array found in upper lookup.
Don't pass the pointer down.
- The flag "shadowed" is known when we look up the upper layer. No need to
keep it in the fault context struct.
 1.143  01-Feb-2010  uebayasi Indent.
 1.142  01-Feb-2010  uebayasi Rewrite uvm_fault() loop using while () rather than goto.
 1.141  01-Feb-2010  uebayasi Split uvm_fault() into 2 more functions, uvm_fault_check() and
uvm_fault_upper_lookup(). Omit unnecessary arguments passed around.
 1.140  01-Feb-2010  uebayasi uvm_fault: Pack variables shared during fault / re-fault into a struct named
uvm_faultctx. Unfortunately ~all of those values are overridden in various
ways. Constification doesn't help much...
 1.139  01-Feb-2010  uebayasi ERESTART is already negative. Stop negating error values so as not to override
the original values. Pointed out by rmind@, thanks.

In the lower fault case, whether (*pgo_get)() can return ERESTART, and whether we
should re-fault in that case, remains an open question. The original code just returned the
error, so keep that behaviour for now. In case (*pgo_get)() really returns
ERESTART, pass EIO to tell the uvm_fault caller that (*pgo_get)() failed.

(As far as I can tell by grepping, callers don't check whether the return value is ERESTART.
So assuming (*pgo_get)() never returns ERESTART should be a safe bet.)
 1.138  31-Jan-2010  uebayasi Ax uvm_fault_internal() & break it into functions. "Upper" fault and "lower"
fault routines are separated now.
 1.137  31-Jan-2010  uebayasi uvm_fault_internal:

Move local variables around to isolate contexts. Note that remaining variables
are global in that function, and some hold state across re-fault.

Silently clean up the "eoff" mess.

(Superfluous braces will go once things settle down.)
 1.136  31-Jan-2010  uebayasi Indent.
 1.135  31-Jan-2010  uebayasi uvm_fault_internal: In lower fault handling case, put another goto to clarify
that we don't care about lower neighboring pages for the zero-fill object.
 1.134  31-Jan-2010  uebayasi uvm_fault_internal: Skip another long code segment (lower "neighbor" fault)
by a goto.
 1.133  31-Jan-2010  uebayasi uvm_fault_internal: Put a goto label "Case1" as well as "Case2". Clarify
that if the faulting page is shadowed, we don't care about the lower layer at all.
 1.132  31-Jan-2010  uebayasi Correct previous; fix a miscalculation of offset-into-entry in MADV_SEQUENTIAL
case. Pointed out by pooka@.
 1.131  30-Jan-2010  uebayasi Calculate the offset from vm_map_entry's start to vm_page array's start once.
 1.130  24-Jan-2010  uebayasi Clean up an internal flag usage. No functional changes.
 1.129  17-Dec-2009  rmind Replace few USER_TO_UAREA/UAREA_TO_USER uses, reduce sys/user.h inclusions.
 1.128  05-Dec-2009  pooka Convert tsleep(&lbolt) to kpause(). Make ltsleep/mtsleep on lbolt
illegal. I examined all places where lbolt is referenced to make
sure no pointer aliases of it were passed to tsleep, but put a
KASSERT in m/ltsleep() just to be sure.
 1.127  01-Nov-2009  uebayasi Consistently call amap / uobj layers as upper / lower, because UVM has only
those two layers by design. Approved by Chuck Cranor some time ago.
 1.126  20-Dec-2008  ad Move a couple of calls to pmap_update().
 1.125  04-Jul-2008  ad branches: 1.125.4; 1.125.6;
Update a comment.
 1.124  27-Mar-2008  ad branches: 1.124.4; 1.124.6; 1.124.8;
Make rusage collection per-LWP and collate in the appropriate places.
cloned threads need a little bit more work but the locking needs to
be fixed first.
 1.123  18-Jan-2008  yamt branches: 1.123.6;
push pmap_clear_reference calls into pdpolicy code, where reference bits
actually matter.
 1.122  02-Jan-2008  ad Merge vmlocking2 to head.
 1.121  11-Oct-2007  ad branches: 1.121.4; 1.121.6; 1.121.10;
Remove LOCK_ASSERT(!simple_lock_held(&foo));
 1.120  21-Jul-2007  ad branches: 1.120.4; 1.120.6; 1.120.8; 1.120.10;
Merge unobtrusive locking changes from the vmlocking branch.
 1.119  22-Feb-2007  thorpej branches: 1.119.4; 1.119.12;
TRUE -> true, FALSE -> false
 1.118  21-Feb-2007  thorpej Replace the Mach-derived boolean_t type with the C99 bool type. A
future commit will replace use of TRUE and FALSE with true and false.
 1.117  15-Dec-2006  yamt branches: 1.117.2;
put ->K loaned pages on the page queue, so that page loaning doesn't
disturb pagedaemon/pdpolicy.
 1.116  01-Dec-2006  yamt uvm_fault: fix an assertion. PR/35134 from Christos Zoulas.
it can be triggered by minherit as well.
 1.115  28-Nov-2006  yamt uvm_fault: unwrap a short line.
 1.114  12-Oct-2006  yamt move some knowledge about vnode into uvm_vnode.c.
 1.113  03-Oct-2006  christos Coverity CID 3170,3171: Add KASSERT.
 1.112  15-Sep-2006  yamt branches: 1.112.2;
merge yamt-pdpolicy branch.
- separate page replacement policy from the rest of kernel
- implement an alternative replacement policy
 1.111  11-Apr-2006  yamt branches: 1.111.8;
add assertions.
 1.110  15-Mar-2006  drochner branches: 1.110.2;
-clean up the interface to uvm_fault: the "fault type" didn't serve
any purpose (done by a macro, so we don't save any cycles for now)
-kill vm_fault_t; it is not needed for real faults, and for simulated
faults (wiring) it can be replaced by UVM internal flags
-remove <uvm/uvm_fault.h> from uvm_extern.h again
 1.109  22-Feb-2006  drochner branches: 1.109.2; 1.109.4;
kill the "fault_type" argument to pager's pgo_fault() methods
it is never used
(and using it would constitute an abstraction violation imho)
 1.108  15-Feb-2006  yamt - amap_copy: take a "flags" argument instead of booleans.
- add AMAP_COPY_NOMERGE flag, and use it for uvm_map_extract.
PR/32806 from Julio M. Merino Vidal.
 1.107  31-Jan-2006  yamt branches: 1.107.2; 1.107.4;
handle "strange" filesystems like layered filesystems and tmpfs,
where pgo_get returns pages which don't belong to the uobj.
also fix an XXX in uvm_loananon and lock-unlock mismatch in uvm_loanuobj.

PR/28372, PR/32665 (Alan Barrett).
 1.106  31-Jan-2006  yamt re-apply uvm_fault.c 1.104. fixes will follow.
 1.105  30-Jan-2006  yamt revert uvm_fault.c 1.104 for now. see PR/28372, PR/32665.
 1.104  21-Jan-2006  yamt - uvm_fault: move a common code of 1B and 2B to a new function.
don't attempt to allocate anons with kernel_map locked. PR/32543.
- amap_copy: add an assertion.
 1.103  24-Dec-2005  perry branches: 1.103.2;
Remove leading __ from __(const|inline|signed|volatile) -- it is obsolete.
 1.102  11-Dec-2005  christos merge ktrace-lwp.
 1.101  13-Sep-2005  yamt wrap swap related code by #ifdef VMSWAP. always #define VMSWAP for now.
 1.100  31-Jul-2005  yamt revert "defflag VMSWAP" changes for now.
there seems to be far more people who don't want to edit
their kernel config files than i thought.
 1.99  30-Jul-2005  yamt defflag VMSWAP.
 1.98  23-Jul-2005  yamt update file timestamps for nfsd loaned-read and mmap.
PR/25279. discussed on tech-kern@.
 1.97  22-Jul-2005  yamt uvm_fault: check a correct object in the case of layered filesystems.
fix PR/30811 from Jukka Salmi.
 1.96  17-Jul-2005  yamt ensure that vnodes with dirty pages are always on syncer's queue.

- genfs_putpages: wait for i/o completion of PG_RELEASED/PG_PAGEOUT pages by
setting "wasclean" false when encountering them.
suggested by Stephan Uphoff in PR/24596 (1).

- genfs_putpages: write protect pages when cleaning out, if
we're going to take the vnode off the syncer's queue.
uvm_fault: don't write-map pages unless its vnode is already on
the syncer's queue.

fix PR/24596 (3) but in the different way from the suggested fix.
(to keep our current behaviour, ie. not to require explicit msync.
discussed on tech-kern@.)

- genfs_putpages: don't mistakenly take a vnode off the queue
by introducing a generation number in genfs_node.
genfs_getpages: increment the generation number.
suggested by Stephan Uphoff in PR/24596 (2).

- add some assertions.
 1.95  27-Jun-2005  thorpej branches: 1.95.2;
Use ANSI function decls.
 1.94  11-May-2005  yamt allocate anons on-demand, rather than reserving static amount of
them on boot/swapon.
 1.93  27-Apr-2005  yamt uvmfault_anonget: check uvm_reclaimable() where appropriate.
 1.92  12-Apr-2005  yamt fix unreasonably frequent "killed: out of swap" on systems which have
little or no swap.
- even on a severe swap shortage, if we have some amount of file-backed pages,
don't bother to kill processes.
- if all pages in queue will be likely reactivated, just give up
page type balancing rather than spinning unnecessarily.
 1.91  28-Feb-2005  chs branches: 1.91.2;
use TRUE and FALSE instead of 1 and 0 for boolean_t.
 1.90  07-Feb-2005  yamt uvm_fault: fix integer overflow so that MADV_SEQUENTIAL
can work on large files.
 1.89  01-Jan-2005  yamt branches: 1.89.2; 1.89.4;
uvm_fault: pass NULL pap to pmap_extract where we don't need paddr.
 1.88  05-May-2004  yamt fix an amap_wirerange deadlock problem by re-introducing
PG_RELEASED for anon pages. PR/23171 from Christian Limpach.
for details, see discussion filed in the PR database.

uvm_anon_release: a new function to free anon-owned PG_RELEASED page.
uvm_anfree: we can't wait for the page here because the caller might hold
amap lock. instead, just mark the page as PG_RELEASED.
whoever unbusies the page should check PG_RELEASED.
uvm_aio_aiodone: uvm_anon_release() instead of uvm_page_unbusy()
if appropriate.
uvmfault_anonget: check PG_RELEASED.
 1.87  24-Mar-2004  junyoung branches: 1.87.2;
Nuke __P().
 1.86  02-Mar-2004  yamt uvm_fault: check loan_count of neighborhood object page properly.

PR/24595 from Stephan Uphoff.
 1.85  10-Feb-2004  dbj s/fauling/faulting/
 1.84  11-Aug-2003  pk Make sure to call uvm_swap_free() and uvm_swap_markbad() with valid (i.e.
positive) slot numbers.
 1.83  11-Aug-2003  pk Introduce uvm_swapisfull(), which computes the available swap space by
taking into account swap devices that are in the process of being removed.
 1.82  03-May-2003  yamt branches: 1.82.2;
use uvm_loanbreak in uvm_fault.
 1.81  09-Feb-2003  pk uvm_fault: case 1B: lock page queue before calling uvm_pageactivate().
 1.80  18-Jan-2003  thorpej Merge the nathanw_sa branch.
 1.79  30-Oct-2002  yamt change "uoff" to voff_t from vaddr_t as it's offset within uvm object.

fix PR/18855.
 1.78  02-Sep-2002  thorpej When breaking a loan due to a page fault, check to see if the other
kind of reference-holder (anon or object) is referencing the page. If
not, then the page must be removed from the pageq's.

Reviewed by Chuck Silvers.
 1.77  29-Aug-2002  chs be sure that the page we allocate to break a loan is put on a paging queue.
fixes PR 18037.
 1.76  25-Mar-2002  chs branches: 1.76.2; 1.76.4;
when processing PG_RDONLY, mask off VM_PROT_WRITE instead of hard-wiring
VM_PROT_READ (since we might have VM_PROT_EXEC too). this fixes problems
running binaries out of NFS on macppc. yet another fix courtesy of enami.
 1.75  09-Mar-2002  chs a vm_prot_t is a bit-mask, fix an assertion which was treating one
more like an enumerated type.
 1.74  02-Jan-2002  chs in uvm_fault_unwire_locked(), if we find that a pmap entry is missing,
just skip that page. this situation can arise legitimately when a file
with a wired mapping is truncated so that a wired page is no longer
part of the file.
 1.73  01-Jan-2002  chs redo part of the last commit.
 1.72  31-Dec-2001  chs introduce a new UVM fault type, VM_FAULT_WIREMAX. this is different
from VM_FAULT_WIRE in that when the pages being wired are faulted in,
the simulated fault is at the maximum protection allowed for the mapping
instead of the current protection. use this in uvm_map_pageable{,_all}()
to fix the problem where writing via ptrace() to shared libraries that
are also mapped with wired mappings in another process causes a
diagnostic panic when the wired mapping is removed.

this is a really obscure problem so it deserves some more explanation.
ptrace() writing to another process ends up down in uvm_map_extract(),
which for MAP_PRIVATE mappings (such as shared libraries) will cause
the amap to be copied or created. then the amap is made shared
(i.e. the AMAP_SHARED flag is set) between the kernel and the ptrace()d
process so that the kernel can modify pages in the amap and have the
ptrace()d process see the changes. then when the page being modified
is actually faulted on, the object page (from the shared library vnode)
is copied to a new anon page and inserted into the shared amap.
to make all the processes sharing the amap actually see the new anon
page instead of the vnode page that was there before, we need to
invalidate all the pmap-level mappings of the vnode page in the pmaps
of the processes sharing the amap, but we don't have a good way of
doing this. the amap doesn't keep track of the vm_maps which map it.
so all we can do at this point is to remove all the mappings of the
page with pmap_page_protect(), but this has the unfortunate side-effect
of removing wired mappings as well. removing wired mappings with
pmap_page_protect() is a legitimate operation; it can happen when a file
with a wired mapping is truncated. so the pmap has no way of knowing
whether a request to remove a wired mapping is normal or due to
this weird situation, so the pmap has to remove the weird mapping.
the process being ptrace()d goes away and life continues. then,
much later when we go to unwire or remove the wired vm_map mapping,
we discover that the pmap mapping has been removed when it should
still be there, and we panic.

so where did we go wrong? the problem is that we don't have any way
to update just the pmap mappings that need to be updated in this
scenario. we could invent a mechanism to do this, but that is much
more complicated than this change and it doesn't seem like the right
way to go in the long run either.

the real underlying problem here is that wired pmap mappings just
aren't a good concept. one of the original properties of the pmap
design was supposed to be that all the information in the pmap could
be thrown away at any time and the VM system could regenerate it all
through fault processing, but wired pmap mappings don't allow that.
a better design for UVM would not require wired pmap mappings,
and Chuck C. and I are talking about this, but it won't be done
anytime soon, so this change will do for now.

this change has the effect of causing MAP_PRIVATE mappings to be
copied to anonymous memory when they are mlock()d, so that uvm_fault()
doesn't need to copy these pages later when called from ptrace(), thus
avoiding the call to pmap_page_protect() and the panic that results
from this when the mlock()d region is unlocked or freed. note that
this change doesn't help the case where the wired mapping is MAP_SHARED.

discussed at great length with Chuck Cranor.
fixes PRs 10363, 12554, 12604, 13041, 13487, 14580 and 14853.
 1.71  10-Nov-2001  lukem add RCSIDs, and in some cases, slightly cleanup #include order
 1.70  03-Oct-2001  chs branches: 1.70.2;
skip the MADV_SEQUENTIAL processing if we refault. fixes PR 14060.
 1.69  15-Sep-2001  chs a whole bunch of changes to improve performance and robustness under load:

- remove special treatment of pager_map mappings in pmaps. this is
required now, since I've removed the globals that expose the address range.
pager_map now uses pmap_kenter_pa() instead of pmap_enter(), so there's
no longer any need to special-case it.
- eliminate struct uvm_vnode by moving its fields into struct vnode.
- rewrite the pageout path. the pager is now responsible for handling the
high-level requests instead of only getting control after a bunch of work
has already been done on its behalf. this will allow us to UBCify LFS,
which needs tighter control over its pages than other filesystems do.
writing a page to disk no longer requires making it read-only, which
allows us to write wired pages without causing all kinds of havoc.
- use a new PG_PAGEOUT flag to indicate that a page should be freed
on behalf of the pagedaemon when it's unlocked. this flag is very similar
to PG_RELEASED, but unlike PG_RELEASED, PG_PAGEOUT can be cleared if the
pageout fails due to e.g. an indirect-block buffer being locked.
this allows us to remove the "version" field from struct vm_page,
and together with shrinking "loan_count" from 32 bits to 16,
struct vm_page is now 4 bytes smaller.
- no longer use PG_RELEASED for swap-backed pages. if the page is busy
because it's being paged out, we can't release the swap slot to be
reallocated until that write is complete, but unlike with vnodes we
don't keep a count of in-progress writes so there's no good way to
know when the write is done. instead, when we need to free a busy
swap-backed page, just sleep until we can get it busy ourselves.
- implement a fast-path for extending writes which allows us to avoid
zeroing new pages. this substantially reduces cpu usage.
- encapsulate the data used by the genfs code in a struct genfs_node,
which must be the first element of the filesystem-specific vnode data
for filesystems which use genfs_{get,put}pages().
- eliminate many of the UVM pagerops, since they aren't needed anymore
now that the pager "put" operation is a higher-level operation.
- enhance the genfs code to allow NFS to use the genfs_{get,put}pages
instead of a modified copy.
- clean up struct vnode by removing all the fields that used to be used by
the vfs_cluster.c code (which we don't use anymore with UBC).
- remove kmem_object and mb_object since they were useless.
instead of allocating pages to these objects, we now just allocate
pages with no object. such pages are mapped in the kernel until they
are freed, so we can use the mapping to find the page to free it.
this allows us to remove splvm() protection in several places.

The sum of all these changes improves write throughput on my
decstation 5000/200 to within 1% of the rate of NetBSD 1.5
and reduces the elapsed time for "make release" of a NetBSD 1.5
source tree on my 128MB pc to 10% less than a 1.5 kernel took.
 1.68  10-Sep-2001  chris Update pmap_update to now take the updated pmap as an argument.
This will allow improvements to the pmaps so that they can more easily defer expensive operations, e.g. TLB/cache flushes, until the last possible moment.

Currently this is a no-op on most platforms, so they should see no difference.

Reviewed by Jason.
 1.67  26-Jun-2001  thorpej branches: 1.67.2; 1.67.4;
Reduce some complexity in the fault path -- Rather than maintaining
an spl-protected "interrupt safe map" list, simply require that callers
of uvm_fault() never call us in interrupt context (MD code must make
the assertion), and check for interrupt-safe maps in uvmfault_lookup()
before we lock the map.
 1.66  26-Jun-2001  thorpej Note that uvm_fault() must NEVER EVER EVER be called in interrupt
context.
 1.65  14-Jun-2001  chs work around an overflow problem in uvm_fault_wire().
from Eduardo Horvath and Simon Burge.
 1.64  02-Jun-2001  chs replace vm_map{,_entry}_t with struct vm_map{,_entry} *.
 1.63  25-May-2001  chs remove trailing whitespace.
 1.62  25-Apr-2001  thorpej Add a comment describing a problem.
 1.61  24-Apr-2001  thorpej Sprinkle pmap_update() calls after calls to:
- pmap_enter()
- pmap_remove()
- pmap_protect()
- pmap_kenter_pa()
- pmap_kremove()
as described in pmap(9).

These calls are relatively conservative. It may be possible to
optimize these a little more.
 1.60  01-Apr-2001  chs undo the part of a previous commit which turned a check for faulting
on an "intrsafe" map into a KASSERT. this situation can be caused by
an application accessing /dev/kmem.
 1.59  17-Mar-2001  chs return the real error from pgo_fault().
 1.58  15-Mar-2001  chs eliminate the KERN_* error codes in favor of the traditional E* codes.
the mapping is:

KERN_SUCCESS             0
KERN_INVALID_ADDRESS     EFAULT
KERN_PROTECTION_FAILURE  EACCES
KERN_NO_SPACE            ENOMEM
KERN_INVALID_ARGUMENT    EINVAL
KERN_FAILURE             various, mostly turn into KASSERTs
KERN_RESOURCE_SHORTAGE   ENOMEM
KERN_NOT_RECEIVER        <unused>
KERN_NO_ACCESS           <unused>
KERN_PAGES_LOCKED        <unused>
 1.57  10-Mar-2001  chs eliminate the VM_PAGER_* error codes in favor of the traditional E* codes.
the mapping is:

VM_PAGER_OK       0
VM_PAGER_BAD      <unused>
VM_PAGER_FAIL     <unused>
VM_PAGER_PEND     0 (see below)
VM_PAGER_ERROR    EIO
VM_PAGER_AGAIN    EAGAIN
VM_PAGER_UNLOCK   EBUSY
VM_PAGER_REFAULT  ERESTART

for async i/o requests, it used to be possible for the request to
be converted to sync, and the pager would return VM_PAGER_OK or VM_PAGER_PEND
to indicate whether the caller should perform post-i/o cleanup.
this is no longer allowed; pagers must now return 0 to indicate that
the async i/o was successfully started, and the caller never needs to
worry about doing the post-i/o cleanup.
 1.56  18-Feb-2001  chs branches: 1.56.2;
clean up DIAGNOSTIC checks, use KASSERT().
 1.55  28-Jan-2001  thorpej Page scanner improvements, behavior is actually a bit more like
Mach VM's now. Specific changes:
- Pages now need not have all of their mappings removed before being
put on the inactive list. They only need to have the "referenced"
attribute cleared. This makes putting pages onto the inactive list
much more efficient. In order to eliminate redundant clearings of
"referenced", callers of uvm_pagedeactivate() must now do this
themselves.
- When checking the "modified" attribute for a page (for clearing
PG_CLEAN), make sure to only do it if PG_CLEAN is currently set on
the page (saves a potentially expensive pmap operation).
- When scanning the inactive list, if a page is referenced, reactivate
it (this part was actually added in uvm_pdaemon.c,v 1.27). This
works properly now that pages on the inactive list are allowed to
have mappings.
- When scanning the inactive list and considering a page for freeing,
remove all mappings, and then check the "modified" attribute if the
page is marked PG_CLEAN.
- When scanning the active list, if the page was referenced since its
last sweep by the scanner, don't deactivate it. (This part was
actually added in uvm_pdaemon.c,v 1.28.)

These changes greatly improve interactive performance during
moderate to high memory and I/O load.
 1.54  23-Jan-2001  thorpej Change uvm_analloc() to return a locked anon, update all callers,
and fix an anon locking protocol error in uvm_loanzero().
 1.53  23-Jan-2001  thorpej Sprinkle some assertions:
amap_free(): Assert that the amap is locked.
amap_share_protect(): Assert that the amap is locked.
amap_wipeout(): Assert that the amap is locked.
uvm_anfree(): Assert that the anon has a reference count of 0 and is
not locked.
uvm_anon_lockloanpg(): Assert that the anon is locked.
anon_pagein(): Assert that the anon is locked.
uvmfault_anonget(): Assert that the anon is locked.
uvm_pagealloc_strat(): Assert that the uobj or the anon is locked

And fix the problems these have uncovered:
amap_cow_now(): Lock the new anon after allocating it, and unref and
unlock it (rather than lock!) before freeing it in case
of an error condition. This should fix a problem reported
by Dan Carosone using cdrecord on an i386 MP kernel.
uvm_fault(): Case1B -- Lock the new anon after allocating it, and unlock
it later when we unlock the old anon.
Case2 -- Lock the new anon after allocating it, and unlock
it later by passing it to uvmfault_unlockall() (we set anon
to NULL if we're not doing a promote fault).
 1.52  27-Nov-2000  chs Initial integration of the Unified Buffer Cache project.
 1.51  06-Aug-2000  thorpej Update a comment in uvmfault_anonget() to reflect reality, and
make uvm_fault() handle uvmfault_anonget() failure properly (i.e.
don't unlock a lock that's already unlocked).
 1.50  27-Jun-2000  mrg remove include of <vm/vm.h>
 1.49  26-Jun-2000  mrg remove/move more mach vm header files:

<vm/pglist.h> -> <uvm/uvm_pglist.h>
<vm/vm_inherit.h> -> <uvm/uvm_inherit.h>
<vm/vm_kern.h> -> into <uvm/uvm_extern.h>
<vm/vm_object.h> -> nothing
<vm/vm_pager.h> -> into <uvm/uvm_pager.h>

also includes a bunch of <vm/vm_page.h> include removals (due to redundancy
with <vm/vm.h>), and a scattering of other similar headers.
 1.48  10-Apr-2000  thorpej branches: 1.48.4;
Use UVM_PGA_ZERO in the promote-zero-fault case of uvm_fault().
 1.47  11-Jan-2000  chs add support for ``swapctl -d'' (removing swap space).
improve handling of i/o errors in swap space.

reviewed by: Chuck Cranor
 1.46  13-Nov-1999  thorpej Change the pmap_enter() API slightly; pmap_enter() now returns an error
value (KERN_SUCCESS or KERN_RESOURCE_SHORTAGE) indicating if it succeeded
or failed. Change the `wired' and `access_type' arguments to a single
`flags' argument, which includes the access type, and flags:

PMAP_WIRED    the old `wired' boolean
PMAP_CANFAIL  pmap_enter() is allowed to fail

If PMAP_CANFAIL is not specified, the pmap should behave as it always
has in the face of a drastic resource shortage: fall over dead.

Change the fault handler to deal with failure (which indicates resource
shortage) by unlocking everything, waiting for the pagedaemon to free
more memory, then retrying the fault.
 1.45  12-Sep-1999  chs branches: 1.45.2; 1.45.4; 1.45.8;
eliminate the PMAP_NEW option by making it required for all ports.
ports which previously had no support for PMAP_NEW now implement
the pmap_k* interfaces as wrappers around the non-k versions.
 1.44  22-Jul-1999  thorpej Garbage collect thread_sleep()/thread_wakeup() left over from the old
Mach VM code. Also nuke iprintf(), which was no longer used anywhere.

Add proclist locking where appropriate.
 1.43  19-Jul-1999  cgd make sure 'wide' fault handling is actually done only once per fault.
('narrow' was mistakenly set to FALSE instead of TRUE.) Committed after
discussion with chuq.
 1.42  11-Jul-1999  thorpej Back out the change I made yesterday. It seems to cause some trouble
for some folks.
 1.41  10-Jul-1999  thorpej Simplify uvm_fault_unwire_locked() a little.
 1.40  08-Jul-1999  thorpej Change the pmap_extract() interface to:
boolean_t pmap_extract(pmap_t, vaddr_t, paddr_t *);
This makes it possible for the pmap to map physical address 0.
 1.39  17-Jun-1999  thorpej pmap_change_wiring() -> pmap_unwire().
 1.38  17-Jun-1999  thorpej Remove pmap_pageable(); no pmap implements it, and it is not really useful,
because pmap_enter()/pmap_change_wiring() (soon to be pmap_unwire())
communicate the information in greater detail.
 1.37  16-Jun-1999  thorpej When unwiring a range in uvm_fault_unwire_locked(), don't call
pmap_change_wiring(...,FALSE) unless the map entry claims the address
is unwired. This fixes the following scenario, as described on
tech-kern@netbsd.org on Wed 6/16/1999 12:25:23:

- User mlock(2)'s a buffer, to guarantee it will never become
non-resident while he is using it.

- User then does physio to that buffer. Physio calls uvm_vslock()
to lock down the pages and ensure that page faults do not happen
while the I/O is in progress (possibly in interrupt context).

- Physio does the I/O.

- Physio calls uvm_vsunlock(). This calls uvm_fault_unwire().

>>> HERE IS WHERE THE PROBLEM OCCURS <<<

uvm_fault_unwire() calls pmap_change_wiring(..., FALSE),
which now gives the pmap free rein to recycle the mapping
information for that page, which is illegal; the mapping is
still wired (due to the mlock(2)), but now access of the
page could cause a non-protection page fault (disallowed).

NOTE: This could eventually lead to a panic when the user
subsequently munlock(2)'s the buffer and the mapping info
has been recycled for use by another mapping!
 1.36  16-Jun-1999  thorpej * Rename uvm_fault_unwire() to uvm_fault_unwire_locked(), and require that
the map be at least read-locked to call this function. This requirement
will be taken advantage of in a future commit.
* Write a uvm_fault_unwire() wrapper which read-locks the map and calls
uvm_fault_unwire_locked().
* Update the comments describing the locking constraints of uvm_fault_wire()
and uvm_fault_unwire().
 1.35  16-Jun-1999  thorpej Remove an incorrect-and-no-longer-relevant comment.
 1.34  16-Jun-1999  thorpej Add a macro to test if a map entry is wired.
 1.33  04-Jun-1999  thorpej Keep interrupt-safe maps on an additional queue. In uvm_fault(), if we're
looking up a kernel address, check to see if the address is on this
"interrupt-safe" list. If so, return failure immediately. This prevents
a locking screw if a page fault is taken on an interrupt-safe map in or
out of interrupt context.
 1.32  02-Jun-1999  thorpej A page fault on a non-pageable map is always fatal.
 1.31  28-May-1999  thorpej Make uvm_fault_unwire() take a vm_map_t, rather than a pmap_t, for
consistency. Use this opportunity for checking for intrsafe map use
in this routine (which is illegal).
 1.30  26-May-1999  thorpej Pass an access_type to uvm_fault_wire(), which it forwards on to
uvm_fault().
 1.29  19-May-1999  chs when wiring swap-backed pages, clear the PG_CLEAN flag before
releasing any swap resources. if we don't do this, we can
end up with a clean, swap-backed page, which is illegal.
tracked down by Bill Sommerfeld, fixes PR 7578.
 1.28  11-Apr-1999  chs add a `flags' argument to uvm_pagealloc_strat().
define a flag UVM_PGA_USERESERVE to allow non-kernel object
allocations to use pages from the reserve.
use the new flag for allocations in pmap modules.
 1.27  29-Mar-1999  mycroft branches: 1.27.2;
Duuuh. Back and front pages should have an access_type of 0, since we don't
know they're going to be used. What was I thinking??
 1.26  28-Mar-1999  mycroft Reduce the access_type for copy-on-write pages in the front and back regions.
 1.25  28-Mar-1999  mycroft Fix a case I missed in the previous.
 1.24  28-Mar-1999  mycroft Only turn off VM_PROT_WRITE for COW pages; not VM_PROT_EXECUTE.
 1.23  26-Mar-1999  mycroft Add a new `access type' argument to pmap_enter(). This indicates what type of
memory access a mapping was caused by. This is passed through from uvm_fault()
and udv_fault(), and in most other cases is 0.
The pmap module may use this to preset R/M information. On MMUs which require
R/M emulation, the implementation may preset the bits and avoid taking another
fault. On MMUs which keep R/M information in hardware, the implementation may
preset its cached bits to speed up the next call to pmap_is_modified() or
pmap_is_referenced().
 1.22  26-Mar-1999  chs add uvmexp.swpgonly and use it to detect out-of-swap conditions.
 1.21  25-Mar-1999  mrg remove now >1 year old pre-release message.
 1.20  31-Jan-1999  mrg 80 cols.
 1.19  24-Jan-1999  chuck cleanup/reorg:
- break anon related functions out of uvm_amap.c and put them in their own
file (uvm_anon.c). this includes breaking up uvm_anon_init into an amap and
an anon init function
- ensure that only functions within the amap module access amap structure
fields (add macros to amap api as needed)
 1.18  20-Nov-1998  chuck update outdated an_swslot comments
 1.17  07-Nov-1998  mrg branches: 1.17.2;
minor KNF nits
 1.16  04-Nov-1998  chs be consistent with locking of amaps and anons when freeing them.
 1.15  18-Oct-1998  chs shift by PAGE_SHIFT instead of multiplying or dividing by PAGE_SIZE.
 1.14  16-Oct-1998  tv Check for gcc the Right way when quashing -Wuninitialized goop.
 1.13  11-Oct-1998  chuck remove unused share map code from UVM:
- simplify uvm_faultinfo in uvm_fault.h (parent map tracking no longer needed)
- adjust locking and lookup functions in uvm_fault_i.h to reflect the above
- replace ufi.rvaddr with ufi.orig_rvaddr in uvm_fault.c since rvaddr is
no longer needed.
- no need to worry about share map translations in uvm_fault(). simplify.
 1.12  13-Aug-1998  eeh Merge paddr_t changes into the main branch.
 1.11  02-Jun-1998  mark branches: 1.11.2;
Use the sparc's GCC lossage fix for the arm32 port as well. Problem appears
to be a compiler bug resulting in a 'variable possibly used uninitialised'
warning when optimisation is used.
 1.10  05-May-1998  kleink Remove inclusions of syscall (and syscall argument) related header files;
we don't need them here.
 1.9  26-Mar-1998  chuck update per-process rusage fault counters (ru_majflt/ru_minflt) under UVM
 1.8  22-Mar-1998  chuck remove tmpwire arg from uvm_pagewire() -- it isn't needed anymore.
noted by chuck s.
 1.7  09-Mar-1998  mrg KNF.
 1.6  10-Feb-1998  mrg - add defopt's for UVM, UVMHIST and PMAP_NEW.
- remove unnecessary UVMHIST_DECL's.
 1.5  07-Feb-1998  mrg implement counters for pages paged in/out
 1.4  07-Feb-1998  mrg restore rcsids
 1.3  07-Feb-1998  chs don't try to relock amap if there isn't one.
 1.2  06-Feb-1998  thorpej RCS ID police.
 1.1  05-Feb-1998  mrg branches: 1.1.1;
Initial revision
 1.1.1.1  05-Feb-1998  mrg initial import of the new virtual memory system, UVM, into -current.

UVM was written by chuck cranor <chuck@maria.wustl.edu>, with some
minor portions derived from the old Mach code. i provided some help
getting swap and paging working, and other bug fixes/ideas. chuck
silvers <chuq@chuq.com> also provided some other fixes.

this is the UVM kernel code portion.


this will be KNF'd shortly. :-)
 1.11.2.1  30-Jul-1998  eeh Split vm_offset_t and vm_size_t into paddr_t, psize_t, vaddr_t, and vsize_t.
 1.17.2.3  02-Jun-1999  chs honor the new PG_RDONLY flag.
 1.17.2.2  25-Feb-1999  chs remove the hacky splhigh() around the pgo_fault() call.
thread_wakeup() -> wakeup().
use SLOCK_{,UN}LOCKED.
 1.17.2.1  09-Nov-1998  chs initial snapshot. lots left to do.
 1.27.2.2  18-Jun-1999  perry pullup 1.28->1.29 (chuq): fixes loss of process data under heavy paging bug
 1.27.2.1  16-Apr-1999  chs branches: 1.27.2.1.2; 1.27.2.1.4;
pull up 1.27 -> 1.28:
add a `flags' argument to uvm_pagealloc_strat().
define a flag UVM_PGA_USERESERVE to allow non-kernel object
allocations to use pages from the reserve.
use the new flag for allocations in pmap modules.
 1.27.2.1.4.1  30-Nov-1999  itojun bring in latest KAME (as of 19991130, KAME/NetBSD141) into kame branch
just for reference purposes.
This commit includes 1.4 -> 1.4.1 sync for kame branch.

The branch does not compile at all (due to the lack of ALTQ and some other
source code). Please do not try to modify the branch, this is just for
reference purposes.

synchronization to latest KAME will take place on HEAD branch soon.
 1.27.2.1.2.5  02-Aug-1999  thorpej Update from trunk.
 1.27.2.1.2.4  02-Aug-1999  thorpej Update from trunk.
 1.27.2.1.2.3  04-Jul-1999  chs add PGO_SYNCIO to the flags to pgo_fault() and pgo_get() (unlocked).
this just makes things work out better in the handlers.
 1.27.2.1.2.2  21-Jun-1999  thorpej Sync w/ -current.
 1.27.2.1.2.1  07-Jun-1999  chs merge everything from chs-ubc branch.
 1.45.8.1  27-Dec-1999  wrstuden Pull up to last week's -current.
 1.45.4.1  15-Nov-1999  fvdl Sync with -current
 1.45.2.6  21-Apr-2001  bouyer Sync with HEAD
 1.45.2.5  27-Mar-2001  bouyer Sync with HEAD.
 1.45.2.4  12-Mar-2001  bouyer Sync with HEAD.
 1.45.2.3  11-Feb-2001  bouyer Sync with HEAD.
 1.45.2.2  08-Dec-2000  bouyer Sync with HEAD.
 1.45.2.1  20-Nov-2000  bouyer Update thorpej_scsipi to -current as of a month ago
 1.48.4.2  16-Jun-2001  he Pull up revision 1.65 (via patch, requested by chuck):
Work around overflow problem in uvm_fault_wire().
 1.48.4.1  06-Aug-2000  thorpej Pull up rev. 1.51:
Update a comment in uvmfault_anonget() to reflect reality, and
make uvm_fault() handle uvmfault_anonget() failure properly (i.e.
don't unlock a lock that's already unlocked).
 1.56.2.13  30-Oct-2002  thorpej Sync with HEAD.
 1.56.2.12  17-Sep-2002  nathanw Catch up to -current.
 1.56.2.11  12-Jul-2002  nathanw No longer need to pull in lwp.h; proc.h pulls it in for us.
 1.56.2.10  24-Jun-2002  nathanw Curproc->curlwp renaming.

Change uses of "curproc->l_proc" back to "curproc", which is more like the
original use. Bare uses of "curproc" are now "curlwp".

"curproc" is now #defined in proc.h as ((curlwp) ? (curlwp)->l_proc : NULL)
so that it is always safe to reference curproc (*de*referencing curproc
is another story, but that's always been true).
 1.56.2.9  01-Apr-2002  nathanw Catch up to -current.
(CVS: It's not just a program. It's an adventure!)
 1.56.2.8  08-Jan-2002  nathanw Catch up to -current.
 1.56.2.7  14-Nov-2001  nathanw Catch up to -current.
 1.56.2.6  08-Oct-2001  nathanw Catch up to -current.
 1.56.2.5  21-Sep-2001  nathanw Catch up to -current.
 1.56.2.4  24-Aug-2001  nathanw Catch up with -current.
 1.56.2.3  21-Jun-2001  nathanw Catch up to -current.
 1.56.2.2  09-Apr-2001  nathanw Catch up with -current.
 1.56.2.1  05-Mar-2001  nathanw Initial commit of scheduler activations and lightweight process support.
 1.67.4.2  11-Oct-2001  fvdl Catch up with -current. Fix some bogons in the sparc64 kbd/ms
attach code. cd18xx conversion provided by mrg.
 1.67.4.1  01-Oct-2001  fvdl Catch up with -current.
 1.67.2.5  06-Sep-2002  jdolecek sync kqueue branch with HEAD
 1.67.2.4  23-Jun-2002  jdolecek catch up with -current on kqueue branch
 1.67.2.3  16-Mar-2002  jdolecek Catch up with -current.
 1.67.2.2  10-Jan-2002  thorpej Sync kqueue branch with -current.
 1.67.2.1  13-Sep-2001  thorpej Update the kqueue branch to HEAD.
 1.70.2.1  12-Nov-2001  thorpej Sync the thorpej-mips-cache branch with -current.
 1.76.4.3  10-Dec-2002  jmc Pull up revisions 1.78-1.79 (requested by thorpej in ticket #952)
change uoff to voff_t from vaddr_t as it's offset within uvm object.
fix PR/18855.
 1.76.4.2  30-Nov-2002  he Pull up revision 1.78 (requested by thorpej in ticket #759):
When breaking a loan due to a page fault, check to see if
the other kind of reference-holder (anon or object) is
referencing the page. If not, the page must be removed
from the paging queue.
 1.76.4.1  30-Nov-2002  he Pull up revision 1.77 (requested by chs in ticket #770):
Be sure that the page we allocate to break a loan is put
on a paging queue. Fixes PR#18037.
 1.76.2.1  31-Aug-2002  gehenna catch up with -current.
 1.82.2.7  10-Nov-2005  skrll Sync with HEAD. Here we go again...
 1.82.2.6  04-Mar-2005  skrll Sync with HEAD.

Hi Perry!
 1.82.2.5  09-Feb-2005  skrll Sync with HEAD.
 1.82.2.4  17-Jan-2005  skrll Sync with HEAD.
 1.82.2.3  21-Sep-2004  skrll Fix the sync with head I botched.
 1.82.2.2  18-Sep-2004  skrll Sync with HEAD.
 1.82.2.1  03-Aug-2004  skrll Sync with HEAD
 1.87.2.1  10-May-2004  tron branches: 1.87.2.1.2;
Pull up revision 1.88 (requested by yamt in ticket #271):
fix an amap_wirerange deadlock problem by re-introducing
PG_RELEASED for anon pages. PR/23171 from Christian Limpach.
for details, see discussion filed in the PR database.
uvm_anon_release: a new function to free anon-owned PG_RELEASED page.
uvm_anfree: we can't wait for the page here because the caller might hold
amap lock. instead, just mark the page as PG_RELEASED.
whoever unbusies the page should check PG_RELEASED.
uvm_aio_aiodone: uvm_anon_release() instead of uvm_page_unbusy()
if appropriate.
uvmfault_anonget: check PG_RELEASED.
 1.87.2.1.2.1  11-May-2005  riz Pull up revision 1.90 (requested by dbj in ticket #1409):
uvm_fault: fix integer overflow so that MADV_SEQUENTIAL
can work on large files.
 1.89.4.2  19-Mar-2005  yamt sync with head. xen and whitespace. xen part is not finished.
 1.89.4.1  12-Feb-2005  yamt sync with head.
 1.89.2.1  29-Apr-2005  kent sync with -current
 1.91.2.1  24-Aug-2005  riz Pull up following revision(s) (requested by yamt in ticket #688):
sys/miscfs/genfs/genfs_vnops.c: revision 1.98 via patch
sys/ufs/ffs/ffs_vfsops.c: revision 1.165
sys/ufs/lfs/lfs_extern.h: revision 1.69
sys/fs/filecorefs/filecore_vfsops.c: revision 1.20
sys/nfs/nfs_node.c: revision 1.80
sys/fs/smbfs/smbfs_node.c: revision 1.24
sys/fs/cd9660/cd9660_vfsops.c: revision 1.24
sys/fs/msdosfs/msdosfs_denode.c: revision 1.8
sys/miscfs/genfs/genfs_node.h: revision 1.6
sys/ufs/lfs/lfs_vfsops.c: revision 1.183
sys/ufs/ext2fs/ext2fs_vfsops.c: revision 1.86
sys/fs/adosfs/advfsops.c: revision 1.23
sys/fs/ntfs/ntfs_vfsops.c: revision 1.31
- constify genfs_ops.
- use member designators.

sys/miscfs/genfs/genfs_vnops.c: revision 1.99 via patch
genfs_getpages: don't forget to put the vnode onto the syncer's work queue
even in the case of PGO_LOCKED.

sys/uvm/uvm_bio.c: revision 1.40
sys/uvm/uvm_pager.h: revision 1.29
sys/miscfs/genfs/genfs_vnops.c: revision 1.100 via patch
sys/ufs/ufs/ufs_inode.c: revision 1.50
- introduce PGO_NOBLOCKALLOC and use it for ubc mapping
to prevent unnecessary block allocations in the case that
page size > block size.
- ufs_balloc_range: use VM_PROT_WRITE+PGO_NOBLOCKALLOC rather than
VM_PROT_READ.

sys/uvm/uvm_fault.c: revision 1.96
sys/miscfs/genfs/genfs_vnops.c: revision 1.101 via patch
sys/uvm/uvm_object.h: revision 1.19
sys/miscfs/genfs/genfs_node.h: revision 1.7
ensure that vnodes with dirty pages are always on syncer's queue.
- genfs_putpages: wait for i/o completion of PG_RELEASED/PG_PAGEOUT pages by
setting "wasclean" false when encountering them.
suggested by Stephan Uphoff in PR/24596 (1).
- genfs_putpages: write protect pages when cleaning out, if
we're going to take the vnode off the syncer's queue.
uvm_fault: don't write-map pages unless its vnode is already on
the syncer's queue.
fix PR/24596 (3) but in the different way from the suggested fix.
(to keep our current behaviour, ie. not to require explicit msync.
discussed on tech-kern@.)
- genfs_putpages: don't mistakenly take a vnode off the queue
by introducing a generation number in genfs_node.
genfs_getpages: increment the generation number.
suggested by Stephan Uphoff in PR/24596 (2).
- add some assertions.

sys/miscfs/genfs/genfs_vnops.c: revision 1.102 via patch
genfs_putpages: don't bother to clean the vnode unless VONWORKLST.

sys/ufs/ffs/ffs_vnops.c: revision 1.71
ffs_full_fsync: because VBLK/VCHR can be mmap'ed,
do VOP_PUTPAGES for them as well.

sys/uvm/uvm_fault.c: revision 1.97
uvm_fault: check a correct object in the case of layered filesystems.
fix PR/30811 from Jukka Salmi.

sys/uvm/uvm_object.h: revision 1.20
sys/ufs/ffs/ffs_vfsops.c: revision 1.167
sys/uvm/uvm_bio.c: revision 1.41
sys/ufs/ufs/ufs_vnops.c: revision 1.129
sys/uvm/uvm_mmap.c: revision 1.92
sys/uvm/uvm_fault.c: revision 1.98
sys/kern/vfs_subr.c: revision 1.252
sys/fs/msdosfs/denode.h: revision 1.5
sys/miscfs/genfs/genfs_vnops.c: revision 1.103 via patch
sys/fs/msdosfs/msdosfs_denode.c: revision 1.9
sys/sys/vnode.h: revision 1.141
sys/ufs/ufs/ufs_inode.c: revision 1.51
sys/ufs/ufs/ufs_extern.h: revision 1.45 via patch
sys/miscfs/genfs/genfs_node.h: revision 1.8
sys/ufs/lfs/lfs_vfsops.c: revision 1.184
sys/uvm/uvm_pager.h: revision 1.30
sys/ufs/ext2fs/ext2fs_vfsops.c: revision 1.87
update file timestamps for nfsd loaned-read and mmap.
PR/25279. discussed on tech-kern@.

sys/miscfs/genfs/genfs_vnops.c: revision 1.104 via patch
don't write-protect wired pages. pointed out by Chuck Silvers.
for now, leave a vnode on the syncer's queue, as suggested by him.

sys/ufs/ffs/ffs_vnops.c: revision 1.72
revert VCHR part of ffs_vnops.c 1.71.
as VCHR uses the device pager, there is no point in calling VOP_PUTPAGES here.
pointed out by Chuck Silvers.
 1.95.2.6  21-Jan-2008  yamt sync with head
 1.95.2.5  27-Oct-2007  yamt sync with head.
 1.95.2.4  03-Sep-2007  yamt sync with head.
 1.95.2.3  26-Feb-2007  yamt sync with head.
 1.95.2.2  30-Dec-2006  yamt sync with head.
 1.95.2.1  21-Jun-2006  yamt sync with head.
 1.103.2.3  01-Mar-2006  yamt sync with head.
 1.103.2.2  18-Feb-2006  yamt sync with head.
 1.103.2.1  01-Feb-2006  yamt sync with head.
 1.107.4.1  22-Apr-2006  simonb Sync with head.
 1.107.2.1  09-Sep-2006  rpaulo sync with head
 1.109.4.1  19-Apr-2006  elad oops - *really* sync to head this time.
 1.109.2.3  11-Apr-2006  yamt sync with head
 1.109.2.2  01-Apr-2006  yamt sync with head.
 1.109.2.1  05-Mar-2006  yamt separate page replacement policy from the rest of kernel.
 1.110.2.1  24-May-2006  tron Merge 2006-05-24 NetBSD-current into the "peter-altq" branch.
 1.111.8.2  12-Jan-2007  ad Sync with head.
 1.111.8.1  18-Nov-2006  ad Sync with head.
 1.112.2.3  18-Dec-2006  yamt sync with head.
 1.112.2.2  10-Dec-2006  yamt sync with head.
 1.112.2.1  22-Oct-2006  yamt sync with head
 1.117.2.1  27-Feb-2007  yamt - sync with head.
- move sched_changepri back to kern_synch.c as it doesn't know PPQ anymore.
 1.119.12.1  15-Aug-2007  skrll Sync with HEAD.
 1.119.4.1  13-Mar-2007  ad Pull in the initial set of changes for the vmlocking branch.
 1.120.10.2  21-Jul-2007  ad Merge unobtrusive locking changes from the vmlocking branch.
 1.120.10.1  21-Jul-2007  ad file uvm_fault.c was added on branch matt-mips64 on 2007-07-21 19:21:55 +0000
 1.120.8.1  14-Oct-2007  yamt sync with head.
 1.120.6.3  23-Mar-2008  matt sync with HEAD
 1.120.6.2  09-Jan-2008  matt sync with HEAD
 1.120.6.1  06-Nov-2007  matt sync with HEAD
 1.120.4.1  26-Oct-2007  joerg Sync with HEAD.

Follow the merge of pmap.c on i386 and amd64 and move
pmap_init_tmp_pgtbl into arch/x86/x86/pmap.c. Modify the ACPI wakeup
code to restore CR4 before jumping back into kernel space as the large
page option might cover that.
 1.121.10.2  19-Jan-2008  bouyer Sync with HEAD
 1.121.10.1  02-Jan-2008  bouyer Sync with HEAD
 1.121.6.1  04-Dec-2007  ad Pull the vmlocking changes into a new branch.
 1.121.4.1  18-Feb-2008  mjf Sync with HEAD.
 1.123.6.3  17-Jan-2009  mjf Sync with HEAD.
 1.123.6.2  28-Sep-2008  mjf Sync with HEAD.
 1.123.6.1  03-Apr-2008  mjf Sync with HEAD.
 1.124.8.1  18-Jul-2008  simonb Sync with head.
 1.124.6.1  18-Sep-2008  wrstuden Sync with wrstuden-revivesa-base-2.
 1.124.4.3  11-Aug-2010  yamt sync with head.
 1.124.4.2  11-Mar-2010  yamt sync with head
 1.124.4.1  04-May-2009  yamt sync with head.
 1.125.6.2  21-Nov-2010  riz Pull up following revision(s) (requested by rmind in ticket #1421):
sys/uvm/uvm_bio.c: revision 1.70
sys/uvm/uvm_map.c: revision 1.292
sys/uvm/uvm_pager.c: revision 1.98
sys/uvm/uvm_fault.c: revision 1.175
sys/uvm/uvm_bio.c: revision 1.69
ubc_fault: split-off code part handling a single page into ubc_fault_page().
Keep the lock around pmap_update() where required. While fixing this
in ubc_fault(), rework logic to "remember" the last object of page and
reduce locking overhead, since in common case pages belong to one and
the same UVM object (but not always, therefore add a comment).
Unlocks before pmap_update(), on removal of mappings, might cause TLB
coherency issues, since on architectures like x86 and mips64 invalidation
IPIs are deferred to pmap_update(). Hence, VA space might be globally
visible before IPIs are sent or while they are still in-flight.
OK ad@.
 1.125.6.1  02-Feb-2009  snj branches: 1.125.6.1.4;
Pull up following revision(s) (requested by ad in ticket #354):
sys/uvm/uvm_fault.c: revision 1.126
sys/uvm/uvm_map.c: revision 1.268
Move a couple of calls to pmap_update().
 1.125.6.1.4.6  12-Apr-2012  matt Apply colormask to get a valid color.
 1.125.6.1.4.5  12-Apr-2012  matt Separate object-less anon pages out of the active list if there is no swap
device. Make uvm_reclaimable and uvm.*estimatable understand colors and
kmem allocations.
 1.125.6.1.4.4  29-Feb-2012  matt Improve UVM_PAGE_TRKOWN.
Add more asserts to uvm_page.
 1.125.6.1.4.3  03-Jun-2011  matt Restore $NetBSD$
 1.125.6.1.4.2  25-May-2011  matt Make uvm_map recognize UVM_FLAG_COLORMATCH which tells uvm_map that the
'align' argument specifies the starting color of the KVA range to be returned.

When calling uvm_km_alloc with UVM_KMF_VAONLY, also specify the starting
color of the kva range returned (UVM_KMF_COLORMATCH) and pass those to
uvm_map.

In uvm_pglistalloc, make sure the pages being returned have sequentially
advancing colors (so they can be mapped in a contiguous address range).
Add a few missing UVM_FLAG_COLORMATCH flags to uvm_pagealloc calls.

Make the socket and pipe loan color-safe.

Make the mips pmap enforce strict page color (color(VA) == color(PA)).
 1.125.6.1.4.1  26-Jan-2010  matt Pass hints to uvm_pagealloc* to get it to use the right page color rather
than guess the right page color.
 1.125.4.1  19-Jan-2009  skrll Sync with HEAD.
 1.166.2.27  21-Nov-2010  uebayasi Rename PGO_ZERO as PGO_HOLE, and s/uvm_page_zeropage/uvm_page_holepage/.
 1.166.2.26  21-Nov-2010  uebayasi UVMHIST log for XIP hole COW.
 1.166.2.25  21-Nov-2010  uebayasi Resurrect PGO_ZERO support.

When vnode pager encounters hole pages in XIP'ed vnodes, it fills
page slots with PGO_ZERO and returns them back to the caller (fault
handler). Fault handlers are responsible for checking page slots and
redirect PGO_ZERO to the single "zero page" allocated by calling
uvm_page_zeropage_alloc(9).

The zero page is a wired, read-only (PG_RDONLY) page. It is shared
by multiple vnodes and has no single owner.

XIP'ed vnodes are supposed to be "stable" during I/O (unlocked),
because XIP'ed mounts are always read-only. There's no chance to
change mappings of XIP'ed vnodes and their XIP'ed pages. Thus the
cached uobj is reused after pgo_get() for PGO_ZERO.

(Do we need a new concept of "read-only UVM object"?)
 1.166.2.24  19-Nov-2010  uebayasi Make XIP genfs_getpages_xip() return pages in I/O path, preparing
merge into the generic genfs_getpages().
 1.166.2.23  04-Nov-2010  uebayasi Split physical device segment pages from "managed" to "managed
device". Cache that information as a flag PG_DEVICE so that callers
don't need to walk physsegs every time.

Remove PQ_FIXED, which means that page daemon doesn't need to know
device segment pages at all. But still fault handlers need to know
them.

I think this is what I can do best now.
 1.166.2.22  17-Aug-2010  uebayasi Sync with HEAD.
 1.166.2.21  12-Aug-2010  uebayasi Fix a #if/#ifdef misuse.
 1.166.2.20  22-Jul-2010  uebayasi s/PG_XIP/PQ_FIXED/, meaning that the fault handler sees XIP pages as
"fixed", and doesn't pass them to paging activity.

("XIP" is a vnode specific knowledge. It was wrong that the fault
handler had to know such a special thing.)
 1.166.2.19  15-Jul-2010  uebayasi Rename PG_DIRECT to PG_XIP. PG_XIP is marked to XIP vnode pages.
 1.166.2.18  14-Jul-2010  uebayasi One more XIP code reduction.
 1.166.2.17  13-Jul-2010  uebayasi Reduce more diffs from the original.
 1.166.2.16  12-Jul-2010  uebayasi Reduce more diff by backing out XIP page specific code. Allow XIP pages
to be loaned.
 1.166.2.15  12-Jul-2010  uebayasi Now XIP pages have vm_page, adjust some code and reduce diff to the
original code.
 1.166.2.14  09-Jul-2010  uebayasi Whitespace.
 1.166.2.13  09-Jul-2010  uebayasi Mark XIP pages as PG_CLEAN and/or PG_BUSY when appropriate. Protect
vnode lock when vm_page::flags is manipulated.
 1.166.2.12  08-Jul-2010  uebayasi Mark XIP pages as PG_RDONLY.
 1.166.2.11  08-Jul-2010  uebayasi Whitespace.
 1.166.2.10  07-Jul-2010  uebayasi Clean up; merge options DIRECT_PAGE into options XIP.
 1.166.2.9  07-Jul-2010  uebayasi To simplify things, revert global vm_page_md hash and allocate struct
vm_page [] for XIP physical segments.
 1.166.2.8  09-Jun-2010  uebayasi Fix build with DIAGNOSTIC.
 1.166.2.7  31-May-2010  uebayasi Redefine "device page": device pages are pages of
device memory. Pages which don't have vm_page (== can't be used for
generic use), but whose PV are tracked, are called "direct pages" from
now.
 1.166.2.6  28-Feb-2010  uebayasi Put comments why device pages skip some code paths. Don't skip accounting
for "neighbor" device pages.
 1.166.2.5  24-Feb-2010  uebayasi Sync with HEAD.
 1.166.2.4  23-Feb-2010  uebayasi uvm_fault_lower_promote: One more missing part for device pages to by-pass
page cache handling. When a page in a uobj is promoted, its content is copied
to another page owned by the newly allocated anon. The old page cache is then
disposed. Of course we don't need to dispose device pages in such a case,
so skip it.

Don't forget opt_device_page.h.

Count lower fault correctly.
 1.166.2.3  12-Feb-2010  uebayasi Teach device page handling to the "lower" fault handler. Skip all the paging
activities, no loaning, no wired count. Only compile tested so far.
 1.166.2.2  12-Feb-2010  uebayasi uvmfault_promote: For promotion from a "lower" page, pass the belonging struct
uvm_object * from callers, because device page struct vm_page * doesn't have
a back-pointer to the uvm_object.
 1.166.2.1  08-Feb-2010  uebayasi file uvm_fault.c was added on branch uebayasi-xip on 2010-02-12 16:06:50 +0000
 1.173.2.9  31-May-2011  rmind sync with head
 1.173.2.8  21-May-2011  rmind uvm_fault_lower_promote: fix assert (move a bit up, where logic applies).
 1.173.2.7  19-May-2011  rmind Implement sharing of vnode_t::v_interlock amongst vnodes:
- Lock is shared amongst UVM objects using uvm_obj_setlock() or getnewvnode().
- Adjust vnode cache to handle unsharing, add VI_LOCKSHARE flag for that.
- Use sharing in tmpfs and layerfs for underlying object.
- Simplify locking in ubc_fault().
- Sprinkle some asserts.

Discussed with ad@.
 1.173.2.6  21-Apr-2011  rmind sync with head
 1.173.2.5  05-Mar-2011  rmind sync with head
 1.173.2.4  03-Jul-2010  rmind sync with head
 1.173.2.3  30-May-2010  rmind sync with head
 1.173.2.2  17-Mar-2010  rmind Reorganise UVM locking to protect P->V state and serialise pmap(9)
operations on the same page(s) by always locking their owner. Hence
lock order: "vmpage"-lock -> pmap-lock.

Patch, proposed on tech-kern@, from Andrew Doran.
 1.173.2.1  16-Mar-2010  rmind Change struct uvm_object::vmobjlock to be dynamically allocated with
mutex_obj_alloc(). It allows us to share the locks among UVM objects.
 1.180.4.2  17-Feb-2011  bouyer Sync with HEAD
 1.180.4.1  08-Feb-2011  bouyer Sync with HEAD
 1.180.2.1  06-Jun-2011  jruoho Sync with HEAD.
 1.185.2.1  23-Jun-2011  cherry Catchup with rmind-uvmplock merge.
 1.190.2.6  22-May-2014  yamt sync with head.

for a reference, the tree before this commit was tagged
as yamt-pagecache-tag8.

this commit was split into small chunks to avoid
a limitation of cvs. ("Protocol error: too many arguments")
 1.190.2.5  17-Apr-2012  yamt sync with head
 1.190.2.4  28-Dec-2011  yamt - assertions
- __unused
 1.190.2.3  26-Dec-2011  yamt - use O->A loan to serve read(2). based on a patch from Chuck Silvers
- associated O->A loan fixes.
 1.190.2.2  14-Nov-2011  yamt assertions
 1.190.2.1  02-Nov-2011  yamt page cache related changes

- maintain object pages in radix tree rather than rb tree.
- reduce unnecessary page scan in putpages. esp. when an object has a ton of
pages cached but only a few of them are dirty.
- reduce the number of pmap operations by tracking page dirtiness more
precisely in uvm layer.
- fix nfs commit range tracking.
- fix nfs write clustering. XXX hack
 1.191.2.2  24-Feb-2012  mrg sync to -current.
 1.191.2.1  18-Feb-2012  mrg merge to -current.
 1.194.4.1  18-May-2014  rmind sync with head
 1.194.2.2  03-Dec-2017  jdolecek update from HEAD
 1.194.2.1  20-Aug-2014  tls Rebase to HEAD as of a few days ago.
 1.195.2.1  07-Apr-2014  tls Be a little more clear and consistent about harvesting entropy from devices:

1) deprecate RND_FLAG_NO_ESTIMATE

2) define RND_FLAG_COLLECT_TIME, RND_FLAG_COLLECT_VALUE

3) define RND_FLAG_ESTIMATE_TIME, RND_FLAG_ESTIMATE_VALUE

4) define RND_FLAG_DEFAULT: RND_FLAG_COLLECT_TIME|
RND_FLAG_COLLECT_VALUE|RND_FLAG_ESTIMATE_TIME

5) Make entropy harvesting from environmental sensors a little more generic
and remove it from individual sensor drivers.

6) Remove individual open-coded delta-estimators for values from a few
places in the tree (uvm, environmental drivers).

7) 0 -> RND_FLAG_DEFAULT, actually gather entropy from various drivers
that had stubbed out code, other minor cleanups.
 1.196.4.2  28-Aug-2017  skrll Sync with HEAD
 1.196.4.1  22-Sep-2015  skrll Sync with HEAD
 1.197.4.1  21-Apr-2017  bouyer Sync with HEAD
 1.197.2.2  26-Apr-2017  pgoyette Sync with HEAD
 1.197.2.1  20-Mar-2017  pgoyette Sync with HEAD
 1.199.6.4  22-Apr-2019  martin Pull up following revision(s) (requested by chs in ticket #1236):

sys/uvm/uvm_fault.c: revision 1.205

If a pager fault method returns ENOMEM but some memory appears to be reclaimable,
wake up the pagedaemon and retry the fault. This fixes the problems with Xorg
being killed with an "out of swap" message due to a transient memory shortage.
 1.199.6.3  27-Feb-2018  martin Pull up following revision(s) (requested by mrg in ticket #593):
sys/dev/marvell/mvxpsec.c: revision 1.2
sys/arch/m68k/m68k/pmap_motorola.c: revision 1.70
sys/opencrypto/crypto.c: revision 1.102
sys/arch/sparc64/sparc64/pmap.c: revision 1.308
sys/ufs/chfs/chfs_malloc.c: revision 1.5
sys/arch/powerpc/oea/pmap.c: revision 1.95
sys/sys/pool.h: revision 1.80,1.82
sys/kern/subr_pool.c: revision 1.209-1.216,1.219-1.220
sys/arch/alpha/alpha/pmap.c: revision 1.262
sys/kern/uipc_mbuf.c: revision 1.173
sys/uvm/uvm_fault.c: revision 1.202
sys/sys/mbuf.h: revision 1.172
sys/kern/subr_extent.c: revision 1.86
sys/arch/x86/x86/pmap.c: revision 1.266 (via patch)
sys/dev/dtv/dtv_scatter.c: revision 1.4

Allow only one pending call to a pool's backing allocator at a time.
Candidate fix for problems with hanging after kva fragmentation related
to PR kern/45718.

Proposed on tech-kern:
https://mail-index.NetBSD.org/tech-kern/2017/10/23/msg022472.html
Tested by bouyer@ on i386.

This makes one small change to the semantics of pool_prime and
pool_setlowat: they may fail with EWOULDBLOCK instead of ENOMEM, if
there is a pending call to the backing allocator in another thread but
we are not actually out of memory. That is unlikely because nearly
always these are used during initialization, when the pool is not in
use.

Define the new flag too for previous commit.

pool_grow can now fail even when sleeping is ok. Catch this case in pool_get
and retry.

Assert that pool_get failure happens only with PR_NOWAIT.
This would have caught the mistake I made last week leading to null
pointer dereferences all over the place, a mistake which I evidently
poorly scheduled alongside maxv's change to the panic message on x86
for null pointer dereferences.

Since pr_lock is now used to wait for two things (PR_GROWING and
PR_WANTED) we need to loop for the condition we wanted.
make the KASSERTMSG/panic strings consistent as '%s: [%s], __func__, wchan'
Handle the ERESTART case from pool_grow()

don't pass 0 to the pool flags
Guess pool_cache_get(pc, 0) means PR_WAITOK here.
Earlier on in the same context we use kmem_alloc(sz, KM_SLEEP).

use PR_WAITOK everywhere.
use PR_NOWAIT.

Don't use 0 for PR_NOWAIT

use PR_NOWAIT instead of 0

panic ex nihilo -- PR_NOWAITing for zerot

Add assertions that either PR_WAITOK or PR_NOWAIT are set.
- fix an assert; we can reach there if we are nowait or limitfail.
- when priming the pool and failing with ERESTART, don't decrement the number
of pages; this avoids the issue of returning an ERESTART when we get to 0,
and is more correct.
- simplify the pool_grow code, and don't wakeup things if we ENOMEM.

In pmap_enter_ma(), only try to allocate pves if we might need them,
and even if that fails, only fail the operation if we later discover
that we really do need them. This implements the requirement that
pmap_enter(PMAP_CANFAIL) must not fail when replacing an existing
mapping with the first mapping of a new page, which is an unintended
consequence of the changes from the rmind-uvmplock branch in 2011.

The problem arises when pmap_enter(PMAP_CANFAIL) is used to replace an existing
pmap mapping with a mapping of a different page (eg. to resolve a copy-on-write).
If that fails and leaves the old pmap entry in place, then UVM won't hold
the right locks when it eventually retries. This entanglement of the UVM and
pmap locking was done in rmind-uvmplock in order to improve performance,
but it also means that the UVM state and pmap state need to be kept in sync
more than they did before. It would be possible to handle this in the UVM code
instead of in the pmap code, but these pmap changes improve the handling of
low memory situations in general, and handling this in UVM would be clunky,
so this seemed like the better way to go.

This somewhat indirectly fixes PR 52706, as well as the failing assertion
about "uvm_page_locked_p(old_pg)". (but only on x86, various other platforms
will need their own changes to handle this issue.)
In uvm_fault_upper_enter(), if pmap_enter(PMAP_CANFAIL) fails, assert that
the pmap did not leave around a now-stale pmap mapping for an old page.
If such a pmap mapping still existed after we unlocked the vm_map,
the UVM code would not know later that it would need to lock the
lower layer object while calling the pmap to remove or replace that
stale pmap mapping. See PR 52706 for further details.
hopefully work around the irregular "fork fails in init" problem.
if a pool is growing, and the grower is PR_NOWAIT, mark this.
if another caller wants to grow the pool and is also PR_NOWAIT,
busy-wait for the original caller, which should either succeed
or hard-fail fairly quickly.

implement the busy-wait by unlocking and relocking this pool's
mutex and returning ERESTART. other methods (such as having
the caller do this) were significantly more code and this hack
is fairly localised.
ok chs@ riastradh@

Don't release the lock in the PR_NOWAIT allocation. Move flags setting
after acquiring the mutex. (from Tobias Nygren)
apply the change from arch/x86/x86/pmap.c rev. 1.266 commitid vZRjvmxG7YTHLOfA:

In pmap_enter_ma(), only try to allocate pves if we might need them,
and even if that fails, only fail the operation if we later discover
that we really do need them. If we are replacing an existing mapping,
reuse the pv structure where possible.

This implements the requirement that pmap_enter(PMAP_CANFAIL) must not fail
when replacing an existing mapping with the first mapping of a new page,
which is an unintended consequence of the changes from the rmind-uvmplock
branch in 2011.

The problem arises when pmap_enter(PMAP_CANFAIL) is used to replace an existing
pmap mapping with a mapping of a different page (eg. to resolve a copy-on-write).
If that fails and leaves the old pmap entry in place, then UVM won't hold
the right locks when it eventually retries. This entanglement of the UVM and
pmap locking was done in rmind-uvmplock in order to improve performance,
but it also means that the UVM state and pmap state need to be kept in sync
more than they did before. It would be possible to handle this in the UVM code
instead of in the pmap code, but these pmap changes improve the handling of
low memory situations in general, and handling this in UVM would be clunky,
so this seemed like the better way to go.

This somewhat indirectly fixes PR 52706 on the remaining platforms where
this problem existed.
 1.199.6.2  02-Nov-2017  snj Pull up following revision(s) (requested by pgoyette in ticket #335):
share/man/man9/kernhist.9: 1.5-1.8
sys/arch/acorn26/acorn26/pmap.c: 1.39
sys/arch/arm/arm32/fault.c: 1.105 via patch
sys/arch/arm/arm32/pmap.c: 1.350, 1.359
sys/arch/arm/broadcom/bcm2835_bsc.c: 1.7
sys/arch/arm/omap/if_cpsw.c: 1.20
sys/arch/arm/omap/tiotg.c: 1.7
sys/arch/evbarm/conf/RPI2_INSTALL: 1.3
sys/dev/ic/sl811hs.c: 1.98
sys/dev/usb/ehci.c: 1.256
sys/dev/usb/if_axe.c: 1.83
sys/dev/usb/motg.c: 1.18
sys/dev/usb/ohci.c: 1.274
sys/dev/usb/ucom.c: 1.119
sys/dev/usb/uhci.c: 1.277
sys/dev/usb/uhub.c: 1.137
sys/dev/usb/umass.c: 1.160-1.162
sys/dev/usb/umass_quirks.c: 1.100
sys/dev/usb/umass_scsipi.c: 1.55
sys/dev/usb/usb.c: 1.168
sys/dev/usb/usb_mem.c: 1.70
sys/dev/usb/usb_subr.c: 1.221
sys/dev/usb/usbdi.c: 1.175
sys/dev/usb/usbdi_util.c: 1.67-1.70
sys/dev/usb/usbroothub.c: 1.3
sys/dev/usb/xhci.c: 1.75
sys/external/bsd/drm2/dist/drm/i915/i915_gem.c: 1.34
sys/kern/kern_history.c: 1.15
sys/kern/kern_xxx.c: 1.74
sys/kern/vfs_bio.c: 1.275-1.276
sys/miscfs/genfs/genfs_io.c: 1.71
sys/sys/kernhist.h: 1.21
sys/ufs/ffs/ffs_balloc.c: 1.63
sys/ufs/lfs/lfs_vfsops.c: 1.361
sys/ufs/lfs/ulfs_inode.c: 1.21
sys/ufs/lfs/ulfs_vnops.c: 1.52
sys/ufs/ufs/ufs_inode.c: 1.102
sys/ufs/ufs/ufs_vnops.c: 1.239
sys/uvm/pmap/pmap.c: 1.37-1.39
sys/uvm/pmap/pmap_tlb.c: 1.22
sys/uvm/uvm_amap.c: 1.108
sys/uvm/uvm_anon.c: 1.64
sys/uvm/uvm_aobj.c: 1.126
sys/uvm/uvm_bio.c: 1.91
sys/uvm/uvm_device.c: 1.66
sys/uvm/uvm_fault.c: 1.201
sys/uvm/uvm_km.c: 1.144
sys/uvm/uvm_loan.c: 1.85
sys/uvm/uvm_map.c: 1.353
sys/uvm/uvm_page.c: 1.194
sys/uvm/uvm_pager.c: 1.111
sys/uvm/uvm_pdaemon.c: 1.109
sys/uvm/uvm_swap.c: 1.175
sys/uvm/uvm_vnode.c: 1.103
usr.bin/vmstat/vmstat.c: 1.219
Reorder to test for null before null deref in debug code
--
Reorder to test for null before null deref in debug code
--
KNF
--
No need for '\n' in UVMHIST_LOG
--
normalise a BIOHIST log message
--
Update the kernhist(9) kernel history code to address issues identified
in PR kern/52639, as well as some general cleaning-up...
(As proposed on tech-kern@ with additional changes and enhancements.)
Details of changes:
* All history arguments are now stored as uintmax_t values[1], both in
the kernel and in the structures used for exporting the history data
to userland via sysctl(9). This avoids problems on some architectures
where passing a 64-bit (or larger) value to printf(3) can cause it to
process the value as multiple arguments. (This can be particularly
problematic when printf()'s format string is not a literal, since in
that case the compiler cannot know how large each argument should be.)
* Update the data structures used for exporting kernel history data to
include a version number as well as the length of history arguments.
* All [2] existing users of kernhist(9) have had their format strings
updated. Each format specifier now includes an explicit length
modifier 'j' to refer to numeric values of the size of uintmax_t.
* All [2] existing users of kernhist(9) have had their format strings
updated to replace uses of "%p" with "%#jx", and the pointer
arguments are now cast to (uintptr_t) before being subsequently cast
to (uintmax_t). This is needed to avoid compiler warnings about
casting "pointer to integer of a different size."
* All [2] existing users of kernhist(9) have had instances of "%s" or
"%c" format strings replaced with numeric formats; several instances
of mis-match between format string and argument list have been fixed.
* vmstat(1) has been modified to handle the new size of arguments in the
history data as exported by sysctl(9).
* vmstat(1) now provides a warning message if the history requested with
the -u option does not exist (previously, this condition was silently
ignored, with only a single blank line being printed).
* vmstat(1) now checks the version and argument length included in the
data exported via sysctl(9) and exits if they do not match the values
with which vmstat was built.
* The kernhist(9) man-page has been updated to note the additional
requirements imposed on the format strings, along with several other
minor changes and enhancements.
[1] It would have been possible to use an explicit length (for example,
uint64_t) for the history arguments. But that would require another
"rototill" of all the users in the future when we add support for an
architecture that supports a larger size. Also, the printf(3) format
specifiers for explicitly-sized values, such as "%"PRIu64, are much
more verbose (and less aesthetically appealing, IMHO) than simply
using "%ju".
[2] I've tried very hard to find "all [the] existing users of kernhist(9)"
but it is possible that I've missed some of them. I would be glad to
update any stragglers that anyone identifies.
--
For some reason this single kernel seems to have outgrown its declared
size as a result of the kernhist(9) changes. Bump the size.
XXX The amount of increase may be excessive - anyone with more detailed
XXX knowledge please feel free to further adjust the value appropriately.
--
Missed one cast of pointer --> uintptr_t in previous kernhist(9) commit
--
And yet another one. :(
--
Use correct mark-up for NetBSD version.
--
More improvements in grammar and readability.
--
Remove a stray '"' (obvious typo) and add a couple of casts that are
probably needed.
--
And replace an instance of "%p" conversion with "%#jx"
--
Whitespace fix. Give Bl tag table a width. Fix Xr.
 1.199.6.1  24-Jul-2017  snj Pull up following revision(s) (requested by kamil in ticket #120):
sys/uvm/uvm_fault.c: revision 1.200
tests/lib/libc/sys/t_write.c: revision 1.4-1.6
PR/52384: make uvm_fault_check() return EFAULT, not EACCES, as our man
pages say for read/write (unlike the OpenGroup specification, which does
not document EFAULT for read/write and only documents EACCES for sockets).
--
check for EFAULT on reads and writes to memory without permission.
--
add munmap
#define for const.
--
add another missing munmap (Kamil)
 1.202.2.1  21-May-2018  pgoyette Sync with HEAD
 1.204.2.4  21-Apr-2020  martin Sync with HEAD
 1.204.2.3  13-Apr-2020  martin Mostly merge changes from HEAD upto 20200411
 1.204.2.2  08-Apr-2020  martin Merge changes from current as of 20200406
 1.204.2.1  10-Jun-2019  christos Sync with HEAD
 1.206.2.3  15-Aug-2023  martin Pull up following revision(s) (requested by chs in ticket #1714):

sys/uvm/uvm_fault.c: revision 1.234

uvm: prevent TLB invalidation races during COW resolution

When a thread takes a page fault which results in COW resolution,
other threads in the same process can be concurrently accessing that
same mapping on other CPUs. When the faulting thread updates the pmap
entry at the end of COW processing, the resulting TLB invalidations to
other CPUs are not done atomically, so another thread can write to the
new writable page and then a third thread might still read from the
old read-only page, resulting in inconsistent views of the page by the
latter two threads. Fix this by removing the pmap entry entirely for
the original page before we install the new pmap entry for the new
page, so that the new page can only be modified after the old page is
no longer accessible.

This fixes PR 56535 as well as the netbsd versions of problems
described in various bug trackers:
https://bugs.freebsd.org/bugzilla/show_bug.cgi?id=225584
https://reviews.freebsd.org/D14347
https://github.com/golang/go/issues/34988
 1.206.2.2  08-Mar-2020  martin Pull up following revision(s) (requested by chs in ticket #764):

sys/uvm/uvm_fault.c: revision 1.207

fix two bugs reported in
https://syzkaller.appspot.com/bug?id=8840dce484094a926e1ec388ffb83acb2fa291c9

- in uvm_fault_check(), if the map entry is wired, handle the fault the same way
that we would handle UVM_FAULT_WIRE. faulting on wired mappings is valid
if the mapped object was truncated and then later grown again.

- in uvm_fault_unwire_locked(), we must hold the locks for the vm_map_entry
while calling pmap_extract() in order to avoid races with the mapped object
being truncated while we are unwiring it.
 1.206.2.1  11-Nov-2019  martin Pull up following revision(s) (requested by chs in ticket #414):

sys/uvm/uvm_fault.c: revision 1.208

in uvm_fault_lower_io(), fetch all the map entry values that we need
before we unlock everything.
 1.214.2.2  29-Feb-2020  ad Sync with head.
 1.214.2.1  17-Jan-2020  ad Sync with head.
 1.224.2.1  20-Apr-2020  bouyer Sync with HEAD
 1.231.2.1  15-Aug-2023  martin Pull up following revision(s) (requested by chs in ticket #327):

sys/uvm/uvm_fault.c: revision 1.234

uvm: prevent TLB invalidation races during COW resolution

When a thread takes a page fault which results in COW resolution,
other threads in the same process can be concurrently accessing that
same mapping on other CPUs. When the faulting thread updates the pmap
entry at the end of COW processing, the resulting TLB invalidations to
other CPUs are not done atomically, so another thread can write to the
new writable page and then a third thread might still read from the
old read-only page, resulting in inconsistent views of the page by the
latter two threads. Fix this by removing the pmap entry entirely for
the original page before we install the new pmap entry for the new
page, so that the new page can only be modified after the old page is
no longer accessible.

This fixes PR 56535 as well as the netbsd versions of problems
described in various bug trackers:
https://bugs.freebsd.org/bugzilla/show_bug.cgi?id=225584
https://reviews.freebsd.org/D14347
https://github.com/golang/go/issues/34988
