History log of /src/sys/uvm/uvm_mmap.c
Revision  Date  Author  Comments
 1.186  24-Feb-2025  andvar s/architecure/architecture/ and few other typos in comments.
 1.185  21-Nov-2023  riastradh pax(9): Rework header file more coherently to nix some needless #ifs.

Cleans up some of the fallout from PR kern/57711 fixes.

Could do a little more to nix PAX_SEGVGUARD conditionals but maybe
not worth it.
 1.184  07-Jul-2022  rin Convert CTASSERT(9) for PAGE_{SIZE,MASK} into KASSERT(9).

They are not compile-time constants for sparc.
 1.183  06-Jul-2022  riastradh uvm(9): fo_mmap caller guarantees positive size.

No functional change intended, just sprinkling assertions to make it
clearer.
 1.182  06-Jul-2022  riastradh mmap(2): Assert size != 0 in non-anonymous case.

This is guaranteed by a test earlier; adding the assertion just makes
it clearer that it applies to the branch where we call fo_mmap -- no
functional change intended.
 1.181  06-Jul-2022  riastradh mmap(2): Avoid overflow in rounding and checking size.
 1.180  04-Jun-2022  riastradh mmap(2): If we fail with a hint, try again without it.

`Hint' here means nonzero addr, but no MAP_FIXED or MAP_TRYFIXED.

This is suboptimal -- we could teach uvm_mmap to do a fancier search
using the address as a hint. But this should do for now.

Candidate fix for PR kern/55533.
 1.179  19-Apr-2022  riastradh Revert "mmap(2): If we fail with a hint, try again without it."

This doesn't work, because uvm_mmap releases the uobj when it fails.
Should factor this more coherently, but let's just revert for now.

Reported-by: syzbot+d347c8951821b236117a@syzkaller.appspotmail.com
Reported-by: syzbot+7643d1b769fdfa18c3b2@syzkaller.appspotmail.com
Reported-by: syzbot+44f4b39671dd580cba5c@syzkaller.appspotmail.com
Reported-by: syzbot+b5a422299ca4ffe8570c@syzkaller.appspotmail.com
Reported-by: syzbot+22681822db67b6e90cfb@syzkaller.appspotmail.com
Reported-by: syzbot+e59f493ceef72b925a17@syzkaller.appspotmail.com
Reported-by: syzbot+666f3fe8364f47e8641b@syzkaller.appspotmail.com
Reported-by: syzbot+511d4572f52f1fd9b5cc@syzkaller.appspotmail.com
 1.178  19-Apr-2022  riastradh mmap(2): If we fail with a hint, try again without it.

`Hint' here means nonzero addr, but no MAP_FIXED or MAP_TRYFIXED.

This is suboptimal -- we could teach uvm_mmap to do a fancier search
using the address as a hint. But this should do for now.

Candidate fix for PR kern/55533.

ok chs@
 1.177  27-Mar-2022  hannken Make mmap() with "len == 0" an error if not MAP_ANON. We should return
an error for MAP_ANON too but unfortunately our /libexec/ld.elf_so
sometimes creates an empty anon mapping for the bss of a shared library.

At least FreeBSD and Solaris return this error too and according to POSIX
"If len is zero, mmap() shall fail and no mapping shall be established".

Fixes PR pkg/56338 Installing qt5-qtdeclarative leaves a dangling reference

The dangling reference here originates from vn_mmap() taking a vnode
reference for this empty mapping that will never be released.
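The length rule described in 1.177 can be sketched as a standalone check. This is a minimal illustration, not the kernel code: `sketch_check_mmap_len` and `SKETCH_MAP_ANON` are hypothetical names, and EINVAL is assumed from the POSIX wording since the log entry does not name the error code.

```c
#include <errno.h>
#include <stddef.h>

/* Hypothetical stand-in for MAP_ANON; the real flag lives in <sys/mman.h>. */
#define SKETCH_MAP_ANON 0x1000

/*
 * Return 0 if the length is acceptable, or an errno value otherwise.
 * EINVAL is an assumption taken from POSIX; the log does not name the code.
 */
static int
sketch_check_mmap_len(size_t len, int flags)
{
	if (len == 0 && (flags & SKETCH_MAP_ANON) == 0)
		return EINVAL;	/* non-anonymous, zero length: reject */
	return 0;		/* anonymous zero-length requests still pass */
}
```

The anonymous case is deliberately left permissive, mirroring the ld.elf_so exception mentioned in the entry.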
 1.176  21-Jul-2021  skrll need <sys/param.h> for COHERENCY_UNIT

Minor KNF along the way.
 1.175  23-Feb-2020  ad branches: 1.175.10;
UVM locking changes, proposed on tech-kern:

- Change the lock on uvm_object, vm_amap and vm_anon to be a RW lock.
- Break v_interlock and vmobjlock apart. v_interlock remains a mutex.
- Do partial PV list locking in the x86 pmap. Others to follow later.
 1.174  04-Oct-2019  kamil branches: 1.174.2;
Avoid left shift changing the signedness flag

Reviewed by <mrg>

Reported-by: syzbot+25ac03024cedf27f3368@syzkaller.appspotmail.com
 1.173  06-Aug-2019  maxv Change 'npgs' from int to size_t. Otherwise the 64bit->32bit conversion
could lead to npgs=0, which is not expected. It later triggers a panic
in uvm_vsunlock().

Found by TriforceAFL (Akul Pillai).
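The truncation behind 1.173 can be reproduced in isolation. The sketch below is an illustration only: the function names and the 4 KiB page shift are assumptions, `uint64_t` stands in for a 64-bit `size_t`, and `uint32_t` is used instead of `int` so that the narrowing is well defined in standard C.

```c
#include <stddef.h>
#include <stdint.h>

#define SKETCH_PAGE_SHIFT 12	/* assumed 4 KiB pages */

/* Models the old behaviour: the page count is squeezed into 32 bits. */
static uint32_t
sketch_npgs_narrow(uint64_t len)
{
	return (uint32_t)(len >> SKETCH_PAGE_SHIFT);
}

/* Models the fix: keep the full-width page count. */
static uint64_t
sketch_npgs_wide(uint64_t len)
{
	return len >> SKETCH_PAGE_SHIFT;
}
```

With a length of 2^44 bytes the page count is 2^32, which the narrow version reduces to 0 while the wide version preserves it.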
 1.172  06-Apr-2019  thorpej branches: 1.172.4;
Overhaul the API used to fetch and store individual memory cells in
userspace. The old fetch(9) and store(9) APIs (fubyte(), fuword(),
subyte(), suword(), etc.) are retired and replaced with new ufetch(9)
and ustore(9) APIs that can return proper error codes, etc. and are
implemented consistently across all platforms. The interrupt-safe
variants are no longer supported (and several of the existing attempts
at fuswintr(), etc. were buggy and not actually interrupt-safe).

Also augment the ucas(9) API, making it consistently available on
all platforms, supporting uniprocessor and multiprocessor systems, even
those that do not have CAS or LL/SC primitives.

Welcome to NetBSD 8.99.37.
 1.171  14-Mar-2019  christos unify rounding and range checking.
 1.170  14-Mar-2019  kre Avoid a panic from the sequence

mlock(buf, 0);
munlock(buf, 0);
mlock(buf, page);
munlock(buf, page);

where buf is page aligned, and page is actually anything > 0
(but not too big) which will get rounded up to the next multiple
of the page size.

In that sequence, it is possible that the 1st munlock() is optional.

Add a KASSERT() (or two) to detect the first effects of the problem.
Without that, or in !DIAGNOSTIC kernels, the problem eventually
causes some kind of failure or other (most often still a panic).

After this, mlock(anything, 0) (or munlock) validates "anything"
but is otherwise a no-op (regardless of the alignment of anything).

Also, don't treat mlock(buf, verybig) as equivalent to mlock(buf, 0)
which is (more or less) what we had been doing.

XXX pullup -8 (maybe -7 as well, need to check).
 1.169  19-Dec-2017  kamil branches: 1.169.4;
Drop SYS_sbrk

sbrk - change data segment size

This syscall is dummy since the inception of the project.

Sponsored by <The NetBSD Foundation>
 1.168  19-Dec-2017  kamil Drop the sstk(2) syscall stub

sstk - change stack section size

This functionality has never been implemented and is a remnant from 16-bit
UNIX. This stub appeared with the first NetBSD commit.

Sponsored by <The NetBSD Foundation>
 1.167  27-Oct-2017  utkarsh009 [syzkaller] Fix for PR #52658 as suggested by riastradh@

The bug was found by Dmitry Vyukov (dvyukov@google.com)
using syzkaller and was tested by me on a VM running
8.99.5
 1.166  20-May-2017  chs branches: 1.166.2;
MAP_FIXED means something different for mremap() than it does for mmap(),
so we cannot use UVM_FLAG_FIXED to specify both behaviors.
keep UVM_FLAG_FIXED with its earlier meaning (prior to my previous change)
of whether to use uvm_map_findspace() to locate space for the new mapping or
to use the hint address that the caller passed in, and add a new flag
UVM_FLAG_UNMAP to indicate that any existing entries in the range should be
unmapped as part of creating the new mapping. the new UVM_FLAG_UNMAP flag
may only be used if UVM_FLAG_FIXED is also specified.
 1.165  19-May-2017  chs make MAP_FIXED mapping operations atomic. fixes PR 52239.
previously, unmapping any entries being replaced was done separately
from entering the new mapping, which allowed another thread doing
a non-MAP_FIXED mapping to allocate the range out from under the
MAP_FIXED thread.
 1.164  06-May-2017  joerg Extend the mmap(2) interface to allow requesting protections for later
use with mprotect(2), but without enabling them immediately.

Extend the mremap(2) interface to allow duplicating mappings, i.e.
create a second range of virtual addresses that references the same physical
pages. Duplicated mappings can have different effective protections.

Adjust PAX mprotect logic to disallow effective protections of W&X, but
allow one mapping W and another X protections. This obsoletes using
temporary files for purposes like JIT.

Adjust PAX logic for mmap(2) and mprotect(2) to fail if W&X is requested
and not silently drop the X protection.

Improve test cases to ensure correct operation of the changed
interfaces.
 1.163  29-Apr-2017  christos MAP_COPY is handled in compat
 1.162  09-Aug-2016  kre branches: 1.162.6;

The only error that can occur from munlock() on NetBSD is ENOMEM.
Make it be that way.
 1.161  07-Aug-2016  maxv KNF a little.
 1.160  07-Aug-2016  maxv Explicitly return syscall-specific error codes, instead of the ones given
by range_test. This fixes msync, mlock and munlock, which all return EINVAL
instead of ENOMEM if the address is not in the va space.

It should also fix the recent ATF failures.
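The split described in 1.160, where the range check only reports success or failure and each system call chooses its own error code, might look like the following. All names and the address-space bounds are hypothetical; only the ENOMEM result for an mlock-style call comes from the entry itself.

```c
#include <errno.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical range check: true if addr..addr+size stays within [vmin, vmax]. */
static bool
sketch_range_ok(uintptr_t addr, uintptr_t size, uintptr_t vmin, uintptr_t vmax)
{
	if (addr < vmin || addr > vmax)
		return false;
	return size <= vmax - addr;	/* cannot wrap: addr <= vmax here */
}

/* mlock-style caller: picks its own error code for a bad range. */
static int
sketch_mlock_check(uintptr_t addr, uintptr_t size, uintptr_t vmin,
    uintptr_t vmax)
{
	if (!sketch_range_ok(addr, size, vmin, vmax))
		return ENOMEM;
	return 0;
}
```

Keeping the check boolean means the shared helper never leaks an error code that is wrong for a particular caller.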
 1.159  01-Jun-2016  pgoyette Variable rv is always used as a true/false boolean, so set its type
correctly.

From PR kern/46369
 1.158  24-May-2016  martin PR kern/50985: use the runtime limits of the vmspace in range_test()
instead of the compile time defaults for it.
 1.157  22-May-2016  christos reduce #ifdef mess caused by PaX
 1.156  07-Apr-2016  christos remove more ifdefs
 1.155  07-Apr-2016  christos Add PAX_MPROTECT_DEBUG
 1.154  26-Nov-2015  martin We never exec(2) with a kernel vmspace, so do not test for that, but instead
KASSERT() that we don't.
When calculating the load address for the interpreter (e.g. ld.elf_so),
we need to take into account whether the exec'd process will run with
topdown memory or bottom up. We can not use the current vmspace's flags
to test for that, as this happens too early. Luckily the execpack already
knows what the new state will be later, so instead of testing the current
vmspace, pass the info as additional argument to struct emul
e_vm_default_addr.
Fix all such functions and adopt all callers.
 1.153  04-Aug-2015  maxv Some changes, to reduce a bit my tech-kern@ patch:
- move the P_PAX_ flags out of #ifdef PAX_ASLR in pax.h
- add a generic pax_flags_active() function
- fix a comment in exec_elf.c; interp is not static
- KNF for return
- rename pax_aslr() to pax_aslr_mmap()
- rename pax_segvguard_cb() to pax_segvguard_cleanup_cb()
 1.152  01-Mar-2015  mlelstv Detect overflow when rounding length parameter and return ENOMEM.
Fixes PR kern/49692.
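One common way to detect the overflow mentioned in 1.152 is to compare against the largest value that can still be rounded up safely. The sketch below assumes a 4 KiB page size and uses hypothetical names; it is not the code from the commit.

```c
#include <errno.h>
#include <stddef.h>
#include <stdint.h>

#define SKETCH_PAGE_SIZE ((size_t)4096)	/* assumed page size */
#define SKETCH_PAGE_MASK (SKETCH_PAGE_SIZE - 1)

/*
 * Round len up to a page multiple, storing the result in *out.
 * Returns ENOMEM if the rounded value would not fit in size_t.
 */
static int
sketch_round_len(size_t len, size_t *out)
{
	if (len > SIZE_MAX - SKETCH_PAGE_MASK)
		return ENOMEM;	/* len + mask would wrap around */
	*out = (len + SKETCH_PAGE_MASK) & ~SKETCH_PAGE_MASK;
	return 0;
}
```

Checking before the addition avoids relying on the wrapped result, which would otherwise round a huge length down to a small one.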
 1.151  10-Jan-2015  chs in uvm_mmap_dev(), use the passed-in offset instead of 0.
from Onno van der Linden in PR 49536.
 1.150  14-Dec-2014  chs add a new "fo_mmap" fileops method to allow use of arbitrary uvm_objects for
mappings of file objects. move vnode-specific details of mmap()ing a vnode
from uvm_mmap() to the new vnode-specific vn_mmap(). add new uvm_mmap_dev()
and uvm_mmap_anon() convenience functions for mapping character devices
and anonymous memory, and replace all other calls to uvm_mmap() with those.
use the new fileop in drm2 so that libdrm can use mmap() to map things
like on other platforms (instead of the ioctl that we have used so far).
 1.149  05-Sep-2014  matt branches: 1.149.2;
Use f_vnode instead of f_data
 1.148  25-Jan-2014  christos branches: 1.148.4;
make this compile.
 1.147  25-Jan-2014  christos deal with COMPAT_10 issue.
 1.146  25-Jan-2014  christos provide proper defaults for topdown and bottomup allocation.
XXX: Ports that provide their own VM_DEFAULT_ADDRESS() need to provide the
two new flavors, otherwise they get the default ones now.
 1.145  11-Sep-2013  martin Allow MD code to add additional checks for mmap(..., MAP_FIXED) address
ranges. This can be used, for example, to avoid not implemented VA-holes,
but we probably need to check in a few more places.
 1.144  27-Jan-2012  para branches: 1.144.6; 1.144.10;
extending vmem(9) to be able to allocate resources for its own needs.
simplifying uvm_map handling (no special kernel entries anymore no relocking)
make malloc(9) a thin wrapper around kmem(9)
(with private interface for interrupt safety reasons)

releng@ acknowledged
 1.143  05-Jan-2012  reinoud Revert MAP_NOSYSCALLS patch.
 1.142  22-Dec-2011  reinoud Redo uvm_map_setattr() to never fail and remove the possible panic. The
possibility of failure was a C&P error.
 1.141  20-Dec-2011  reinoud If we need to set the PK_CHKNOSYSCALL flag in struct proc be so nice to first
take the mutex. Tnx for pointing it out to me.
 1.140  20-Dec-2011  reinoud Add a MAP_NOSYSCALLS flag to mmap. This flag prohibits executing of system
calls from the mapped region. This can be used for emulation purposes or for
extra security in the case of generated code.

It's implemented by adding mapping-attributes to each uvm_map_entry. These can
then be queried when needed.

Currently the MAP_NOSYSCALLS is only implemented for x86 but other
architectures are easy to adapt; see the sys/arch/x86/x86/syscall.c patch.
Port maintainers are encouraged to add them for their processor ports too.
When this feature is not yet implemented for an architecture the
MAP_NOSYSCALLS is simply ignored with virtually no cpu cost.
 1.139  14-Oct-2011  hannken branches: 1.139.2; 1.139.6;
Change the vnode locking protocol of VOP_GETATTR() to request at least
a shared lock. Make all calls outside of file systems respect it.

The calls from file systems need review.

No objections from tech-kern.
 1.138  12-Oct-2011  yamt fix an integer promotion bug on 64 bit ports.
(signed + unsigned = unsigned)
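The class of bug named in 1.138 can be shown with a small example. The functions below are hypothetical and only illustrate the conversion rule, assuming `int` is 32 bits wide: adding a signed and an unsigned 32-bit value yields an unsigned 32-bit result, so a negative operand becomes a large positive number before it is widened to 64 bits.

```c
#include <stdint.h>

/* Buggy shape: the sum is computed as unsigned 32-bit, then widened. */
static int64_t
sketch_sum_naive(int32_t delta, uint32_t base)
{
	return delta + base;	/* delta is converted to unsigned first */
}

/* Fixed shape: widen to a signed 64-bit type before adding. */
static int64_t
sketch_sum_widened(int32_t delta, uint32_t base)
{
	return (int64_t)delta + (int64_t)base;
}
```

On a 32-bit port the wrapped value is typically stored back into a 32-bit variable and the error goes unnoticed, which is why such bugs tend to surface only on 64-bit ports.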
 1.137  23-Jun-2011  matt Allow PAX_ASLR to be used by itself.
 1.136  12-Jun-2011  rmind Welcome to 5.99.53! Merge rmind-uvmplock branch:

- Reorganize locking in UVM and provide extra serialisation for pmap(9).
New lock order: [vmpage-owner-lock] -> pmap-lock.

- Simplify locking in some pmap(9) modules by removing P->V locking.

- Use lock object on vmobjlock (and thus vnode_t::v_interlock) to share
the locks amongst UVM objects where necessary (tmpfs, layerfs, unionfs).

- Rewrite and optimise x86 TLB shootdown code, make it simpler and cleaner.
Add TLBSTATS option for x86 to collect statistics about TLB shootdowns.

- Unify /dev/mem et al in MI code and provide required locking (removes
kernel-lock on some ports). Also, avoid cache-aliasing issues.

Thanks to Andrew Doran and Joerg Sonnenberger, as their initial patches
formed the core changes of this branch.
 1.135  23-Apr-2011  rmind branches: 1.135.2;
Replace "malloc" in comments, remove unnecessary header inclusions.
 1.134  02-Feb-2011  chuck update license clauses on my code to match the new-style BSD licenses.
verified with Mike Hibler it is ok to remove clause 3 on utah copyright,
as per UCB.
based on diff that rmind@ sent me.

no functional change with this commit.
 1.133  24-Jun-2010  hannken branches: 1.133.2; 1.133.4;
Clean up vnode lock operations pass 2:

VOP_UNLOCK(vp, flags) -> VOP_UNLOCK(vp): Remove the unneeded flags argument.

Welcome to 5.99.32.

Discussed on tech-kern.
 1.132  01-Nov-2009  uebayasi branches: 1.132.2; 1.132.4;
Consistently call amap / uobj layers as upper / lower, because UVM has only
those two layers by design. Approved by Chuck Cranor some time ago.
 1.131  18-Aug-2009  yamt uvm_mmap: remove a dead conditional.
 1.130  10-Jun-2009  yamt on MADV_WILLNEED, start prefetching backing object's pages.
 1.129  30-May-2009  yamt wrap long lines.
 1.128  29-Mar-2009  mrg - add new RLIMIT_AS (aka RLIMIT_VMEM) resource that limits the total
address space available to processes. this limit exists in most other
modern unix variants, and like most of them, our defaults are unlimited.
remove the old mmap / rlimit.datasize hack.

- adds the VMCMD_STACK flag to all the stack-creation vmcmd callers.
it is currently unused, but was added a few years ago.

- add a pair of new process size values to kinfo_proc2{}. one is the
total size of the process memory map, and the other is the total size
adjusted for unused stack space (since most processes have a lot of
this...)

- patch sh, and csh to notice RLIMIT_AS. (in some cases, the alias
RLIMIT_VMEM was already present and used if available.)

- patch ps, top and systat to notice the new k_vm_vsize member of
kinfo_proc2{}.

- update irix, svr4, svr4_32, linux and osf1 emulations to support
this information. (freebsd could be done, but that it's best left
as part of the full-update of compat/freebsd.)


this addresses PR 7897. it also gives correct memory usage values,
which have never been entirely correct (since mmap), and have been
very incorrect since jemalloc() was enabled.

tested on i386 and sparc64, build tested on several other platforms.

thanks to many folks for feedback and testing but most especially
chuq and yamt for critical suggestions that led to this patch not
having a special ugliness i wasn't happy with anyway :-)
 1.127  14-Mar-2009  dsl ANSIfy another 1261 function definitions.
The only ones left in sys are beyond my sed script!
(or in sys/dist or sys/external)
Mostly they have function pointer parameters.
 1.126  03-Jun-2008  ad branches: 1.126.6; 1.126.8; 1.126.12;
uvm_mmap: don't lock the map unless we need to.
 1.125  02-Jun-2008  ad One more.
 1.124  02-Jun-2008  ad Don't needlessly acquire v_interlock.
 1.123  02-Jun-2008  ad Don't needlessly acquire v_interlock.
 1.122  21-Mar-2008  ad branches: 1.122.2; 1.122.4; 1.122.6;
Catch up with descriptor handling changes. See kern_descrip.c revision
1.173 for details.
 1.121  02-Jan-2008  ad branches: 1.121.6;
Merge vmlocking2 to head.
 1.120  26-Dec-2007  christos Add PaX ASLR (Address Space Layout Randomization) [from elad and myself]

For regular (non PIE) executables randomization is enabled for:
1. The data segment
2. The stack

For PIE executables(*) randomization is enabled for:
1. The program itself
2. All shared libraries
3. The data segment
4. The stack

(*) To generate a PIE executable:
- compile everything with -fPIC
- link with -shared-libgcc -Wl,-pie

This feature is experimental, and might change. To use selectively add
options PAX_ASLR=0
in your kernel.

Currently we are using 12 bits for the stack, program, and data segment and
16 or 24 bits for mmap, depending on __LP64__.
 1.119  20-Dec-2007  dsl Convert all the system call entry points from:
int foo(struct lwp *l, void *v, register_t *retval)
to:
int foo(struct lwp *l, const struct foo_args *uap, register_t *retval)
Fixup compat code to not write into 'uap' and (in some cases) to actually
pass a correctly formatted 'uap' structure with the right name to the
next routine.
A few 'compat' routines that just call standard ones have been deleted.
All the 'compat' code compiles (along with the kernels required to test
build it).
98% done by automated scripts.
 1.118  26-Nov-2007  pooka branches: 1.118.2; 1.118.6;
Remove the "struct lwp *" argument from all VFS and VOP interfaces.
The general trend is to remove it from all kernel interfaces and
this is a start. In case the calling lwp is desired, curlwp should
be used.

quick consensus on tech-kern
 1.117  10-Oct-2007  ad branches: 1.117.4;
Merge from vmlocking:

- Split vnode::v_flag into three fields, depending on field locking.
- simple_lock -> kmutex in a few places.
- Fix some simple locking problems.
 1.116  08-Oct-2007  ad Merge file descriptor locking, cwdi locking and cross-call changes
from the vmlocking branch.
 1.115  23-Sep-2007  yamt branches: 1.115.2;
make RANGE_TEST a function.
 1.114  27-Jul-2007  pooka branches: 1.114.4; 1.114.6; 1.114.8;
Change unused fflags parameter in VOP_MMAP to prot and pass in
desired vm protection.
 1.113  22-Jul-2007  pooka Retire uvn_attach() - it abuses VXLOCK and its functionality,
setting vnode sizes, is handled elsewhere: file system vnode creation
or spec_open() for regular files or block special files, respectively.

Add a call to VOP_MMAP() to the pagedvn exec path, since the vnode
is being memory mapped.

reviewed by tech-kern & wrstuden
 1.112  15-May-2007  elad branches: 1.112.2;
Some Veriexec stuff that's been rotting in my tree for months.

Bug fixes:
- Fix crash reported by Scott Ellis on current-users@.

- Fix race conditions in enforcing the Veriexec rename and remove
policies. These are NOT security issues.

- Fix memory leak in rename handling when overwriting a monitored
file.

- Fix table deletion logic.

- Don't prevent query requests if not in learning mode.


KPI updates:
- fileassoc_table_run() now takes a cookie to pass to the callback.

- veriexec_table_add() was removed, it is now done internally. As a
result, there's no longer a need for VERIEXEC_TABLESIZE.

- veriexec_report() was removed, it is now internal.

- Perform sanity checks on the entry type, and enforce default type
in veriexec_file_add() rather than in veriexecctl.

- Add veriexec_flush(), used to delete all Veriexec tables, and
veriexec_dump(), used to fill an array with all Veriexec entries.


New features:
- Add a '-k' flag to veriexecctl, to keep the filenames in the kernel
database. This allows Veriexec to produce slightly more accurate
logs under certain circumstances. In the future, this can be either
replaced by vnode->pathname translation, or combined with it.

- Add a VERIEXEC_DUMP ioctl, to dump the entire Veriexec database.
This can be used to recover a database if the file was lost.
Example usage:

# veriexecctl dump > /etc/signatures

Note that only entries with the filename kept (that is, were loaded
with the '-k' flag) will be dumped.

Idea from Brett Lymn.

- Add a VERIEXEC_FLUSH ioctl, to delete all Veriexec entries. Sample
usage:

# veriexecctl flush

- Add a 'veriexec_flags' rc(8) variable, and make its default have
the '-k' flag. On systems using the default signatures file
(generated from running 'veriexecgen' with no arguments), this will
use additional 32kb of kernel memory on average.

- Add a '-e' flag to veriexecctl, to evaluate the fingerprint during
load. This is done automatically for files marked as 'untrusted'.


Misc. stuff:
- The code for veriexecctl was massively simplified as a result of
eliminating the need for VERIEXEC_TABLESIZE, and now uses a single
pass of the signatures file, making the loading somewhat faster.

- Lots of minor fixes found using the (still under development)
Veriexec regression testsuite.

- Some of the messages Veriexec prints were improved.

- Various documentation fixes.


All relevant man-pages were updated to reflect the above changes.

Binary compatibility with existing veriexecctl binaries is maintained.
 1.111  11-May-2007  christos Make us standards compliant again. Return EINVAL in all cases (except for
mmap) so we cannot tell what went wrong.
 1.110  11-May-2007  christos Improve on previous and write a RANGE_TEST macro and do it on all the
system calls instead of doing a half-assed job on some of them and none
on others.
 1.109  11-May-2007  christos fix bogus wrap tests; ssize_t != int...
 1.108  04-Mar-2007  christos branches: 1.108.2; 1.108.4;
Kill caddr_t; there will be some MI fallout, but it will be fixed shortly.
 1.107  22-Feb-2007  thorpej TRUE -> true, FALSE -> false
 1.106  21-Feb-2007  thorpej Replace the Mach-derived boolean_t type with the C99 bool type. A
future commit will replace use of TRUE and FALSE with true and false.
 1.105  09-Feb-2007  ad branches: 1.105.2;
Merge newlock2 to head.
 1.104  03-Feb-2007  elad If Veriexec prevents indirect execution of the binary, in addition to just
blocking the mmap() if exec bit is requested, also strip exec bit from
maxprot for further mprotect() calls.

Okay joerg@.
 1.103  11-Jan-2007  elad Cosmetic nit in the 'filename' passed to veriexec_verify().
 1.102  01-Nov-2006  yamt branches: 1.102.2;
remove some __unused from function parameters.
 1.101  12-Oct-2006  christos - sprinkle __unused on function decls.
- fix a couple of unused bugs
- no more -Wno-unused for i386
 1.100  05-Oct-2006  chs add support for O_DIRECT (I/O directly to application memory,
bypassing any kernel caching for file data).
 1.99  30-Sep-2006  elad If Veriexec enforces access type, don't allow mmap() to use PROT_EXEC on
files that don't have the "indirect" flag. Also change the "library" alias
in veriexecctl(8) to mean "file, indirect".

okay blymn@
 1.98  21-Jul-2006  ad branches: 1.98.4; 1.98.6;
- Use the LWP cached credentials where sane.
- Minor cosmetic changes.
 1.97  20-May-2006  elad Better implementation of PaX MPROTECT, after looking some more into the
code and not trying to use temporary solutions.

Lots of comments and help from YAMAMOTO Takashi, also thanks to the PaX
author for being quick to recognize that something fishy's going on. :)

Hook up in mmap/vmcmd rather than (ugh!) uvm_map_protect().

Next time I suggest to commit a temporary solution just revoke my
commit bit.
 1.96  14-May-2006  elad branches: 1.96.2;
integrate kauth.
 1.95  05-Apr-2006  christos Coverity CID 2721: Avoid bitching for impossible cases, by adding KASSERT.
 1.94  11-Dec-2005  christos branches: 1.94.4; 1.94.6; 1.94.8; 1.94.10; 1.94.12;
merge ktrace-lwp.
 1.93  10-Oct-2005  chs stop converting async msync() to sync.
this hasn't been needed for years (if it ever was).
 1.92  23-Jul-2005  yamt update file timestamps for nfsd loaned-read and mmap.
PR/25279. discussed on tech-kern@.
 1.91  11-May-2005  yamt branches: 1.91.2;
allocate anons on-demand, rather than reserving static amount of
them on boot/swapon.
 1.90  01-Apr-2005  yamt merge yamt-km branch.
- don't use managed mappings/backing objects for wired memory allocations.
save some resources like pv_entry. also fix (most of) PR/27030.
- simplify kernel memory management API.
- simplify pmap bootstrap of some ports.
- some related cleanups.
 1.89  26-Mar-2005  fvdl Fix some things regarding COMPAT_NETBSD32 and limits/VM addresses.

* For sparc64 and amd64, define *SIZ32 VM constants.
* Add a new function pointer to struct emul, pointing at a function
that will return the default VM map address. The default function
is uvm_map_defaultaddr, which just uses the VM_DEFAULT_ADDRESS
macro. This gives emulations control over the default map address,
and allows things to be mapped at the right address (in 32bit range)
for COMPAT_NETBSD32.
* Add code to adjust the data and stack limits when a COMPAT_NETBSD32
or COMPAT_SVR4_32 binary is executed.
* Don't use USRSTACK in kern_resource.c, use p_vmspace->vm_minsaddr
instead (emulations might have set it differently)
* Since this changes struct emul, bump kernel version to 3.99.2

Tested on amd64, compile-tested on sparc64.
 1.88  11-Feb-2005  chs branches: 1.88.4;
use vm_map_{min,max}() instead of dereferencing the vm_map pointer directly.
define and use vm_map_set{min,max}() for modifying these values.
remove the {min,max}_offset aliases for these vm_map fields to be more
namespace-friendly. PR 26475.
 1.87  23-Jan-2005  chs branches: 1.87.2;
pmap_wired_count() is now available on all platforms,
remove the code for the case where it's not defined.
 1.86  01-Jan-2005  yamt branches: 1.86.2;
for in-kernel maps,
- allocate kva for vm_map_entry from the map itself and
remove the static limit, MAX_KMAPENT.
- keep merged entries for later splitting to fix allocate-to-free problem.
PR/24039.
 1.85  02-Dec-2004  briggs mlock(2) and munlock(2) are defined by our man pages (which agree with
those on opengroup.org) to return ENOMEM if trying to lock a region that
is not accessible. So if uvm_map_pageable() returns EFAULT, make it ENOMEM.
 1.84  25-May-2004  hannken Add ffs internal snapshots. Written by Marshall Kirk McKusick for FreeBSD.

- Not enabled by default. Needs kernel option FFS_SNAPSHOT.
- Change parameters of ffs_blkfree.
- Let the copy-on-write functions return an error so spec_strategy
may fail if the copy-on-write fails.
- Change genfs_*lock*() to use vp->v_vnlock instead of &vp->v_lock.
- Add flag B_METAONLY to VOP_BALLOC to return indirect block buffer.
- Add a function ffs_checkfreefile needed for snapshot creation.
- Add special handling of snapshot files:
Snapshots may not be opened for writing and the attributes are read-only.
Use the mtime as the time this snapshot was taken.
Deny mtime updates for snapshot files.
- Add function transferlockers to transfer any waiting processes from
one lock to another.
- Add vfsop VFS_SNAPSHOT to take a snapshot and make it accessible through
a vnode.
- Add snapshot support to ls, fsck_ffs and dump.

Welcome to 2.0F.

Approved by: Jason R. Thorpe <thorpej@netbsd.org>
 1.83  19-May-2004  darrenr rather than just try to get a mapping from a device as only PROT_EXEC, work
down the list of protections until either we run out or we find one that the
device is willing to work with.
 1.82  24-Mar-2004  junyoung Drop trailing spaces.
 1.81  14-Feb-2004  dsl Fix prev. so it compiles
 1.80  14-Feb-2004  jdolecek add compat hook in check for zerodev; use this hook to recognize
the old ARM /dev/zero minor mapping #ifdef COMPAT_16
fixes second part of PR kern/23581 by Richard Earnshaw
 1.79  29-Nov-2003  yamt mincore: don't treat an aobj as a device mapping.
 1.78  07-Oct-2003  thorpej Add a MAP_WIRED flag to mmap(2), which causes the new mapping to be
wired as if by mlock(2).
 1.77  24-Aug-2003  chs fix some indentation.
 1.76  24-Aug-2003  chs mprotect()'s "len" is really a size_t, and we can't do any useful
bounds-checking on it.
 1.75  06-Jul-2003  christos PR/22062: Dheeraj S: Don't compare an integral type with NULL.
 1.74  29-Jun-2003  fvdl branches: 1.74.2;
Back out the lwp/ktrace changes. They contained a lot of collateral damage,
and need to be examined and discussed more.
 1.73  28-Jun-2003  darrenr Pass lwp pointers throughout the kernel, as required, so that the lwpid can
be inserted into ktrace records. The general change has been to replace
"struct proc *" with "struct lwp *" in various function prototypes, pass
the lwp through and use l_proc to get the process pointer when needed.

Bump the kernel rev up to 1.6V
 1.72  23-Jun-2003  christos PR/21948: Todd Vierling: Implement MAP_TRYFIXED for linux emulation.
 1.71  04-May-2003  gmcgarry Don't use overloaded term "comm". From Greg A. Woods in PR#17394.
 1.70  06-Mar-2003  matt Add support for mmap(2) to be able to return memory aligned on a 2^n
boundary.
 1.69  23-Feb-2003  pk Make updating a file's reference and use count MP-safe.
 1.68  20-Feb-2003  atatat Introduce "top down" memory management for mmap()ed allocations. This
means that the dynamic linker gets mapped in at the top of available
user virtual memory (typically just below the stack), shared libraries
get mapped downwards from that point, and calls to mmap() that don't
specify a preferred address will get mapped in below those.

This means that the heap and the mmap()ed allocations will grow
towards each other, allowing one or the other to grow larger than
before. Previously, the heap was limited to MAXDSIZ by the placement
of the dynamic linker (and the process's rlimits) and the space
available to mmap was hobbled by this reservation.

This is currently only enabled via an *option* for the i386 platform
(though other platforms are expected to follow). Add "options
USE_TOPDOWN_VM" to your kernel config file, rerun config, and rebuild
your kernel to take advantage of this.

Note that the pmap_prefer() interface has not yet been modified to
play nicely with this, so those platforms require a bit more work
(most notably the sparc) before they can use this new memory
arrangement.

This change also introduces a VM_DEFAULT_ADDRESS() macro that picks
the appropriate default address based on the size of the allocation or
the size of the process's text segment accordingly. Several drivers
and the SYSV SHM address assignment were changed to use this instead
of each one picking their own "default".
 1.67  18-Jan-2003  thorpej Merge the nathanw_sa branch.
 1.66  27-Sep-2002  mycroft #if 0 the call to uvm_map_checkprot() in sys_munmap() -- it's not documented,
and programs do not expect it. Also fixes memory leaks in dlopen()/dlclose().
 1.65  06-Sep-2002  gehenna Merge the gehenna-devsw branch into the trunk.

This merge changes the device switch tables from static array to
dynamically generated by config(8).

- All device switches are defined as a constant structure in device drivers.

- The new grammar ``device-major'' is introduced to ``files''.

device-major <prefix> char <num> [block <num>] [<rules>]

- All device major numbers must be listed up in port dependent majors.<arch>
by using this grammar.

- Added the new naming convention.
The name of the device switch must be <prefix>_[bc]devsw for auto-generation
of device switch tables.

- The backward compatibility of loading block/character device
switch by LKM framework is broken. This is necessary to convert
from block/character device major to device name in runtime and vice versa.

- The restriction to assign device major by LKM is completely removed.
We don't need to reserve LKM entries for dynamic loading of device switch.

- At compile time, the list of device major numbers is packed into the kernel,
and the LKM framework refers to it to assign device major numbers dynamically.
 1.64  31-May-2002  atatat "offest" -> "offset" in a comment
 1.63  22-Mar-2002  darrenr branches: 1.63.2; 1.63.4;
Return EFBIG from mmap() if we try to map too much data and in the fixed
address allocation, return EOVERFLOW to match with the non-fixed error.
 1.62  14-Dec-2001  chs in sys_mincore(), check the return value of uvm_vslock() to determine
if the vec pointer is valid rather than using uvm_useracc().
uvm_useracc() just tells you if the permissions of a user mapping allow
the desired access, not whether faulting on that mapping will succeed.
 1.61  25-Nov-2001  chs disallow mapping negative offsets for both regular files and block devices.
 1.60  10-Nov-2001  lukem add RCSIDs, and in some cases, slightly cleanup #include order
 1.59  30-Oct-2001  thorpej uvm_map_protect(): Don't allow VM_PROT_EXECUTE to be set on entries
(either the current protection or the max protection) that reference
vnodes associated with a file system mounted with the NOEXEC option.

uvm_mmap(): Don't allow PROT_EXEC mappings to be established of vnodes
which are associated with a file system mounted with the NOEXEC option.
 1.58  30-Oct-2001  thorpej - Add a new vnode flag VEXECMAP, which indicates that a vnode has
executable mappings. Stop overloading VTEXT for this purpose (VTEXT
also has another meaning).
- Rename vn_marktext() to vn_markexec(), and use it when executable
mappings of a vnode are established.
- In places where we want to set VTEXT, set it in v_flag directly, rather
than making a function call to do this (it no longer makes sense to
use a function call, since we no longer overload VTEXT with VEXECMAP's
meaning).

VEXECMAP suggested by Chuq Silvers.
 1.57  29-Oct-2001  thorpej uvm_mmap(): If a vnode mapping is established with PROT_EXEC, mark the
vnode as VTEXT.

uvm_map_protect(): When VM_PROT_EXECUTE is added to a VA range, mark
all the vnodes mapped by the range as VTEXT.
 1.56  15-Sep-2001  chs branches: 1.56.2;
a whole bunch of changes to improve performance and robustness under load:

- remove special treatment of pager_map mappings in pmaps. this is
required now, since I've removed the globals that expose the address range.
pager_map now uses pmap_kenter_pa() instead of pmap_enter(), so there's
no longer any need to special-case it.
- eliminate struct uvm_vnode by moving its fields into struct vnode.
- rewrite the pageout path. the pager is now responsible for handling the
high-level requests instead of only getting control after a bunch of work
has already been done on its behalf. this will allow us to UBCify LFS,
which needs tighter control over its pages than other filesystems do.
writing a page to disk no longer requires making it read-only, which
allows us to write wired pages without causing all kinds of havoc.
- use a new PG_PAGEOUT flag to indicate that a page should be freed
on behalf of the pagedaemon when it's unlocked. this flag is very similar
to PG_RELEASED, but unlike PG_RELEASED, PG_PAGEOUT can be cleared if the
pageout fails due to eg. an indirect-block buffer being locked.
this allows us to remove the "version" field from struct vm_page,
and together with shrinking "loan_count" from 32 bits to 16,
struct vm_page is now 4 bytes smaller.
- no longer use PG_RELEASED for swap-backed pages. if the page is busy
because it's being paged out, we can't release the swap slot to be
reallocated until that write is complete, but unlike with vnodes we
don't keep a count of in-progress writes so there's no good way to
know when the write is done. instead, when we need to free a busy
swap-backed page, just sleep until we can get it busy ourselves.
- implement a fast-path for extending writes which allows us to avoid
zeroing new pages. this substantially reduces cpu usage.
- encapsulate the data used by the genfs code in a struct genfs_node,
which must be the first element of the filesystem-specific vnode data
for filesystems which use genfs_{get,put}pages().
- eliminate many of the UVM pagerops, since they aren't needed anymore
now that the pager "put" operation is a higher-level operation.
- enhance the genfs code to allow NFS to use the genfs_{get,put}pages
instead of a modified copy.
- clean up struct vnode by removing all the fields that used to be used by
the vfs_cluster.c code (which we don't use anymore with UBC).
- remove kmem_object and mb_object since they were useless.
instead of allocating pages to these objects, we now just allocate
pages with no object. such pages are mapped in the kernel until they
are freed, so we can use the mapping to find the page to free it.
this allows us to remove splvm() protection in several places.

The sum of all these changes improves write throughput on my
decstation 5000/200 to within 1% of the rate of NetBSD 1.5
and reduces the elapsed time for "make release" of a NetBSD 1.5
source tree on my 128MB pc to 10% less than a 1.5 kernel took.
 1.55  17-Aug-2001  chs branches: 1.55.2;
call VOP_MMAP() before allowing mappings of vnodes to allow
filesystems which do not support memory mapped access to cause
mmap() of their vnodes to fail.
 1.54  14-Jun-2001  thorpej branches: 1.54.2;
Fix a partial construction problem that can cause race conditions
between creation of a file descriptor and close(2) when using kernel
assisted threads. What we do is stick descriptors in the table, but
mark them as "larval". This causes essentially everything to treat
it as a non-existent descriptor, except for fdalloc(), which sees a
filled slot so that it won't (incorrectly) allocate it again. When
a descriptor is fully constructed, the code that has constructed it
marks it as "mature" (which actually clears the "larval" flag), and
things continue to work as normal.

While here, gather all the code that gets a descriptor from the table
into a fd_getfile() function, and call it, rather than having the
same (sometimes incorrect) code copied all over the place.
 1.53  02-Jun-2001  chs replace vm_map{,_entry}_t with struct vm_map{,_entry} *.
 1.52  26-May-2001  chs replace vm_page_t with struct vm_page *.
 1.51  25-May-2001  chs remove trailing whitespace.
 1.50  15-Mar-2001  chs eliminate the KERN_* error codes in favor of the traditional E* codes.
the mapping is:

KERN_SUCCESS 0
KERN_INVALID_ADDRESS EFAULT
KERN_PROTECTION_FAILURE EACCES
KERN_NO_SPACE ENOMEM
KERN_INVALID_ARGUMENT EINVAL
KERN_FAILURE various, mostly turn into KASSERTs
KERN_RESOURCE_SHORTAGE ENOMEM
KERN_NOT_RECEIVER <unused>
KERN_NO_ACCESS <unused>
KERN_PAGES_LOCKED <unused>
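
The mapping table above can be sketched as a small translation function. This is only an illustration: the enum is a hypothetical stand-in for the retired codes (the numeric values are not the historical ones), kern_to_errno() is an invented name, and KERN_FAILURE is left out because the log says it mostly became KASSERTs rather than a single errno.

```c
#include <errno.h>

/* Hypothetical stand-ins for the retired codes; values are illustrative. */
enum kern_code {
	KERN_SUCCESS,
	KERN_INVALID_ADDRESS,
	KERN_PROTECTION_FAILURE,
	KERN_NO_SPACE,
	KERN_INVALID_ARGUMENT,
	KERN_RESOURCE_SHORTAGE
};

/* Translate an old-style code into the E* value listed in the table. */
static int
kern_to_errno(enum kern_code kc)
{
	switch (kc) {
	case KERN_SUCCESS:
		return 0;
	case KERN_INVALID_ADDRESS:
		return EFAULT;
	case KERN_PROTECTION_FAILURE:
		return EACCES;
	case KERN_NO_SPACE:
	case KERN_RESOURCE_SHORTAGE:
		return ENOMEM;
	case KERN_INVALID_ARGUMENT:
		return EINVAL;
	}
	/* Unknown input: treat as an invalid argument (an assumption). */
	return EINVAL;
}
```

Note that two distinct old codes collapse onto ENOMEM, so the translation cannot be reversed.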
 1.49  18-Feb-2001  chs branches: 1.49.2;
clean up DIAGNOSTIC checks, use KASSERT().
 1.48  08-Jan-2001  thorpej Nevermind that it's silly to include PROT_EXEC even if a vnode
doesn't have the exec bit set, we need to have PROT_EXEC set
in order for some expected mmap/mprotect behavior to work, so
do the last bit slightly differently: if udv_attach() fails, and
the protection (NOT maxprot) doesn't include PROT_EXEC, then clear
PROT_EXEC from maxprot and try udv_attach() again.

Sigh, mmap really needs to be rototilled.
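
The retry described in 1.48 can be sketched roughly as follows. This is not the kernel code: attach_fn, refuse_exec() and attach_with_exec_fallback() are hypothetical names, X_PROT_EXEC is a made-up flag value, and the real udv_attach() takes different arguments.

```c
#include <stddef.h>

#define X_PROT_EXEC 0x4	/* hypothetical flag value, for illustration only */

/* Hypothetical stand-in for an attach routine keyed on maxprot. */
typedef void *(*attach_fn)(int maxprot);

/* Demo device that refuses any mapping whose maxprot allows execution. */
static void *
refuse_exec(int maxprot)
{
	static int token;

	return (maxprot & X_PROT_EXEC) ? NULL : &token;
}

/*
 * Try to attach with the full maxprot first.  If that fails and the
 * caller did not ask for PROT_EXEC in prot, drop PROT_EXEC from maxprot
 * and try once more.
 */
static void *
attach_with_exec_fallback(attach_fn attach, int prot, int *maxprot)
{
	void *obj = attach(*maxprot);

	if (obj == NULL && (prot & X_PROT_EXEC) == 0) {
		*maxprot &= ~X_PROT_EXEC;
		obj = attach(*maxprot);
	}
	return obj;
}
```

When the caller did request execute permission, the first failure is final and maxprot is left untouched.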
 1.47  07-Jan-2001  thorpej Only include PROT_EXEC in maxprot if the user specified PROT_EXEC
in the mmap() call. maxprot is used to create device mappings,
and always including PROT_EXEC causes the mapping to fail on the Alpha
when mapping a non-RAM offset of /dev/mem (which may be sparse, so
instruction fetch from there is disallowed).
 1.46  27-Nov-2000  chs Initial integration of the Unified Buffer Cache project.
 1.45  24-Nov-2000  soren Typo in comment.
 1.44  13-Sep-2000  thorpej Add an align argument to uvm_map() and some callers of that
routine. Works similarly to pmap_prefer(), but allows callers
to specify a minimum power-of-two alignment of the region.
How we ever got along without this for so long is beyond me.
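
For readers unfamiliar with power-of-two alignment, the arithmetic usually looks like the sketch below. align_up() is a hypothetical helper, not part of uvm_map(), and it ignores wraparound at the very top of the address space.

```c
#include <assert.h>
#include <stdint.h>

/*
 * Round addr up to the next multiple of align.  The mask trick only
 * works when align is a nonzero power of two, hence the assertion.
 */
static uintptr_t
align_up(uintptr_t addr, uintptr_t align)
{
	assert(align != 0 && (align & (align - 1)) == 0);
	return (addr + align - 1) & ~(align - 1);
}
```

An address that is already aligned is returned unchanged.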
 1.43  27-Jun-2000  mrg remove include of <vm/vm.h>
 1.42  26-Jun-2000  mrg remove/move more mach vm header files:

<vm/pglist.h> -> <uvm/uvm_pglist.h>
<vm/vm_inherit.h> -> <uvm/uvm_inherit.h>
<vm/vm_kern.h> -> into <uvm/uvm_extern.h>
<vm/vm_object.h> -> nothing
<vm/vm_pager.h> -> into <uvm/uvm_pager.h>

also includes a bunch of <vm/vm_page.h> include removals (due to redundancy
with <vm/vm.h>), and a scattering of other similar headers.
 1.41  23-May-2000  enami branches: 1.41.4;
- Move the comment, which describes that calling the function
uvm_map_pageable(map, ...) implies unlocking passed map, just before the
function call.
- If we bail out before calling uvm_map_pageable(), unlock the map
ourselves to prevent a panic ``locking against myself''. The panic is,
for example, caused when cdrecord is invoked with too large fifo size.
 1.40  30-Mar-2000  augustss Remove more register declarations.
 1.39  28-Mar-2000  kleink In mmap(), bail out with EOVERFLOW when mapping a regular file and the file
offset plus mapping length cannot be represented in an off_t.
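
A check of this kind can be written without ever forming the overflowing sum. The sketch below is an assumption about the shape of such a test, not the code from this revision: check_file_range() is a hypothetical name and int64_t stands in for off_t.

```c
#include <errno.h>
#include <stdint.h>

/*
 * Return 0 if the range [pos, pos + len] fits in a signed 64-bit
 * offset, EOVERFLOW if it does not, EINVAL for a negative start.
 */
static int
check_file_range(int64_t pos, uint64_t len)
{
	if (pos < 0)
		return EINVAL;
	/* Compare len against the headroom left above pos. */
	if (len > (uint64_t)(INT64_MAX - pos))
		return EOVERFLOW;
	return 0;
}
```

Comparing against the remaining headroom avoids signed overflow, which is undefined behaviour in C.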
 1.38  26-Mar-2000  kleink Merge parts of chs-ubc2 into the trunk:
Add a new type voff_t (defined as a synonym for off_t) to describe offsets
into uvm objects, and update the appropriate interfaces to use it, the
most visible effect being the ability to mmap() file offsets beyond
the range of a vaddr_t.

Originally by Chuck Silvers; blame me for problems caused by merging this
into non-UBC.
 1.37  11-Dec-1999  thorpej Remove a piece of code introduced in rev 1.36 that I didn't intend to
commit.
 1.36  13-Nov-1999  thorpej Change the pmap_enter() API slightly; pmap_enter() now returns an error
value (KERN_SUCCESS or KERN_RESOURCE_SHORTAGE) indicating if it succeeded
or failed. Change the `wired' and `access_type' arguments to a single
`flags' argument, which includes the access type, and flags:

PMAP_WIRED the old `wired' boolean
PMAP_CANFAIL pmap_enter() is allowed to fail

If PMAP_CANFAIL is not specified, the pmap should behave as it always
has in the face of a drastic resource shortage: fall over dead.

Change the fault handler to deal with failure (which indicates resource
shortage) by unlocking everything, waiting for the pagedaemon to free
more memory, then retrying the fault.
 1.35  17-Jul-1999  thorpej branches: 1.35.2; 1.35.4; 1.35.8;
Add a set of "lockflags", which can control the locking behavior
of some functions. Use these flags in uvm_map_pageable() to determine
if the map is locked on entry (replaces an already present boolean_t
argument `islocked'), and if the function should return with the map
still locked.
 1.34  14-Jul-1999  thorpej Fix an operator precedence error which caused msync(2) to fail to pass
the PGO_CLEANIT flag to the object pagers. Fixes PR #7978, from
Matthias Pfaller.
 1.33  12-Jul-1999  kleink XSH5: change function signature to `void *sbrk(intptr_t)'.
 1.32  10-Jul-1999  thorpej Make a comment reflect reality.
 1.31  10-Jul-1999  thorpej Slightly better test for "object with no real pages". Test for NULL
pgo_releasepg rather than if the pager is the device pager.
 1.30  08-Jul-1999  thorpej Correct a comment.
 1.29  07-Jul-1999  thorpej Add some more meat to madvise(2):
* Implement MADV_DONTNEED: deactivate pages in the specified range,
semantics similar to Solaris's MADV_DONTNEED.
* Add MADV_FREE: free pages and swap resources associated with the
specified range, causing the range to be reloaded from backing
store (vnodes) or zero-fill (anonymous), semantics like FreeBSD's
MADV_FREE and like Digital UNIX's MADV_DONTNEED (isn't it SO GREAT
that madvise(2) isn't standardized!?)

As part of this, move the non-map-modifying advice handling out of
uvm_map_advise(), and into sys_madvise().

As another part, implement general amap cleaning in uvm_map_clean(), and
change uvm_map_clean() to only push dirty pages to disk if PGO_CLEANIT
is set in its flags (and update sys___msync13() accordingly). XXX Add
a patchable global "amap_clean_works", defaulting to 1, which can disable
the amap cleaning code, just in case problems are unearthed; this gives
a developer/user a quick way to recover and send a bug report (e.g. boot
into DDB and change the value).

XXX Still need to implement a real uao_flush().

XXX Need to update the manual page.

With these changes, rebuilding libc will automatically cause the new
malloc(3) to use MADV_FREE to actually release pages and swap resources
when it decides that can be done.
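
From userland, advice of this kind is given with madvise(2). The sketch below is a minimal illustration under stated assumptions: scratch_release_demo() is a hypothetical name, and it uses MADV_DONTNEED because that flag is widely available; where MADV_FREE exists it is passed the same way, with the differing semantics the entry above describes.

```c
#define _DEFAULT_SOURCE	/* for madvise() and MAP_ANON on glibc */
#include <stddef.h>
#include <string.h>
#include <sys/mman.h>

/*
 * Map an anonymous scratch region, dirty it, then tell the kernel the
 * contents are no longer needed.  Returns 0 on success, -1 on failure.
 */
static int
scratch_release_demo(size_t len)
{
	void *p = mmap(NULL, len, PROT_READ | PROT_WRITE,
	    MAP_PRIVATE | MAP_ANON, -1, 0);
	int rc;

	if (p == MAP_FAILED)
		return -1;
	memset(p, 0xa5, len);			/* touch every page */
	rc = madvise(p, len, MADV_DONTNEED);	/* advisory only */
	munmap(p, len);
	return rc;
}
```

What happens to the page contents after the call differs between systems, so portable code should not read the region and expect either the old data or zeroes.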
 1.28  06-Jul-1999  cgd from the comment added to the code:
> XXX (in)sanity check. We don't do proper datasize checking
> XXX for anonymous (or private writable) mmap(). However,
> XXX know that if we're trying to allocate more than the amount
> XXX remaining under our current data size limit, _that_ should
> XXX be disallowed.
This is one link on the chain of lossage known as PR#7897. It's
definitely not the right fix, but it's better than nothing.
 1.27  01-Jul-1999  thorpej Fix tyop. From Bill Studenmund.
 1.26  19-Jun-1999  thorpej Fix a typo.
 1.25  18-Jun-1999  thorpej Add the guts of mlockall(MCL_FUTURE). This requires passing a process's
"memlock" resource limit to uvm_mmap(). Update all calls accordingly.
 1.24  17-Jun-1999  thorpej In sys_mmap():
- rather than treating MAP_COPY like MAP_PRIVATE by sheer virtue of it not
being MAP_SHARED, actually convert the MAP_COPY flag into MAP_PRIVATE.
- return EINVAL if MAP_SHARED and MAP_PRIVATE are both included in flags.
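
The two flag rules in 1.24 might look like this in isolation. The X_MAP_* values and normalize_share_flags() are hypothetical, and doing the MAP_COPY conversion before the conflict check is an assumption based on the order in which the entry lists the changes.

```c
#include <errno.h>

/* Hypothetical flag values, for illustration only. */
#define X_MAP_SHARED	0x1
#define X_MAP_PRIVATE	0x2
#define X_MAP_COPY	0x4

/*
 * Rewrite MAP_COPY as MAP_PRIVATE, then reject a request that asks for
 * both sharing types at once.  Returns 0 or EINVAL.
 */
static int
normalize_share_flags(int *flags)
{
	const int both = X_MAP_SHARED | X_MAP_PRIVATE;

	if (*flags & X_MAP_COPY)
		*flags = (*flags & ~X_MAP_COPY) | X_MAP_PRIVATE;
	if ((*flags & both) == both)
		return EINVAL;
	return 0;
}
```

With this ordering, a request combining the shared and copy flags is rejected as well, since the copy flag has already become the private one by the time the check runs.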
 1.23  16-Jun-1999  minoura Remove extra ].
 1.22  15-Jun-1999  thorpej Several changes, developed and tested concurrently:
* Provide POSIX 1003.1b mlockall(2) and munlockall(2) system calls.
MCL_CURRENT is presently implemented. MCL_FUTURE is not fully
implemented. Also, the same one-unlock-for-every-lock caveat
currently applies here as it does to mlock(2). This will be
addressed in a future commit.
* Provide the mincore(2) system call, with the same semantics as
Solaris.
* Clean up the error recovery in uvm_map_pageable().
* Fix a bug where a process would hang if attempting to mlock a
zero-fill region where none of the pages in that region are resident.
[ This fix has been submitted for inclusion in 1.4.1 ]
 1.21  23-May-1999  mrg implement madvise() for MADV_{NORMAL,RANDOM,SEQUENTIAL}, others are not yet done.
 1.20  03-May-1999  mrg fix some formatting foo.
 1.19  25-Mar-1999  mrg branches: 1.19.2; 1.19.4; 1.19.6;
remove now >1 year old pre-release message.
 1.18  24-Mar-1999  cgd modify udv_attach() and its caller (uvm_mmap()) so that it's passed the
offset and size of the requested region to be mapped, so that the
udv_attach() can use the device d_mmap() entry to check mappability
of the requested region.
 1.17  09-Mar-1999  kleink Have unimplemented/unsupported system calls (madvise(), mincore(), sbrk(),
sstk()) fail with ENOSYS.
 1.16  04-Mar-1999  chs fix printf format types.
 1.15  11-Oct-1998  chuck branches: 1.15.2;
remove unused share map code from UVM:
- update calls to uvm_unmap_remove/uvm_unmap (mainonly boolean arg
has been removed)
- replace UVM_ET_ISMAP checks with UVM_ET_ISSUBMAP checks
 1.14  30-Sep-1998  mrg back out previous.
 1.13  30-Sep-1998  tv Declare silent success on madvise(). As an advisory call, it is harmless
to pretend success even though it's not supported, and some emulations
rely on its success.
 1.12  13-Aug-1998  eeh Merge paddr_t changes into the main branch.
 1.11  07-Jul-1998  thorpej branches: 1.11.2;
Add support for mmap'ing disk block devices.
 1.10  30-May-1998  kleink Per XSH98, const'ify the `addr' arguments to mlock() and munlock().
 1.9  10-May-1998  mrg reject attempts to map an immutable or append-only file, shared with
write protection. this stops data corruption where it was possible
to change the in-memory copy of an append-only file (but not the on-disk
copy). this is documented in NetBSD security advisory 1998-003. thanks
to darrenr, lukem, cgd, mycroft and mrg for this.
 1.8  01-Apr-1998  tv mmap() default MAP_SHARED/MAP_PRIVATE is ``DEBUG'', not ``DIAGNOSTIC''
 1.7  28-Mar-1998  kleink Per XPG, if the file descriptor argument to mmap() refers to a file whose
type is not supported (neither VREG nor VCHR, or not a vnode at all), fail
with ENODEV instead of EINVAL.
 1.6  09-Mar-1998  mrg KNF.
 1.5  03-Mar-1998  mycroft Convert MAP_PRIVATE device mappings to MAP_SHARED on *all* platforms, not just
the SPARC.
Remove the #ifdef COMPAT_13 for automatically adding a sharing type, since the
interface is *supposed* to support this.
Also modify the DIAGNOSTIC messages here a bit.
 1.4  10-Feb-1998  mrg - add defopt's for UVM, UVMHIST and PMAP_NEW.
- remove unnecessary UVMHIST_DECL's.
 1.3  07-Feb-1998  mrg restore rcsids
 1.2  06-Feb-1998  thorpej RCS ID police.
 1.1  05-Feb-1998  mrg branches: 1.1.1;
Initial revision
 1.1.1.1  05-Feb-1998  mrg initial import of the new virtual memory system, UVM, into -current.

UVM was written by chuck cranor <chuck@maria.wustl.edu>, with some
minor portions derived from the old Mach code. i provided some help
getting swap and paging working, and other bug fixes/ideas. chuck
silvers <chuq@chuq.com> also provided some other fixes.

this is the UVM kernel code portion.


this will be KNF'd shortly. :-)
 1.11.2.1  30-Jul-1998  eeh Split vm_offset_t and vm_size_t into paddr_t, psize_t, vaddr_t, and vsize_t.
 1.15.2.1  09-Nov-1998  chs initial snapshot. lots left to do.
 1.19.6.1  30-Nov-1999  itojun bring in latest KAME (as of 19991130, KAME/NetBSD141) into kame branch
just for reference purposes.
This commit includes 1.4 -> 1.4.1 sync for kame branch.

The branch does not compile at all (due to the lack of ALTQ and some other
source code). Please do not try to modify the branch, this is just for
reference purposes.

synchronization to latest KAME will take place on HEAD branch soon.
 1.19.4.7  11-Aug-1999  chs add casts for trunc_page() and round_page() args.
 1.19.4.6  09-Aug-1999  chs create a new type "voff_t" for uvm_object offsets
and define it to be "off_t". also, remove pgo_asyncget().
 1.19.4.5  02-Aug-1999  thorpej Update from trunk.
 1.19.4.4  11-Jul-1999  chs remove uvm_vnp_uncache(), it's no longer needed.
 1.19.4.3  01-Jul-1999  thorpej Sync w/ -current.
 1.19.4.2  21-Jun-1999  thorpej Sync w/ -current.
 1.19.4.1  07-Jun-1999  chs merge everything from chs-ubc branch.
 1.19.2.1  07-Jul-1999  perry pullup 1.27->1.28 (cgd)
 1.35.8.1  27-Dec-1999  wrstuden Pull up to last week's -current.
 1.35.4.1  15-Nov-1999  fvdl Sync with -current
 1.35.2.5  27-Mar-2001  bouyer Sync with HEAD.
 1.35.2.4  12-Mar-2001  bouyer Sync with HEAD.
 1.35.2.3  18-Jan-2001  bouyer Sync with head (for UBC+NFS fixes, mostly).
 1.35.2.2  08-Dec-2000  bouyer Sync with HEAD.
 1.35.2.1  20-Nov-2000  bouyer Update thorpej_scsipi to -current as of a month ago
 1.41.4.1  25-Jan-2001  jhawk Pull up revisions 1.47-1.48 via patch (requested by thorpej):
Change PROT_EXEC handling. Clear it from the maxprot if the protection
lacks it, after a failed udv_attach() and retry the udv_attach().
 1.49.2.16  18-Oct-2002  nathanw Catch up to -current.
 1.49.2.15  17-Sep-2002  nathanw Catch up to -current.
 1.49.2.14  16-Jul-2002  nathanw Whitespace.
 1.49.2.13  12-Jul-2002  nathanw No longer need to pull in lwp.h; proc.h pulls it in for us.
 1.49.2.12  24-Jun-2002  nathanw Curproc->curlwp renaming.

Change uses of "curproc->l_proc" back to "curproc", which is more like the
original use. Bare uses of "curproc" are now "curlwp".

"curproc" is now #defined in proc.h as ((curlwp) ? (curlwp)->l_proc : NULL)
so that it is always safe to reference curproc (*de*referencing curproc
is another story, but that's always been true).
 1.49.2.11  20-Jun-2002  nathanw Catch up to -current.
 1.49.2.10  29-May-2002  nathanw #include <sys/sa.h> before <sys/syscallargs.h>, to provide sa_upcall_t
now that <sys/param.h> doesn't include <sys/sa.h>.

(Behold the Power of Ed)
 1.49.2.9  01-Apr-2002  nathanw Catch up to -current.
(CVS: It's not just a program. It's an adventure!)
 1.49.2.8  08-Jan-2002  nathanw Catch up to -current.
 1.49.2.7  14-Nov-2001  nathanw Catch up to -current.
 1.49.2.6  21-Sep-2001  nathanw Catch up to -current.
 1.49.2.5  24-Aug-2001  nathanw A few files and lwp/proc conversions I missed in the last big update.
GENERIC runs again.
 1.49.2.4  24-Aug-2001  nathanw Catch up with -current.
 1.49.2.3  21-Jun-2001  nathanw Catch up to -current.
 1.49.2.2  09-Apr-2001  nathanw Catch up with -current.
 1.49.2.1  05-Mar-2001  nathanw Initial commit of scheduler activations and lightweight process support.
 1.54.2.4  10-Oct-2002  jdolecek sync kqueue with -current; this includes merge of gehenna-devsw branch,
merge of i386 MP branch, and part of autoconf rototill work
 1.54.2.3  23-Jun-2002  jdolecek catch up with -current on kqueue branch
 1.54.2.2  10-Jan-2002  thorpej Sync kqueue branch with -current.
 1.54.2.1  25-Aug-2001  thorpej Merge Aug 24 -current into the kqueue branch.
 1.55.2.2  01-Oct-2001  fvdl Catch up with -current.
 1.55.2.1  07-Sep-2001  thorpej Commit my "devvp" changes to the thorpej-devvp branch. This
replaces the use of dev_t in most places with a struct vnode *.

This will form the basic infrastructure for real cloning device
support (besides being architecturally cleaner -- it'll be good
to get away from using numbers to represent objects).
 1.56.2.1  12-Nov-2001  thorpej Sync the thorpej-mips-cache branch with -current.
 1.63.4.2  15-Mar-2004  jmc Pullup rev 1.66 (requested by skrll in ticket #1607)

#if 0 the call to uvm_map_checkprot() in sys_munmap() -- it's not documented,
and programs do not expect it. Also fixes memory leaks in dlopen()/dlclose().
 1.63.4.1  17-Aug-2003  tron Pull up revision 1.72 (requested by tv in ticket #1420):
PR/21948: Todd Vierling: Implement MAP_TRYFIXED for linux emulation.
 1.63.2.2  20-Jun-2002  gehenna catch up with -current.
 1.63.2.1  16-May-2002  gehenna Get rid of iszerodev. Use the 'zerodev' (dev_t for /dev/zero).
 1.74.2.10  10-Nov-2005  skrll Sync with HEAD. Here we go again...
 1.74.2.9  01-Apr-2005  skrll Sync with HEAD.
 1.74.2.8  15-Feb-2005  skrll Sync with HEAD.
 1.74.2.7  24-Jan-2005  skrll Sync with HEAD.
 1.74.2.6  17-Jan-2005  skrll Sync with HEAD.
 1.74.2.5  18-Dec-2004  skrll Sync with HEAD.
 1.74.2.4  21-Sep-2004  skrll Fix the sync with head I botched.
 1.74.2.3  18-Sep-2004  skrll Sync with HEAD.
 1.74.2.2  03-Aug-2004  skrll Sync with HEAD
 1.74.2.1  02-Jul-2003  darrenr Apply the aborted ktrace-lwp changes to a specific branch. This is just for
others to review, I'm concerned that patch fuzziness may have resulted in some
errant code being generated but I'll look at that later by comparing the diff
from the base to the branch with the file I attempt to apply to it. This will,
at the very least, put the changes in a better context for others to review
them and attempt to tinker with removing passing of 'struct lwp' through
the kernel.
 1.86.2.1  29-Apr-2005  kent sync with -current
 1.87.2.3  26-Mar-2005  yamt sync with head.
 1.87.2.2  12-Feb-2005  yamt sync with head.
 1.87.2.1  25-Jan-2005  yamt - don't use uvm_object or managed mappings for wired allocations.
(eg. malloc(9))
- simplify uvm_km_* apis.
 1.88.4.3  15-Oct-2005  riz Pull up following revision(s) (requested by chs in ticket #877):
sys/uvm/uvm_mmap.c: revision 1.93
stop converting async msync() to sync.
this hasn't been needed for years (if it ever was).
 1.88.4.2  18-Sep-2005  tron Pull up following revision(s) (requested by fvdl in ticket #798):
sys/compat/sunos/sunos_exec.c: revision 1.47
sys/compat/pecoff/pecoff_emul.c: revision 1.11
sys/arch/sparc64/sparc64/netbsd32_machdep.c: revision 1.45
sys/arch/amd64/amd64/netbsd32_machdep.c: revision 1.12
sys/sys/proc.h: revision 1.198
sys/compat/mach/mach_exec.c: revision 1.56
sys/compat/freebsd/freebsd_exec.c: revision 1.27
sys/arch/sparc64/include/vmparam.h: revision 1.27
sys/kern/kern_resource.c: revision 1.91
sys/compat/netbsd32/netbsd32_netbsd.c: revision 1.88
sys/compat/osf1/osf1_exec.c: revision 1.39
sys/compat/svr4_32/svr4_32_resource.c: revision 1.5
sys/compat/ultrix/ultrix_misc.c: revision 1.99
sys/compat/svr4_32/svr4_32_exec.h: revision 1.9
sys/kern/exec_elf32.c: revision 1.103
sys/compat/aoutm68k/aoutm68k_exec.c: revision 1.19
sys/compat/sunos32/sunos32_exec.c: revision 1.20
sys/compat/hpux/hpux_exec.c: revision 1.46
sys/compat/darwin/darwin_exec.c: revision 1.40
sys/kern/sysv_shm.c: revision 1.83
sys/uvm/uvm_extern.h: revision 1.99
sys/uvm/uvm_mmap.c: revision 1.89
sys/kern/kern_exec.c: revision 1.195
sys/compat/netbsd32/netbsd32.h: revision 1.31
sys/arch/sparc64/sparc64/svr4_32_machdep.c: revision 1.20
sys/compat/svr4/svr4_exec.c: revision 1.56
sys/compat/irix/irix_exec.c: revision 1.41
sys/compat/ibcs2/ibcs2_exec.c: revision 1.63
sys/compat/svr4_32/svr4_32_exec.c: revision 1.16
sys/arch/amd64/include/vmparam.h: revision 1.8
sys/compat/linux/common/linux_exec.c: revision 1.73
Fix some things regarding COMPAT_NETBSD32 and limits/VM addresses.
* For sparc64 and amd64, define *SIZ32 VM constants.
* Add a new function pointer to struct emul, pointing at a function
that will return the default VM map address. The default function
is uvm_map_defaultaddr, which just uses the VM_DEFAULT_ADDRESS
macro. This gives emulations control over the default map address,
and allows things to be mapped at the right address (in 32bit range)
for COMPAT_NETBSD32.
* Add code to adjust the data and stack limits when a COMPAT_NETBSD32
or COMPAT_SVR4_32 binary is executed.
* Don't use USRSTACK in kern_resource.c, use p_vmspace->vm_minsaddr
instead (emulations might have set it differently)
* Since this changes struct emul, bump kernel version to 3.99.2
Tested on amd64, compile-tested on sparc64.
 1.88.4.1  24-Aug-2005  riz Pull up following revision(s) (requested by yamt in ticket #688):
sys/miscfs/genfs/genfs_vnops.c: revision 1.98 via patch
sys/ufs/ffs/ffs_vfsops.c: revision 1.165
sys/ufs/lfs/lfs_extern.h: revision 1.69
sys/fs/filecorefs/filecore_vfsops.c: revision 1.20
sys/nfs/nfs_node.c: revision 1.80
sys/fs/smbfs/smbfs_node.c: revision 1.24
sys/fs/cd9660/cd9660_vfsops.c: revision 1.24
sys/fs/msdosfs/msdosfs_denode.c: revision 1.8
sys/miscfs/genfs/genfs_node.h: revision 1.6
sys/ufs/lfs/lfs_vfsops.c: revision 1.183
sys/ufs/ext2fs/ext2fs_vfsops.c: revision 1.86
sys/fs/adosfs/advfsops.c: revision 1.23
sys/fs/ntfs/ntfs_vfsops.c: revision 1.31
- constify genfs_ops.
- use member designators.

sys/miscfs/genfs/genfs_vnops.c: revision 1.99 via patch
genfs_getpages: don't forget to put the vnode onto the syncer's work queue
even in the case of PGO_LOCKED.

sys/uvm/uvm_bio.c: revision 1.40
sys/uvm/uvm_pager.h: revision 1.29
sys/miscfs/genfs/genfs_vnops.c: revision 1.100 via patch
sys/ufs/ufs/ufs_inode.c: revision 1.50
- introduce PGO_NOBLOCKALLOC and use it for ubc mapping
to prevent unnecessary block allocations in the case that
page size > block size.
- ufs_balloc_range: use VM_PROT_WRITE+PGO_NOBLOCKALLOC rather than
VM_PROT_READ.

sys/uvm/uvm_fault.c: revision 1.96
sys/miscfs/genfs/genfs_vnops.c: revision 1.101 via patch
sys/uvm/uvm_object.h: revision 1.19
sys/miscfs/genfs/genfs_node.h: revision 1.7
ensure that vnodes with dirty pages are always on syncer's queue.
- genfs_putpages: wait for i/o completion of PG_RELEASED/PG_PAGEOUT pages by
setting "wasclean" false when encountering them.
suggested by Stephan Uphoff in PR/24596 (1).
- genfs_putpages: write protect pages when cleaning out, if
we're going to take the vnode off the syncer's queue.
uvm_fault: don't write-map pages unless its vnode is already on
the syncer's queue.
fix PR/24596 (3) but in a different way from the suggested fix.
(to keep our current behaviour, ie. not to require explicit msync.
discussed on tech-kern@.)
- genfs_putpages: don't mistakenly take a vnode off the queue
by introducing a generation number in genfs_node.
genfs_getpages: increment the generation number.
suggested by Stephan Uphoff in PR/24596 (2).
- add some assertions.

sys/miscfs/genfs/genfs_vnops.c: revision 1.102 via patch
genfs_putpages: don't bother to clean the vnode unless VONWORKLST.

sys/ufs/ffs/ffs_vnops.c: revision 1.71
ffs_full_fsync: because VBLK/VCHR can be mmap'ed,
do VOP_PUTPAGES for them as well.

sys/uvm/uvm_fault.c: revision 1.97
uvm_fault: check a correct object in the case of layered filesystems.
fix PR/30811 from Jukka Salmi.

sys/uvm/uvm_object.h: revision 1.20
sys/ufs/ffs/ffs_vfsops.c: revision 1.167
sys/uvm/uvm_bio.c: revision 1.41
sys/ufs/ufs/ufs_vnops.c: revision 1.129
sys/uvm/uvm_mmap.c: revision 1.92
sys/uvm/uvm_fault.c: revision 1.98
sys/kern/vfs_subr.c: revision 1.252
sys/fs/msdosfs/denode.h: revision 1.5
sys/miscfs/genfs/genfs_vnops.c: revision 1.103 via patch
sys/fs/msdosfs/msdosfs_denode.c: revision 1.9
sys/sys/vnode.h: revision 1.141
sys/ufs/ufs/ufs_inode.c: revision 1.51
sys/ufs/ufs/ufs_extern.h: revision 1.45 via patch
sys/miscfs/genfs/genfs_node.h: revision 1.8
sys/ufs/lfs/lfs_vfsops.c: revision 1.184
sys/uvm/uvm_pager.h: revision 1.30
sys/ufs/ext2fs/ext2fs_vfsops.c: revision 1.87
update file timestamps for nfsd loaned-read and mmap.
PR/25279. discussed on tech-kern@.

sys/miscfs/genfs/genfs_vnops.c: revision 1.104 via patch
don't write-protect wired pages. pointed out by Chuck Silvers.
for now, leave a vnode on the syncer's queue, as suggested by him.

sys/ufs/ffs/ffs_vnops.c: revision 1.72
revert VCHR part of ffs_vnops.c 1.71.
as VCHR uses the device pager, no point to call VOP_PUTPAGES here.
pointed out by Chuck Silvers.
 1.91.2.8  24-Mar-2008  yamt sync with head.
 1.91.2.7  21-Jan-2008  yamt sync with head
 1.91.2.6  07-Dec-2007  yamt sync with head
 1.91.2.5  27-Oct-2007  yamt sync with head.
 1.91.2.4  03-Sep-2007  yamt sync with head.
 1.91.2.3  26-Feb-2007  yamt sync with head.
 1.91.2.2  30-Dec-2006  yamt sync with head.
 1.91.2.1  21-Jun-2006  yamt sync with head.
 1.94.12.1  24-May-2006  tron Merge 2006-05-24 NetBSD-current into the "peter-altq" branch.
 1.94.10.2  19-Apr-2006  elad sync with head.
 1.94.10.1  08-Mar-2006  elad Adapt to kernel authorization changes.
 1.94.8.3  11-Aug-2006  yamt sync with head
 1.94.8.2  24-May-2006  yamt sync with head.
 1.94.8.1  11-Apr-2006  yamt sync with head
 1.94.6.2  01-Jun-2006  kardel Sync with head.
 1.94.6.1  22-Apr-2006  simonb Sync with head.
 1.94.4.1  09-Sep-2006  rpaulo sync with head
 1.96.2.1  19-Jun-2006  chap Sync with head.
 1.98.6.2  10-Dec-2006  yamt sync with head.
 1.98.6.1  22-Oct-2006  yamt sync with head
 1.98.4.4  09-Feb-2007  ad Sync with HEAD.
 1.98.4.3  30-Jan-2007  ad Remove support for SA. Ok core@.
 1.98.4.2  12-Jan-2007  ad Sync with head.
 1.98.4.1  18-Nov-2006  ad Sync with head.
 1.102.2.1  10-Mar-2007  bouyer Pull up following revision(s) (requested by elad in ticket #407):
sys/kern/kern_verifiedexec.c: patch
sys/uvm/uvm_mmap.c: revision 1.104 via patch
If Veriexec prevents indirect execution of the binary, in addition to just
blocking the mmap() if exec bit is requested, also strip exec bit from
maxprot for further mprotect() calls. Okay joerg@.
 1.105.2.3  17-May-2007  yamt sync with head.
 1.105.2.2  12-Mar-2007  rmind Sync with HEAD.
 1.105.2.1  27-Feb-2007  yamt - sync with head.
- move sched_changepri back to kern_synch.c as it doesn't know PPQ anymore.
 1.108.4.1  11-Jul-2007  mjf Sync with head.
 1.108.2.8  09-Oct-2007  ad Sync with head.
 1.108.2.7  09-Oct-2007  ad Sync with head.
 1.108.2.6  20-Aug-2007  ad Sync with HEAD.
 1.108.2.5  17-Jun-2007  ad - Increase the number of thread priorities from 128 to 256. How the space
is set up is to be revisited.
- Implement soft interrupts as kernel threads. A generic implementation
is provided, with hooks for fast-path MD code that can run the interrupt
threads over the top of other threads executing in the kernel.
- Split vnode::v_flag into three fields, depending on how the flag is
locked (by the interlock, by the vnode lock, by the file system).
- Miscellaneous locking fixes and improvements.
 1.108.2.4  08-Jun-2007  ad Sync with head.
 1.108.2.3  13-Apr-2007  ad - Fix a (new) bug where vget tries to acquire freed vnodes' interlocks.
- Minor locking fixes.
 1.108.2.2  21-Mar-2007  ad - Replace more simple_locks, and fix up in a few places.
- Use condition variables.
- LOCK_ASSERT -> KASSERT.
 1.108.2.1  13-Mar-2007  ad Pull in the initial set of changes for the vmlocking branch.
 1.112.2.1  15-Aug-2007  skrll Sync with HEAD.
 1.114.8.2  27-Jul-2007  pooka Change unused fflags parameter in VOP_MMAP to prot and pass in
desired vm protection.
 1.114.8.1  27-Jul-2007  pooka file uvm_mmap.c was added on branch matt-mips64 on 2007-07-27 08:26:39 +0000
 1.114.6.2  09-Jan-2008  matt sync with HEAD
 1.114.6.1  06-Nov-2007  matt sync with HEAD
 1.114.4.3  27-Nov-2007  joerg Sync with HEAD. amd64 Xen support needs testing.
 1.114.4.2  26-Oct-2007  joerg Sync with HEAD.

Follow the merge of pmap.c on i386 and amd64 and move
pmap_init_tmp_pgtbl into arch/x86/x86/pmap.c. Modify the ACPI wakeup
code to restore CR4 before jumping back into kernel space as the large
page option might cover that.
 1.114.4.1  02-Oct-2007  joerg Sync with HEAD.
 1.115.2.1  14-Oct-2007  yamt sync with head.
 1.117.4.3  18-Feb-2008  mjf Sync with HEAD.
 1.117.4.2  27-Dec-2007  mjf Sync with HEAD.
 1.117.4.1  08-Dec-2007  mjf Sync with HEAD.
 1.118.6.1  02-Jan-2008  bouyer Sync with HEAD
 1.118.2.2  26-Dec-2007  ad Sync with head.
 1.118.2.1  04-Dec-2007  ad Pull the vmlocking changes into a new branch.
 1.121.6.2  05-Jun-2008  mjf Sync with HEAD.

Also fix build.
 1.121.6.1  03-Apr-2008  mjf Sync with HEAD.
 1.122.6.3  23-Jun-2008  wrstuden Sync w/ -current. 34 merge conflicts to follow.
 1.122.6.2  14-May-2008  wrstuden Per discussion with ad, remove most of the #include <sys/sa.h> lines
as they were including sa.h just for the type(s) needed for syscallargs.h.

Instead, create a new file, sys/satypes.h, which contains just the
types needed for syscallargs.h. Yes, there's only one now, but that
may change and it's probably more likely to change if it'd be difficult
to handle. :-)

Per discussion with matt at n dot o, add an include of satypes.h to
sigtypes.h. Upcall handlers are kinda signal handlers, and signalling
is the header file already included for syscallargs.h that most
closely matches SA.

This shaves about 3000 lines off of the diff of the branch relative
to the base. That also represents about 18% of the total before this
checkin.

I think this reduction is a very good thing.
 1.122.6.1  10-May-2008  wrstuden Initial checkin of re-adding SA. Everything except kern_sa.c
compiles in GENERIC for i386. This is still a work-in-progress, but
this checkin covers most of the mechanical work (changing signalling
to be able to accommodate SA's process-wide signalling and re-adding
includes of sys/sa.h and savar.h). Subsequent changes will be much
more interesting.

Also, kern_sa.c has received partial cleanup. There's still more
to do, though.
 1.122.4.7  11-Aug-2010  yamt sync with head.
 1.122.4.6  11-Mar-2010  yamt sync with head
 1.122.4.5  19-Aug-2009  yamt sync with head.
 1.122.4.4  20-Jun-2009  yamt sync with head
 1.122.4.3  30-May-2009  yamt revert the previous, which has been committed to the wrong branch.
 1.122.4.2  30-May-2009  yamt wrap long lines.
 1.122.4.1  04-May-2009  yamt sync with head.
 1.122.2.2  17-Jun-2008  yamt sync with head.
 1.122.2.1  04-Jun-2008  yamt sync with head
 1.126.12.2  23-Jul-2009  jym Sync with HEAD.
 1.126.12.1  13-May-2009  jym Sync with HEAD.

Commit is split, to avoid a "too many arguments" protocol error.
 1.126.8.1  01-Apr-2009  snj Pull up following revision(s) (requested by mrg in ticket #622):
bin/csh/csh.1: revision 1.46
bin/csh/func.c: revision 1.37
bin/ps/print.c: revision 1.111
bin/ps/ps.c: revision 1.74
bin/sh/miscbltin.c: revision 1.38
bin/sh/sh.1: revision 1.92 via patch
external/bsd/top/dist/machine/m_netbsd.c: revision 1.7
lib/libkvm/kvm_proc.c: revision 1.82
sys/arch/mips/mips/cpu_exec.c: revision 1.55
sys/compat/darwin/darwin_exec.c: revision 1.57
sys/compat/ibcs2/ibcs2_exec.c: revision 1.73
sys/compat/irix/irix_resource.c: revision 1.15
sys/compat/linux/arch/amd64/linux_exec_machdep.c: revision 1.16
sys/compat/linux/arch/i386/linux_exec_machdep.c: revision 1.12
sys/compat/linux/common/linux_limit.h: revision 1.5
sys/compat/osf1/osf1_resource.c: revision 1.14
sys/compat/svr4/svr4_resource.c: revision 1.18
sys/compat/svr4_32/svr4_32_resource.c: revision 1.17
sys/kern/exec_subr.c: revision 1.62
sys/kern/init_sysctl.c: revision 1.160
sys/kern/kern_exec.c: revision 1.288
sys/kern/kern_resource.c: revision 1.151
sys/sys/param.h: patch
sys/sys/resource.h: revision 1.31
sys/sys/sysctl.h: revision 1.184
sys/uvm/uvm_extern.h: revision 1.153
sys/uvm/uvm_glue.c: revision 1.136
sys/uvm/uvm_mmap.c: revision 1.128
usr.bin/systat/ps.c: revision 1.32
- add new RLIMIT_AS (aka RLIMIT_VMEM) resource that limits the total
address space available to processes. this limit exists in most other
modern unix variants, and like most of them, our defaults are unlimited.
remove the old mmap / rlimit.datasize hack.
- adds the VMCMD_STACK flag to all the stack-creation vmcmd callers.
it is currently unused, but was added a few years ago.
- add a pair of new process size values to kinfo_proc2{}. one is the
total size of the process memory map, and the other is the total size
adjusted for unused stack space (since most processes have a lot of
this...)
- patch sh, and csh to notice RLIMIT_AS. (in some cases, the alias
RLIMIT_VMEM was already present and used if available.)
- patch ps, top and systat to notice the new k_vm_vsize member of
kinfo_proc2{}.
- update irix, svr4, svr4_32, linux and osf1 emulations to support
this information. (freebsd could be done, but that's best left
as part of the full-update of compat/freebsd.)
this addresses PR 7897. it also gives correct memory usage values,
which have never been entirely correct (since mmap), and have been
very incorrect since jemalloc() was enabled.
tested on i386 and sparc64, build tested on several other platforms.
thanks to many folks for feedback and testing but most especially
chuq and yamt for critical suggestions that led to this patch not
having a special ugliness i wasn't happy with anyway :-)
 1.126.6.1  28-Apr-2009  skrll Sync with HEAD.
 1.132.4.4  31-May-2011  rmind sync with head
 1.132.4.3  05-Mar-2011  rmind sync with head
 1.132.4.2  03-Jul-2010  rmind sync with head
 1.132.4.1  16-Mar-2010  rmind Change struct uvm_object::vmobjlock to be dynamically allocated with
mutex_obj_alloc(). It allows us to share the locks among UVM objects.
 1.132.2.1  17-Aug-2010  uebayasi Sync with HEAD.
 1.133.4.1  08-Feb-2011  bouyer Sync with HEAD
 1.133.2.1  06-Jun-2011  jruoho Sync with HEAD.
 1.135.2.1  23-Jun-2011  cherry Catchup with rmind-uvmplock merge.
 1.139.6.1  18-Feb-2012  mrg merge to -current.
 1.139.2.2  22-May-2014  yamt sync with head.

for a reference, the tree before this commit was tagged
as yamt-pagecache-tag8.

this commit was split into small chunks to avoid
a limitation of cvs. ("Protocol error: too many arguments")
 1.139.2.1  17-Apr-2012  yamt sync with head
 1.144.10.1  18-May-2014  rmind sync with head
 1.144.6.2  03-Dec-2017  jdolecek update from HEAD
 1.144.6.1  20-Aug-2014  tls Rebase to HEAD as of a few days ago.
 1.148.4.2  11-Jan-2015  snj Pull up following revision(s) (requested by chs in ticket #403):
sys/uvm/uvm_mmap.c: revision 1.151
in uvm_mmap_dev(), use the passed-in offset instead of 0.
from Onno van der Linden in PR 49536.
 1.148.4.1  31-Dec-2014  snj Pull up following revision(s) (requested by chs in ticket #363):
common/lib/libprop/prop_kern.c: revision 1.18
sys/arch/mac68k/dev/grf_compat.c: revision 1.27
sys/arch/x68k/dev/grf.c: revision 1.45
sys/external/bsd/drm/dist/bsd-core/drm_bufs.c: revision 1.12
sys/external/bsd/drm2/drm/drm_drv.c: revision 1.12
sys/external/bsd/drm2/drm/drm_vm.c: revision 1.6
sys/external/bsd/drm2/include/linux/mm.h: revision 1.4
sys/kern/vfs_vnops.c: revision 1.192 via patch
sys/rump/librump/rumpkern/vm.c: revision 1.160
sys/sys/file.h: revision 1.78 via patch
sys/uvm/uvm_device.c: revision 1.64
sys/uvm/uvm_device.h: revision 1.13
sys/uvm/uvm_extern.h: revision 1.192
sys/uvm/uvm_mmap.c: revision 1.150 via patch
add a new "fo_mmap" fileops method to allow use of arbitrary uvm_objects for
mappings of file objects. move vnode-specific details of mmap()ing a vnode
from uvm_mmap() to the new vnode-specific vn_mmap(). add new uvm_mmap_dev()
and uvm_mmap_anon() convenience functions for mapping character devices
and anonymous memory, and replace all other calls to uvm_mmap() with those.
use the new fileop in drm2 so that libdrm can use mmap() to map things
like on other platforms (instead of the ioctl that we have used so far).
 1.149.2.8  28-Aug-2017  skrll Sync with HEAD
 1.149.2.7  05-Oct-2016  skrll Sync with HEAD
 1.149.2.6  09-Jul-2016  skrll Sync with HEAD
 1.149.2.5  29-May-2016  skrll Sync with HEAD
 1.149.2.4  22-Apr-2016  skrll Sync with HEAD
 1.149.2.3  27-Dec-2015  skrll Sync with HEAD (as of 26th Dec)
 1.149.2.2  22-Sep-2015  skrll Sync with HEAD
 1.149.2.1  06-Apr-2015  skrll Sync with HEAD
 1.162.6.2  11-May-2017  pgoyette Sync with HEAD
 1.162.6.1  02-May-2017  pgoyette Sync with HEAD - tag prg-localcount2-base1
 1.166.2.3  01-Apr-2023  martin Pull up following revision(s) (requested by riastradh in ticket #1815):

sys/uvm/uvm_mmap.c: revision 1.180

mmap(2): If we fail with a hint, try again without it.
`Hint' here means nonzero addr, but no MAP_FIXED or MAP_TRYFIXED.

This is suboptimal -- we could teach uvm_mmap to do a fancier search
using the address as a hint. But this should do for now.

Candidate fix for PR kern/55533.
 1.166.2.2  11-Aug-2019  martin Pull up following revision(s) (requested by maxv in ticket #1332):

sys/uvm/uvm_mmap.c: revision 1.173

Change 'npgs' from int to size_t. Otherwise the 64bit->32bit conversion
could lead to npgs=0, which is not expected. It later triggers a panic
in uvm_vsunlock().

Found by TriforceAFL (Akul Pillai).
 1.166.2.1  02-Nov-2017  snj Pull up following revision(s) (requested by christos in ticket #336):
sys/uvm/uvm_mmap.c: revision 1.167
[syzkaller] Fix for PR #52658 as suggested by riastradh@
The bug was found by Dmitry Vyukov (dvyukov%google.com@localhost)
using syzkaller and was tested by me on a VM running
8.99.5
 1.169.4.3  13-Apr-2020  martin Mostly merge changes from HEAD up to 20200411
 1.169.4.2  08-Apr-2020  martin Merge changes from current as of 20200406
 1.169.4.1  10-Jun-2019  christos Sync with HEAD
 1.172.4.2  01-Apr-2023  martin Pull up following revision(s) (requested by riastradh in ticket #1621):

sys/uvm/uvm_mmap.c: revision 1.180

mmap(2): If we fail with a hint, try again without it.
`Hint' here means nonzero addr, but no MAP_FIXED or MAP_TRYFIXED.

This is suboptimal -- we could teach uvm_mmap to do a fancier search
using the address as a hint. But this should do for now.

Candidate fix for PR kern/55533.
 1.172.4.1  21-Oct-2019  martin Pull up following revision(s) (requested by maxv in ticket #355):

sys/uvm/uvm_mmap.c: revision 1.173

Change 'npgs' from int to size_t. Otherwise the 64bit->32bit conversion
could lead to npgs=0, which is not expected. It later triggers a panic
in uvm_vsunlock().
Found by TriforceAFL (Akul Pillai).
 1.174.2.1  29-Feb-2020  ad Sync with head.
 1.175.10.1  01-Aug-2021  thorpej Sync with HEAD.
