History log of /src/sys/uvm/uvm_glue.c
Revision  Date  Author  Comments
 1.182  04-Oct-2023  ad Remove unneeded test of ci->ci_want_resched.
 1.181  14-Jun-2020  ad Remove PG_ZERO. It worked brilliantly on x86 machines from the mid-90s but
having spent an age experimenting with it over the last 6 months on various
machines and with different use cases it's always either break-even or a
slight net loss for me.
 1.180  11-Jun-2020  ad uvm_availmem(): give it a boolean argument to specify whether a recent
cached value will do, or if the very latest total must be fetched. It can
be called thousands of times a second and fetching the totals impacts not
only the calling LWP but other CPUs doing unrelated activity in the VM
system.
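The cached-versus-fresh trade-off described for uvm_availmem() can be illustrated with a minimal standalone sketch. It assumes the free-page total is spread over per-CPU counters; every name below (SKETCH_NCPU, percpu_free, sketch_add_free, availmem_sketch) is hypothetical and this is not the kernel's implementation.

```c
#include <stdbool.h>
#include <stddef.h>

#define SKETCH_NCPU 4			/* hypothetical CPU count */

static long percpu_free[SKETCH_NCPU];	/* hypothetical per-CPU free page counts */
static long cached_total;		/* last total that was summed */

/* Adjust one CPU's counter (cheap, touches only that CPU's slot). */
void
sketch_add_free(size_t cpu, long delta)
{
	percpu_free[cpu] += delta;
}

/* Walk every CPU's counter; this is the expensive path. */
static long
sum_free_pages(void)
{
	long total = 0;

	for (size_t i = 0; i < SKETCH_NCPU; i++)
		total += percpu_free[i];
	return total;
}

/*
 * If 'cached' is true, return the most recently summed total without
 * reading the other CPUs' counters; otherwise recompute and refresh it.
 */
long
availmem_sketch(bool cached)
{
	if (!cached)
		cached_total = sum_free_pages();
	return cached_total;
}
```

A caller that only needs a rough figure passes true and may see a slightly stale value; a caller that must make an exact decision passes false and pays for the full walk.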
 1.179  22-May-2020  ad Remove the ubc_direct hack.
 1.178  23-Apr-2020  ad Enable ubc_direct by default, but only on systems with no more than 2 CPUs
for now.
 1.177  05-Mar-2020  rin branches: 1.177.2;
Part of PR kern/54994:

Memory allocated in the fast path of uarea_poolpage_alloc() is
a page itself. Therefore, it is obviously page-aligned.

Pointed out by skrll.
 1.176  12-Jan-2020  ad l->l_emap_gen isn't used any more.
 1.175  31-Dec-2019  ad branches: 1.175.2;
- Add and use wrapper functions that take and acquire page interlocks, and pairs
of page interlocks. Require that the page interlock be held over calls to
uvm_pageactivate(), uvm_pagewire() and similar.

- Solve the concurrency problem with page replacement state. Rather than
updating the global state synchronously, set an intended state on
individual pages (active, inactive, enqueued, dequeued) while holding the
page interlock. After the interlock is released put the pages on a 128
entry per-CPU queue for their state changes to be made real in batch.
This results in a ~400 fold decrease in contention on my test system.
Proposed on tech-kern but modified to use the page interlock rather than
atomics to synchronise as it's much easier to maintain that way, and
cheaper.
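The batching idea in the 1.175 entry (record an intended state per page, then apply the changes in groups from a 128-entry per-CPU queue) can be sketched as follows. This is a simplified single-threaded illustration with all locking omitted; the types and function names are hypothetical and do not correspond to the actual UVM code.

```c
#include <stddef.h>

#define SKETCH_QLEN 128	/* queue length quoted in the commit message */

enum intent { INTENT_NONE, INTENT_ACTIVE, INTENT_INACTIVE };

struct page_sketch {
	enum intent intended;	/* what the page should become */
	enum intent applied;	/* global state, updated only in batches */
};

struct cpu_queue_sketch {
	struct page_sketch *pages[SKETCH_QLEN];
	size_t count;
	unsigned long flushes;	/* number of batch applications so far */
};

/* Apply every queued intent in one pass, then empty the queue. */
void
queue_flush(struct cpu_queue_sketch *q)
{
	for (size_t i = 0; i < q->count; i++)
		q->pages[i]->applied = q->pages[i]->intended;
	if (q->count > 0)
		q->flushes++;
	q->count = 0;
}

/* Record the intended state, then queue the page; flush when full. */
void
page_set_intent(struct cpu_queue_sketch *q, struct page_sketch *pg,
    enum intent want)
{
	/* Per the commit message, this step runs under the page interlock. */
	pg->intended = want;
	if (q->count == SKETCH_QLEN)
		queue_flush(q);
	q->pages[q->count++] = pg;
}
```

In this sketch the shared state is touched once per flush rather than once per page, which is the property the commit message credits for the drop in contention.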
 1.174  31-Dec-2019  ad Rename uvm_free() -> uvm_availmem().
 1.173  27-Dec-2019  ad Redo the page allocator to perform better, especially on multi-core and
multi-socket systems. Proposed on tech-kern. While here:

- add rudimentary NUMA support - needs more work.
- remove now unused "listq" from vm_page.
 1.172  21-Dec-2019  ad uvmexp.free -> uvm_free()
 1.171  16-Dec-2019  ad - Extend the per-CPU counters matt@ did to include all of the hot counters
in UVM, excluding uvmexp.free, which needs special treatment and will be
done with a separate commit. Cuts system time for a build by 20-25% on
a 48 CPU machine w/DIAGNOSTIC.

- Avoid 64-bit integer divide on every fault (for rnd_add_uint32).
 1.170  21-Nov-2019  ad Use lwp_changepri().
 1.169  14-Nov-2019  maxv Don't include "opt_kasan.h" when there's already <sys/asan.h> included.
 1.168  08-May-2019  chs uvm_pagealloc() uses UVM_PGA_* flags, not UVM_KMF_* flags,
and it is always nowait. fix uarea_poolpage_alloc() to not use
flags from the wrong collection for calling uvm_pagealloc()
and to wait itself if a page is not immediately available.
 1.167  07-Apr-2019  maxv Provide a code argument in kasan_mark(), and give a code to each caller.
Five codes used: GenericRedZone, MallocRedZone, KmemRedZone, PoolRedZone,
and PoolUseAfterFree.

This can greatly help debugging complex memory corruptions.
 1.166  23-Dec-2018  maxv Simplify the KASAN API, use only kasan_mark() and explain briefly. The
alloc/free naming was too confusing.
 1.165  04-Nov-2018  mlelstv PMAP_MAP_POOLPAGE must not fail. Trigger assertion here instead of
panic later from failing PR_WAITOK memory allocations.
 1.164  22-Aug-2018  maxv Add support for monitoring the stack with kASan. This allows us to detect
illegal memory accesses occurring there.

The compiler inlines a piece of code in each function that adds redzones
around the local variables and poisons them. The illegal accesses are then
detected using the usual kASan machinery.

The stack size is doubled, from 4 pages to 8 pages.

Several boot functions are marked with the __noasan flag, to prevent the
compiler from adding redzones in them (because we haven't yet initialized
kASan). The kasan_early_init function is called early at boot time to
quickly create the shadow for the current stack; after this is done, we
don't need __noasan anymore in the boot path.

We pass -fasan-shadow-offset=0xDFFF900000000000, because the compiler
wants to do
    shad = shadow-offset + (addr >> 3)
and we do, in kasan_addr_to_shad
    shad = KASAN_SHADOW_START + ((addr - CANONICAL_BASE) >> 3)
hence
    shad = KASAN_SHADOW_START + (addr >> 3) - (CANONICAL_BASE >> 3)
         = [KASAN_SHADOW_START - (CANONICAL_BASE >> 3)] + (addr >> 3)
implies
    shadow-offset = KASAN_SHADOW_START - (CANONICAL_BASE >> 3)
                  = 0xFFFF800000000000 - (0xFFFF800000000000 >> 3)
                  = 0xDFFF900000000000

In UVM, we add a kasan_free (that is not preceded by a kasan_alloc). We
don't add poisoned redzones ourselves, but all the functions we execute
do, so we need to manually clear the poison before freeing the stack.

With the help of Kamil for the makefile stuff.
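The shadow-offset derivation in the 1.164 entry can be checked with a small standalone C sketch. The three constants are the values quoted in the commit message; the helper functions are hypothetical names for the two forms of the mapping and are not the kernel's kasan_addr_to_shad.

```c
#include <assert.h>
#include <stdint.h>

/* Constants as quoted in the commit message for revision 1.164. */
#define KASAN_SHADOW_START 0xFFFF800000000000ULL
#define CANONICAL_BASE     0xFFFF800000000000ULL
#define SHADOW_OFFSET      0xDFFF900000000000ULL

/* Hypothetical helper: the kernel-side form of the mapping. */
uint64_t
shad_kernel_form(uint64_t addr)
{
	return KASAN_SHADOW_START + ((addr - CANONICAL_BASE) >> 3);
}

/* Hypothetical helper: the form the compiler is said to emit. */
uint64_t
shad_compiler_form(uint64_t addr)
{
	return SHADOW_OFFSET + (addr >> 3);
}

/* Nonzero if both forms agree for an address at or above CANONICAL_BASE. */
int
shadow_forms_agree(uint64_t addr)
{
	assert(addr >= CANONICAL_BASE);
	return shad_kernel_form(addr) == shad_compiler_form(addr);
}
```

The two forms agree because the low three bits of CANONICAL_BASE are zero, so the shift distributes over the subtraction without losing a borrow.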
 1.163  22-May-2016  maxv branches: 1.163.16; 1.163.18;
Revert my previous change. I missed an entry on NXR.
 1.162  21-May-2016  maxv USPACE and USPACE_ALIGN are constants. Use a #if instead. Probably saves
some instructions.
 1.161  27-Nov-2014  uebayasi branches: 1.161.2;
Consistently use kpreempt_*() outside scheduler path.
 1.160  01-Sep-2012  matt branches: 1.160.2;
Add a __HAVE_CPU_UAREA_IDLELWP hook so that the MD code can allocate
special UAREAs for idle lwp's.
 1.159  08-Apr-2012  martin Rework posix_spawn locking and memory management:
- always provide a vmspace for the new proc, initially borrowing from proc0
(this part fixes PR 46286)
- increase parallelism between parent and child if arguments allow this,
avoiding a potential deadlock on exec_lock
- add a new flag for userland to request old (lockstepped) behaviour for
better error reporting
- adapt test cases to the previous two and add a new variant to test the
diagnostics flag
- fix a few memory (and lock) leaks
- provide netbsd32 compat
 1.158  06-Apr-2012  chs fix uarea_system_poolpage_free() to handle freeing a uarea
that was not allocated by cpu_uarea_alloc() (ie. on platforms
where cpu_uarea_alloc() failing is not fatal).
fixes PR 46284.
 1.157  20-Feb-2012  martin Solve previous fix (for early posix_spawn children exiting on error)
differently.
 1.156  12-Feb-2012  martin branches: 1.156.2;
In uvm_proc_exit bail out early if we have no vmspace yet (as it happens
for failing posix_spawn child processes).
Fixes PR kern/45991.
 1.155  11-Feb-2012  martin Add a posix_spawn syscall, as discussed on tech-kern.
Based on the summer of code project by Charles Zhang, heavily reworked
later by me - all bugs are likely mine.
Ok: core, releng.
 1.154  01-Feb-2012  para allocate uareas and buffers from kernel_map again
add code to drain pools if kmem_arena runs out of space
 1.153  27-Jan-2012  para extending vmem(9) to be able to allocate resources for its own needs.
simplifying uvm_map handling (no special kernel entries anymore no relocking)
make malloc(9) a thin wrapper around kmem(9)
(with private interface for interrupt safety reasons)

releng@ acknowledged
 1.152  23-Nov-2011  matt branches: 1.152.2;
When allocating a page for a kernel stack and PMAP_ALLOC_POOLPAGE is
defined, use it. (allows a MIPS N32 kernel to boot when there is memory
outside of KSEG0).
 1.151  02-Jul-2011  matt branches: 1.151.2;
Allow the MD code to decide to panic if cpu_uarea_alloc would return NULL.
If NULL is returned, just allocate the standard way.
 1.150  12-Jun-2011  rmind Welcome to 5.99.53! Merge rmind-uvmplock branch:

- Reorganize locking in UVM and provide extra serialisation for pmap(9).
New lock order: [vmpage-owner-lock] -> pmap-lock.

- Simplify locking in some pmap(9) modules by removing P->V locking.

- Use lock object on vmobjlock (and thus vnode_t::v_interlock) to share
the locks amongst UVM objects where necessary (tmpfs, layerfs, unionfs).

- Rewrite and optimise x86 TLB shootdown code, make it simpler and cleaner.
Add TLBSTATS option for x86 to collect statistics about TLB shootdowns.

- Unify /dev/mem et al in MI code and provide required locking (removes
kernel-lock on some ports). Also, avoid cache-aliasing issues.

Thanks to Andrew Doran and Joerg Sonnenberger, as their initial patches
formed the core changes of this branch.
 1.149  18-Feb-2011  drochner branches: 1.149.2;
make this build w/o HAVE_CPU_UAREA_ROUTINES
 1.148  17-Feb-2011  matt Add support for cpu-specific uarea allocation routines. Allows different
allocation for user and system lwps. MIPS will use this to map uareas of
system lwps using direct-mapped addresses (to reduce the overhead of
switching to kernel threads). ibm4xx could use this to map uareas via direct
mapped addresses and avoid the problem of having the kernel stack not in
the TLB.
 1.147  02-Feb-2011  chuck update license clauses on my code to match the new-style BSD licenses.
based on second diff that rmind@ sent me.

no functional change with this commit.
 1.146  14-Jan-2011  rmind branches: 1.146.2; 1.146.4;
Retire struct user, remove sys/user.h inclusions. Note sys/user.h header
as obsolete. Remove USER_TO_UAREA/UAREA_TO_USER macros.

Various #include fixes and review by matt@.
 1.145  16-Apr-2010  rmind - Merge sched_pstats() and uvm_meter()/uvm_loadav(). Avoids double loop
through all LWPs and duplicate locking overhead.

- Move sched_pstats() from soft-interrupt context to process 0 main loop.
Avoids blocking effect on real-time threads. Mostly fixes PR/38792.

Note: it might be worth moving the loop above PRI_PGDAEMON. Also,
sched_pstats() might be cleaned-up slightly.
 1.144  25-Feb-2010  jym branches: 1.144.2;
Change RSS (resident set size) limit. Instead of setting it arbitrarily
to the total free memory available to the system, use the smallest value
between VM_MAXUSER_ADDRESS and total free memory (having a RSS limit
bigger than VM_MAXUSER_ADDRESS has no real meaning).

Fix a possible int overflow when ptoa(uvmexp.free) is bigger than 4GB
with a 32-bit vaddr_t.

Reviewed by bouyer@.

See also http://mail-index.netbsd.org/tech-kern/2010/02/24/msg007395.html
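The overflow and the clamp described in the 1.144 entry can be illustrated with a short sketch. It assumes 4 KiB pages and a 32-bit address type; SKETCH_PAGE_SHIFT and both function names are hypothetical, and the code is an illustration of the arithmetic rather than the kernel's code.

```c
#include <stdint.h>

#define SKETCH_PAGE_SHIFT 12	/* assumed 4 KiB pages */

/* Naive conversion in a 32-bit type: wraps once the result reaches 4 GiB. */
uint32_t
ptoa32_naive(uint32_t npages)
{
	return npages << SKETCH_PAGE_SHIFT;
}

/*
 * Take the smaller of "free memory in bytes" and the maximum user
 * address, doing the conversion in 64 bits so that it cannot wrap
 * before the comparison.
 */
uint32_t
rss_limit_sketch(uint32_t free_pages, uint32_t max_user_addr)
{
	uint64_t bytes = (uint64_t)free_pages << SKETCH_PAGE_SHIFT;

	return bytes < max_user_addr ? (uint32_t)bytes : max_user_addr;
}
```

With 0x100000 free pages (exactly 4 GiB), the naive conversion wraps to 0, while the clamped version returns the maximum user address passed in.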
 1.143  17-Dec-2009  rmind branches: 1.143.2;
Replace a few USER_TO_UAREA/UAREA_TO_USER uses, reduce sys/user.h inclusions.
 1.142  21-Nov-2009  rmind Add uvm_lwp_getuarea() and uvm_lwp_setuarea(). OK matt@.
 1.141  21-Oct-2009  rmind Remove uarea swap-out functionality:

- Addresses the issue described in PR/38828.
- Some simplification in threading and sleepq subsystems.
- Eliminates pmap_collect() and, as a side note, allows pmap optimisations.
- Eliminates XS_CTL_DATA_ONSTACK in scsipi code.
- Avoids a few scans on LWP list and thus potentially long holds of proc_lock.
- Cuts ~1.5k lines of code. Reduces amd64 kernel size by ~4k.
- Removes __SWAP_BROKEN cases.

Tested on x86, mips, acorn32 (thanks <mpumford>) and partly tested on
acorn26 (thanks to <bjh21>).

Discussed on <tech-kern>, reviewed by <ad>.
 1.140  10-Aug-2009  matt Revert change to printf. (why can't __func__ concat with other string?)
 1.139  09-Aug-2009  matt Only swapout uareas if VMSWAP_UAREA is defined (which it should be by default).
If it's not defined and PMAP_MAP_POOLPAGE is defined and USPACE == PAGE_SIZE,
then allocate/map USPACE via uvm_pagealloc/PMAP_MAP_POOLPAGE.

On platforms like MIPS with 16KB pages, this means that uareas (and hence lwp
kernel stacks) will always be accessible since they will be in KSEG0.
 1.138  28-Jun-2009  rmind Ephemeral mapping (emap) implementation. Concept is based on the idea that
activity of other threads will perform the TLB flush for the processes using
emap as a side effect. To track that, global and per-CPU generation numbers
are used. This idea was suggested by Andrew Doran; various improvements to
it by me. Notes:

- For now, zero-copy on pipe is not yet enabled.
- TCP socket code would likely need more work.
- Additional UVM loaning improvements are needed.

Proposed on <tech-kern>, silence there.
Quickly reviewed by <ad>.
 1.137  16-Apr-2009  rmind Avoid a few #ifdef KSTACK_CHECK_MAGIC.
 1.136  29-Mar-2009  mrg - add new RLIMIT_AS (aka RLIMIT_VMEM) resource that limits the total
address space available to processes. this limit exists in most other
modern unix variants, and like most of them, our defaults are unlimited.
remove the old mmap / rlimit.datasize hack.

- adds the VMCMD_STACK flag to all the stack-creation vmcmd callers.
it is currently unused, but was added a few years ago.

- add a pair of new process size values to kinfo_proc2{}. one is the
total size of the process memory map, and the other is the total size
adjusted for unused stack space (since most processes have a lot of
this...)

- patch sh, and csh to notice RLIMIT_AS. (in some cases, the alias
RLIMIT_VMEM was already present and used if available.)

- patch ps, top and systat to notice the new k_vm_vsize member of
kinfo_proc2{}.

- update irix, svr4, svr4_32, linux and osf1 emulations to support
this information. (freebsd could be done, but that it's best left
as part of the full-update of compat/freebsd.)


this addresses PR 7897. it also gives correct memory usage values,
which have never been entirely correct (since mmap), and have been
very incorrect since jemalloc() was enabled.

tested on i386 and sparc64, build tested on several other platforms.

thanks to many folks for feedback and testing but most especially
chuq and yamt for critical suggestions that led to this patch not
having a special ugliness i wasn't happy with anyway :-)
 1.135  31-Jan-2009  yamt branches: 1.135.2;
uvm_swapin: uncomment an assertion which is now ok.
 1.134  19-Nov-2008  ad Make the emulations, exec formats, coredump, NFS, and the NFS server
into modules. By and large this commit:

- shuffles header files and ifdefs
- splits code out where necessary to be modular
- adds module glue for each of the components
- adds/replaces hooks for things that can be installed at runtime
 1.133  25-Jun-2008  ad branches: 1.133.2; 1.133.4; 1.133.6;
Don't swap kernel stacks of realtime threads.
 1.132  16-Jun-2008  ad uvm_swapout: try to lock the vm_map before calling pmap_collect.
 1.131  09-Jun-2008  ad branches: 1.131.2;
swappable: invert previous so we check for SACTIVE or SSTOP.
 1.130  09-Jun-2008  ad swappable: return false if l->l_proc->p_stat == SDYING.
 1.129  09-Jun-2008  ad uvm_proc_exit: use macros to disable preemption.
 1.128  04-Jun-2008  ad - vm_page: put listq, pageq into a union alongside a LIST_ENTRY, so we can
use both types of list.

- Make page coloring and idle zero state per-CPU.

- Maintain per-CPU page freelists. When freeing, put pages onto the local
CPU's lists and the global lists. When allocating, prefer to take pages
from the local CPU. If none are available take from the global list as
done now. Proposed on tech-kern@.
 1.127  31-May-2008  ad PR kern/38812 race between lwp_exit_switchaway and exit1/coredump

Move the LWP RUNNING and TIMEINTR flags into the thread-private flag word.
 1.126  27-Apr-2008  ad branches: 1.126.2; 1.126.4;
Disable preemption while swapping pmap.
 1.125  24-Apr-2008  ad Network protocol interrupts can now block on locks, so merge the globals
proclist_mutex and proclist_lock into a single adaptive mutex (proc_lock).
Implications:

- Inspecting process state requires thread context, so signals can no longer
be sent from a hardware interrupt handler. Signal activity must be
deferred to a soft interrupt or kthread.

- As the proc state locking is simplified, it's now safe to take exit()
and wait() out from under kernel_lock.

- The system spends less time at IPL_SCHED, and there is less lock activity.
 1.124  11-Apr-2008  yamt branches: 1.124.2;
fix the order of printf arguments.
 1.123  11-Apr-2008  christos - use uarea_swapin, rather than duplicating the code.
- use __func__ where appropriate.
 1.122  29-Mar-2008  christos make this compile
 1.121  29-Mar-2008  dholland Fix broken build. hi skrll :-)
 1.120  29-Mar-2008  skrll Fix unused variable when DEBUG isn't defined.
 1.119  27-Mar-2008  ad Make rusage collection per-LWP and collate in the appropriate places.
cloned threads need a little bit more work but the locking needs to
be fixed first.
 1.118  29-Feb-2008  yamt update comment
 1.117  08-Feb-2008  yamt branches: 1.117.2; 1.117.6;
uvm_uarea_init: fix compilation where PAGE_SIZE is not a constant. (sparc)
reported by Tom Spindler.
 1.116  07-Feb-2008  yamt uvm_uarea_init: make #if about PR_NOALIGN clearer and add a comment
to explain it.
 1.115  28-Jan-2008  yamt remove a special allocator for uareas, which is no longer necessary.
use pool_cache instead.
 1.114  02-Jan-2008  ad Merge vmlocking2 to head.
 1.113  06-Nov-2007  ad branches: 1.113.2; 1.113.6;
Merge scheduler changes from the vmlocking branch. All discussed on
tech-kern:

- Invert priority space so that zero is the lowest priority. Rearrange
number and type of priority levels into bands. Add new bands like
'kernel real time'.
- Ignore the priority level passed to tsleep. Compute priority for
sleep dynamically.
- For SCHED_4BSD, make priority adjustment per-LWP, not per-process.
 1.112  21-Sep-2007  ad branches: 1.112.4; 1.112.6;
uvm_swapin: disable the swaplock assertion. uvm_lwp_hold() can't take
the lock yet.
 1.111  18-Aug-2007  ad branches: 1.111.2;
Include sys/cpu.h for CPU_INFO_FOREACH.
 1.110  18-Aug-2007  ad Fix error in previous.
 1.109  18-Aug-2007  ad Make the uarea cache per-CPU and drain in batches of 4.
 1.108  14-Jul-2007  ad branches: 1.108.2; 1.108.6;
Revert unintentionally committed change.
 1.107  09-Jul-2007  ad Merge some of the less invasive changes from the vmlocking branch:

- kthread, callout, devsw API changes
- select()/poll() improvements
- miscellaneous MT safety improvements
 1.106  17-May-2007  yamt merge yamt-idlelwp branch. asked by core@. some ports still need work.

from doc/BRANCHES:

idle lwp, and some changes depending on it.

1. separate context switching and thread scheduling.
(cf. gmcgarry_ctxsw)
2. implement idle lwp.
3. clean up related MD/MI interfaces.
4. make scheduler(s) modular.
 1.105  24-Mar-2007  rmind Export uvm_uarea_free() to the rest.
Make things compile again.
 1.104  04-Mar-2007  christos branches: 1.104.2; 1.104.4; 1.104.6;
Kill caddr_t; there will be some MI fallout, but it will be fixed shortly.
 1.103  22-Feb-2007  thorpej TRUE -> true, FALSE -> false
 1.102  21-Feb-2007  thorpej Replace the Mach-derived boolean_t type with the C99 bool type. A
future commit will replace use of TRUE and FALSE with true and false.
 1.101  19-Feb-2007  ad uvm_kick_scheduler(): do nothing until the swap subsystem is initialized.
 1.100  17-Feb-2007  pavel Change the process/lwp flags seen by userland via sysctl back to the
P_*/L_* naming convention, and rename the in-kernel flags to avoid
conflict. (P_ -> PK_, L_ -> LW_ ). Add back the (now unused) LSDEAD
constant.

Restores source compatibility with pre-newlock2 tools like ps or top.

Reviewed by Andrew Doran.
 1.99  15-Feb-2007  ad branches: 1.99.2;
Add uvm_kick_scheduler() (MP safe) to replace wakeup(&proc0).
 1.98  09-Feb-2007  ad Merge newlock2 to head.
 1.97  05-Oct-2006  chs add support for O_DIRECT (I/O directly to application memory,
bypassing any kernel caching for file data).
 1.96  29-Aug-2006  matt branches: 1.96.2; 1.96.4;
Make PTRACE and COREDUMP optional. Make them the default (status quo) by
putting them in conf/std.
 1.95  13-Jun-2006  yamt uvm_swapin: process -> lwp in a comment.
 1.94  22-May-2006  yamt introduce macros, UAREA_TO_USER and USER_TO_UAREA,
to convert uarea VA into a pointer to struct user and vice versa,
so that MD code can change the layout in uarea.
 1.93  15-Mar-2006  drochner branches: 1.93.2; 1.93.4;
-clean up the interface to uvm_fault: the "fault type" didn't serve
any purpose (done by a macro, so we don't save any cycles for now)
-kill vm_fault_t; it is not needed for real faults, and for simulated
faults (wiring) it can be replaced by UVM internal flags
-remove <uvm/uvm_fault.h> from uvm_extern.h again
 1.92  24-Dec-2005  perry branches: 1.92.4; 1.92.6; 1.92.8; 1.92.10;
__inline__ -> inline
 1.91  11-Dec-2005  christos merge ktrace-lwp.
 1.90  24-Oct-2005  chs remove the assertion in uvm_swapout_threads() about LSONPROC lwps
not running on the same CPU as the swapper. l_stat is protected by
sched_lock, which isn't held here, so we can race with that lwp
starting to run and see its l_cpu not updated yet, as in PR 31870.
we check l_stat again in uvm_swapout() while holding sched_lock,
so the race itself is harmless.
 1.89  27-Jun-2005  thorpej branches: 1.89.2; 1.89.4;
Use ANSI function decls.
 1.88  10-Jun-2005  matt Rework the coredump code to have no explicit knowledge of how coredump
i/o is done. Instead, pass an opaque cookie which is then passed to a
new routine, coredump_write, which does the actual i/o. This allows the
method of doing i/o to change without affecting any future MD code.
Also, make netbsd32_core.c [re]use core_netbsd.c (in a similar manner that
core_elf64.c uses core_elf32.c) and eliminate that code duplication.
cpu_coredump{,32} is now called twice, first with a NULL iocookie to fill
the core structure and a second to actually write md parts of the coredump.
All i/o is no longer random access and is suitable for shipping over a stream.
 1.87  07-Jun-2005  matt Make sure state.end has a valid initial value.
 1.86  02-Jun-2005  matt When writing coredumps, don't write zero uninstantiated demand-zero pages.
Also, with ELF core dumps, trim trailing zeroes from sections. These two
changes can shrink coredumps by over 50% in size.
 1.85  06-May-2005  nathanw uvm_coredump_walkmap(): Set UVM_COREDUMP_NODUMP on regions whose
protection does not include VM_PROT_READ, so that the core dumping
doesn't error out with EFAULT when trying to write that region.

Addresses PR kern/30143; approach suggested by chs@.
 1.84  01-Apr-2005  yamt merge yamt-km branch.
- don't use managed mappings/backing objects for wired memory allocations.
save some resources like pv_entry. also fix (most of) PR/27030.
- simplify kernel memory management API.
- simplify pmap bootstrap of some ports.
- some related cleanups.
 1.83  08-Feb-2005  yamt branches: 1.83.4;
update a comment; malloc doesn't use uvm_kernacc anymore.
 1.82  21-Jan-2005  chs branches: 1.82.2;
reduce the size of user coredump files by not dumping regions of
the address space that have never been touched (such as much of the
virtual space allocated for pthread stacks).
 1.81  12-May-2004  yamt branches: 1.81.4;
add assertions.
 1.80  02-May-2004  pk Make uvm_uarea_free an inline function.
 1.79  04-Apr-2004  pk Use maxdmap and maxsmap instead of MAXDSIZ and MAXSSIZ.
 1.78  24-Mar-2004  junyoung branches: 1.78.2; 1.78.4; 1.78.6;
- Nuke __P().
- Drop trailing spaces.
 1.77  09-Feb-2004  yamt - borrow vmspace0 in uvm_proc_exit instead of uvmspace_free.
the latter is not an appropriate place to do so and it broke vfork.
- deactivate pmap before calling cpu_exit() to keep a balance of
pmap_activate/deactivate.
 1.76  16-Jan-2004  yamt uvm_coredump_walkmap: use UVM_OBJ_IS_DEVICE macro.
 1.75  04-Jan-2004  jdolecek Rearrange process exit path to avoid need to free resources from different
process context ('reaper').

From within the exiting process context:
* deactivate pmap and free vmspace while we can still block
* introduce MD cpu_lwp_free() - this cleans all MD-specific context (such
as FPU state), and is the last potentially blocking operation;
all of cpu_wait(), and most of cpu_exit(), is now folded into cpu_lwp_free()
* process is now immediately marked as zombie and made available for pickup
by parent; the remaining last lwp continues the exit as fully detached
* MI (rather than MD) code bumps uvmexp.swtch, cpu_exit() is now same
for both 'process' and 'lwp' exit

uvm_lwp_exit() is modified to never block; the u-area memory is now
always just linked to the list of available u-areas. Introduce (blocking)
uvm_uarea_drain(), which is called to release the excessive u-area memory;
this is called by parent within wait4(), or by pagedaemon on memory shortage.
uvm_uarea_free() is now private function within uvm_glue.c.

MD process/lwp exit code now always calls lwp_exit2() immediately after
switching away from the exiting lwp.

g/c now unneeded routines and variables, including the reaper kernel thread
 1.74  30-Dec-2003  pk Replace the traditional buffer memory management -- based on fixed per buffer
virtual memory reservation and a private pool of memory pages -- by a scheme
based on memory pools.

This allows better utilization of memory because buffers can now be allocated
with a granularity finer than the system's native page size (useful for
filesystems with e.g. 1k or 2k fragment sizes). It also avoids fragmentation
of virtual to physical memory mappings (due to the former fixed virtual
address reservation) resulting in better utilization of MMU resources on some
platforms. Finally, the scheme is more flexible by allowing run-time decisions
on the amount of memory to be used for buffers.

On the other hand, the effectiveness of the LRU queue for buffer recycling
may be somewhat reduced compared to the traditional method since, due to the
nature of the pool based memory allocation, the actual least recently used
buffer may release its memory to a pool different from the one needed by a
newly allocated buffer. However, this effect will kick in only if the
system is under memory pressure.
 1.73  13-Nov-2003  chs eliminate uvm_useracc() in favor of checking the return value of
copyin() or copyout().

uvm_useracc() tells us whether the mapping permissions allow access to
the desired part of an address space, and many callers assume that
this is the same as knowing whether an attempt to access that part of
the address space will succeed. however, access to user space can
fail for reasons other than insufficient permission, most notably that
paging in any non-resident data can fail due to i/o errors. most of
the callers of uvm_useracc() make the above incorrect assumption. the
rest are all misguided optimizations, which optimize for the case
where an operation will fail. we'd rather optimize for operations
succeeding, in which case we should just attempt the access and handle
failures due to insufficient permissions the same way we handle i/o
errors. since there appear to be no good uses of uvm_useracc(), we'll
just remove it.
 1.72  03-Nov-2003  yamt revert rev.1.70 as it was not needed.
uvm_map_lookup_entry() should handle addresses out of the map.
 1.71  02-Nov-2003  jdolecek kill unneeded SYSVSHM includes
use ANSI C function definition for uvm_lwp_exit()
 1.70  01-Nov-2003  yamt don't try to lookup addresses out of the map in uvm_coredump_walkmap().
 1.69  24-Oct-2003  cl simplify tests:
The case where l_stat == LSONPROC and l_cpu == curcpu cannot happen
because the pagedaemon is the LWP on curcpu and the pagedaemon is a
kernel thread and the code is only used by the pagedaemon.

See also updated patch in PR kern/23095, which I meant to check in
originally.
 1.68  19-Oct-2003  cl don't uvm_swapout LWPs which are LSONPROC on another cpu.

uvm_swapout_threads will swapout LWPs which are running on another CPU:
- uvm_swapout_threads considers LWPs running on another CPU for swapout
if their l_swtime is high
- uvm_swapout_threads considers LWPs on the runqueue for swapout if their
l_swtime is high but these LWPs might be running by the time uvm_swapout
is called

symptoms of failure: panic in setrunqueue

fixes PR kern/23095
 1.67  13-Oct-2003  scw In uvm_lwp_fork(), check if PMAP_UAREA() is defined and if so, invoke it
with the KVA of the newly-wired uarea.

This is useful on some architectures (e.g. xscale) where the uarea mapping
can be tweaked to use the mini-data cache instead of the main cache.
 1.66  29-Jun-2003  fvdl branches: 1.66.2;
Back out the lwp/ktrace changes. They contained a lot of collateral damage,
and need to be examined and discussed more.
 1.65  28-Jun-2003  darrenr Pass lwp pointers throughout the kernel, as required, so that the lwpid can
be inserted into ktrace records. The general change has been to replace
"struct proc *" with "struct lwp *" in various function prototypes, pass
the lwp through and use l_proc to get the process pointer when needed.

Bump the kernel rev up to 1.6V
 1.64  14-Feb-2003  atatat Rework the way in which the map is traversed when dumping core. Now
we read-lock the map and call uvm_map_lookup_entry() instead of simply
walking from the header to the next and to the next, etc.

Dumping from sparsely populated amaps could cause faults that would
result in amaps being split, which (in turn) resulted in the core
dumping routines dumping some regions of memory twice. This makes the
core file too large, the headers not match, gdb not work properly,
and so on.

Addresses PR 19260.
 1.63  22-Jan-2003  yamt make KSTACK_CHECK_* compile after sa merge.
 1.62  18-Jan-2003  thorpej Merge the nathanw_sa branch.
 1.61  17-Nov-2002  chs change uvm_uarea_alloc() to indicate whether the returned uarea is already
backed by physical pages (ie. because it reused a previously-freed one),
so that we can skip a bunch of useless work in that case.
this fixes the underlying problem behind PR 18543, and also speeds up fork()
quite a bit (eg. 7% on my pc, 1% on my ultra2) when we get a cache hit.
 1.60  22-Sep-2002  chs encapsulate knowledge of uarea allocation in some new functions.
 1.59  02-Jul-2002  yamt add KSTACK_CHECK_MAGIC. discussed on tech-kern.
 1.58  15-May-2002  matt branches: 1.58.2;
When core dumping a process, don't dump maps backed up by the device pager.
(move the pagerops externs to uvm_object.h and out of the C files).
 1.57  31-Dec-2001  chs introduce a new UVM fault type, VM_FAULT_WIREMAX. this is different
from VM_FAULT_WIRE in that when the pages being wired are faulted in,
the simulated fault is at the maximum protection allowed for the mapping
instead of the current protection. use this in uvm_map_pageable{,_all}()
to fix the problem where writing via ptrace() to shared libraries that
are also mapped with wired mappings in another process causes a
diagnostic panic when the wired mapping is removed.

this is a really obscure problem so it deserves some more explanation.
ptrace() writing to another process ends up down in uvm_map_extract(),
which for MAP_PRIVATE mappings (such as shared libraries) will cause
the amap to be copied or created. then the amap is made shared
(ie. the AMAP_SHARED flag is set) between the kernel and the ptrace()d
process so that the kernel can modify pages in the amap and have the
ptrace()d process see the changes. then when the page being modified
is actually faulted on, the object page (from the shared library vnode)
is copied to a new anon page and inserted into the shared amap.
to make all the processes sharing the amap actually see the new anon
page instead of the vnode page that was there before, we need to
invalidate all the pmap-level mappings of the vnode page in the pmaps
of the processes sharing the amap, but we don't have a good way of
doing this. the amap doesn't keep track of the vm_maps which map it.
so all we can do at this point is to remove all the mappings of the
page with pmap_page_protect(), but this has the unfortunate side-effect
of removing wired mappings as well. removing wired mappings with
pmap_page_protect() is a legitimate operation, it can happen when a file
with a wired mapping is truncated. so the pmap has no way of knowing
whether a request to remove a wired mapping is normal or when it's due to
this weird situation. so the pmap has to remove the wired mapping.
the process being ptrace()d goes away and life continues. then,
much later when we go to unwire or remove the wired vm_map mapping,
we discover that the pmap mapping has been removed when it should
still be there, and we panic.

so where did we go wrong? the problem is that we don't have any way
to update just the pmap mappings that need to be updated in this
scenario. we could invent a mechanism to do this, but that is much
more complicated than this change and it doesn't seem like the right
way to go in the long run either.

the real underlying problem here is that wired pmap mappings just
aren't a good concept. one of the original properties of the pmap
design was supposed to be that all the information in the pmap could
be thrown away at any time and the VM system could regenerate it all
through fault processing, but wired pmap mappings don't allow that.
a better design for UVM would not require wired pmap mappings,
and Chuck C. and I are talking about this, but it won't be done
anytime soon, so this change will do for now.

this change has the effect of causing MAP_PRIVATE mappings to be
copied to anonymous memory when they are mlock()d, so that uvm_fault()
doesn't need to copy these pages later when called from ptrace(), thus
avoiding the call to pmap_page_protect() and the panic that results
from this when the mlock()d region is unlocked or freed. note that
this change doesn't help the case where the wired mapping is MAP_SHARED.

discussed at great length with Chuck Cranor.
fixes PRs 10363, 12554, 12604, 13041, 13487, 14580 and 14853.
 1.56  10-Dec-2001  thorpej Move the code that walks the process's VM map during a coredump
into uvm_coredump_walkmap(), and use callbacks into the coredump
routine to do something with each section.
 1.55  10-Nov-2001  lukem add RCSIDs, and in some cases, slightly cleanup #include order
 1.54  06-Nov-2001  chs in uvm_exit(), don't bother to unwire the uarea before we free it,
the pages will be freed anyway.
 1.53  23-Sep-2001  chs branches: 1.53.2;
bump the rusage counter for "swaps" when we swap out a process.
addresses PR 6170.
 1.52  15-Sep-2001  chs a whole bunch of changes to improve performance and robustness under load:

- remove special treatment of pager_map mappings in pmaps. this is
required now, since I've removed the globals that expose the address range.
pager_map now uses pmap_kenter_pa() instead of pmap_enter(), so there's
no longer any need to special-case it.
- eliminate struct uvm_vnode by moving its fields into struct vnode.
- rewrite the pageout path. the pager is now responsible for handling the
high-level requests instead of only getting control after a bunch of work
has already been done on its behalf. this will allow us to UBCify LFS,
which needs tighter control over its pages than other filesystems do.
writing a page to disk no longer requires making it read-only, which
allows us to write wired pages without causing all kinds of havoc.
- use a new PG_PAGEOUT flag to indicate that a page should be freed
on behalf of the pagedaemon when it's unlocked. this flag is very similar
to PG_RELEASED, but unlike PG_RELEASED, PG_PAGEOUT can be cleared if the
pageout fails due to eg. an indirect-block buffer being locked.
this allows us to remove the "version" field from struct vm_page,
and together with shrinking "loan_count" from 32 bits to 16,
struct vm_page is now 4 bytes smaller.
- no longer use PG_RELEASED for swap-backed pages. if the page is busy
because it's being paged out, we can't release the swap slot to be
reallocated until that write is complete, but unlike with vnodes we
don't keep a count of in-progress writes so there's no good way to
know when the write is done. instead, when we need to free a busy
swap-backed page, just sleep until we can get it busy ourselves.
- implement a fast-path for extending writes which allows us to avoid
zeroing new pages. this substantially reduces cpu usage.
- encapsulate the data used by the genfs code in a struct genfs_node,
which must be the first element of the filesystem-specific vnode data
for filesystems which use genfs_{get,put}pages().
- eliminate many of the UVM pagerops, since they aren't needed anymore
now that the pager "put" operation is a higher-level operation.
- enhance the genfs code to allow NFS to use the genfs_{get,put}pages
instead of a modified copy.
- clean up struct vnode by removing all the fields that used to be used by
the vfs_cluster.c code (which we don't use anymore with UBC).
- remove kmem_object and mb_object since they were useless.
instead of allocating pages to these objects, we now just allocate
pages with no object. such pages are mapped in the kernel until they
are freed, so we can use the mapping to find the page to free it.
this allows us to remove splvm() protection in several places.

The sum of all these changes improves write throughput on my
decstation 5000/200 to within 1% of the rate of NetBSD 1.5
and reduces the elapsed time for "make release" of a NetBSD 1.5
source tree on my 128MB pc to 10% less than a 1.5 kernel took.
 1.51  10-Sep-2001  chris Update pmap_update to now take the updated pmap as an argument.
This will allow improvements to the pmaps so that they can more easily defer expensive operations, eg tlb/cache flush, til the last possible moment.

Currently this is a no-op on most platforms, so they should see no difference.

Reviewed by Jason.
 1.50  02-Jun-2001  chs branches: 1.50.2; 1.50.4;
replace vm_map{,_entry}_t with struct vm_map{,_entry} *.
 1.49  30-May-2001  lukem add missing #include "opt_kgdb.h"
 1.48  25-May-2001  chs remove trailing whitespace.
 1.47  24-Apr-2001  thorpej Sprinkle pmap_update() calls after calls to:
- pmap_enter()
- pmap_remove()
- pmap_protect()
- pmap_kenter_pa()
- pmap_kremove()
as described in pmap(9).

These calls are relatively conservative. It may be possible to
optimize these a little more.
 1.46  21-Apr-2001  thorpej The pmap_update() call at the end of uvm_swapout_threads() is
completely useless. Nuke it.
 1.45  15-Mar-2001  chs eliminate the KERN_* error codes in favor of the traditional E* codes.
the mapping is:

KERN_SUCCESS 0
KERN_INVALID_ADDRESS EFAULT
KERN_PROTECTION_FAILURE EACCES
KERN_NO_SPACE ENOMEM
KERN_INVALID_ARGUMENT EINVAL
KERN_FAILURE various, mostly turn into KASSERTs
KERN_RESOURCE_SHORTAGE ENOMEM
KERN_NOT_RECEIVER <unused>
KERN_NO_ACCESS <unused>
KERN_PAGES_LOCKED <unused>
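
A minimal sketch of the mapping above as a C translation function. The enum and the name kern_to_errno() are hypothetical stand-ins for illustration, not code from the tree, and the enum values are arbitrary. KERN_FAILURE is left out because the log only says it maps to "various" results.

```c
#include <assert.h>
#include <errno.h>

/* Hypothetical stand-ins for the retired codes; the values are arbitrary. */
enum kern_code {
	KERN_SUCCESS,
	KERN_INVALID_ADDRESS,
	KERN_PROTECTION_FAILURE,
	KERN_NO_SPACE,
	KERN_INVALID_ARGUMENT,
	KERN_RESOURCE_SHORTAGE
};

/* Translate an old code to the errno value given in the table above. */
static int
kern_to_errno(enum kern_code kc)
{
	switch (kc) {
	case KERN_SUCCESS:		return 0;
	case KERN_INVALID_ADDRESS:	return EFAULT;
	case KERN_PROTECTION_FAILURE:	return EACCES;
	case KERN_NO_SPACE:		return ENOMEM;
	case KERN_INVALID_ARGUMENT:	return EINVAL;
	case KERN_RESOURCE_SHORTAGE:	return ENOMEM;
	}
	return EINVAL;	/* illustrative fallback, not part of the mapping */
}
```

Note that two distinct old codes (KERN_NO_SPACE and KERN_RESOURCE_SHORTAGE) collapse to ENOMEM, so the translation is not reversible.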
 1.44  06-Feb-2001  eeh branches: 1.44.2;
Move maxdmap and maxsmap where they belong and make them big enough.
 1.43  25-Nov-2000  chs lots of cleanup:
use queue.h macros and KASSERT().
address amap offsets in pages instead of bytes.
make amap_ref() and amap_unref() take an amap, offset and length
instead of a vm_map_entry_t.
improve whitespace and comments.
 1.42  11-Oct-2000  thorpej - uvmspace_share(): If p2 has a vmspace already, make sure to deactivate
it and free it as appropriate. Activate p2's new address space once
it references p1's.
- uvm_fork(): Make sure the child's vmspace is NULL before calling
uvmspace_share() (the child doesn't have one already in this case).

These changes do not change the behavior for the current use of
uvmspace_share() (vfork(2)), but make it possible for an already
running process (such as a kernel thread) to properly attach to
another process's address space.
 1.41  23-Sep-2000  enami splstatclock is insufficient to protect run queues. Acquire scheduler
lock instead.
 1.40  21-Aug-2000  thorpej Remove a totally unnecessary splhigh/spl0 pair.
 1.39  12-Aug-2000  sommerfeld add comment warning about possible unlock/sleep race
 1.38  27-Jun-2000  mrg remove include of <vm/vm.h>
 1.37  26-Jun-2000  mrg remove/move more mach vm header files:

<vm/pglist.h> -> <uvm/uvm_pglist.h>
<vm/vm_inherit.h> -> <uvm/uvm_inherit.h>
<vm/vm_kern.h> -> into <uvm/uvm_extern.h>
<vm/vm_object.h> -> nothing
<vm/vm_pager.h> -> into <uvm/uvm_pager.h>

also includes a bunch of <vm/vm_page.h> include removals (due to redundancy
with <vm/vm.h>), and a scattering of other similar headers.
 1.36  18-Jun-2000  simonb Set p->p_addr to NULL after it gets freed.
 1.35  08-Jun-2000  thorpej Change UVM_UNLOCK_AND_WAIT() to use ltsleep() (it is now atomic, as
advertised). Garbage-collect uvm_sleep().
 1.34  28-May-2000  thorpej Rather than starting init and creating kthreads by forking and then
doing a cpu_set_kpc(), just pass the entry point and argument all
the way down the fork path starting with fork1(). In order to
avoid special-casing the normal fork in every cpu_fork(), MI code
passes down child_return() and the child process pointer explicitly.

This fixes a race condition on multiprocessor systems; a CPU could
grab the newly created processes (which has been placed on a run queue)
before cpu_set_kpc() would be performed.
 1.33  26-May-2000  thorpej branches: 1.33.2;
Introduce a new process state distinct from SRUN called SONPROC
which indicates that the process is actually running on a
processor. Test against SONPROC as appropriate rather than
combinations of SRUN and curproc. Update all context switch code
to properly set SONPROC when the process becomes the current
process on the CPU.
 1.32  30-Mar-2000  augustss Remove more register declarations.
 1.31  26-Mar-2000  kleink Merge parts of chs-ubc2 into the trunk:
Add a new type voff_t (defined as a synonym for off_t) to describe offsets
into uvm objects, and update the appropriate interfaces to use it, the
most visible effect being the ability to mmap() file offsets beyond
the range of a vaddr_t.

Originally by Chuck Silvers; blame me for problems caused by merging this
into non-UBC.
 1.30  13-Nov-1999  thorpej Change the pmap_enter() API slightly; pmap_enter() now returns an error
value (KERN_SUCCESS or KERN_RESOURCE_SHORTAGE) indicating if it succeeded
or failed. Change the `wired' and `access_type' arguments to a single
`flags' argument, which includes the access type, and flags:

PMAP_WIRED the old `wired' boolean
PMAP_CANFAIL pmap_enter() is allowed to fail

If PMAP_CANFAIL is not specified, the pmap should behave as it always
has in the face of a drastic resource shortage: fall over dead.

Change the fault handler to deal with failure (which indicates resource
shortage) by unlocking everything, waiting for the pagedaemon to free
more memory, then retrying the fault.
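
A rough userspace sketch of the retry pattern described above. Everything here is a mock: mock_pmap_enter(), mock_wait_for_pagedaemon(), mock_fault() and the flag bit values are assumptions for illustration, not the kernel's definitions, and the real fault handler also drops its locks before waiting.

```c
#include <assert.h>
#include <stdlib.h>

#define PMAP_WIRED	0x1	/* illustrative bit values only */
#define PMAP_CANFAIL	0x2

static int free_pages;		/* pretend pool of pmap resources */

/* Mock pmap_enter(): 0 on success, -1 on shortage if the caller allows it. */
static int
mock_pmap_enter(int flags)
{
	if (free_pages == 0) {
		if (flags & PMAP_CANFAIL)
			return -1;	/* caller asked to be told */
		abort();		/* "fall over dead" */
	}
	free_pages--;
	return 0;
}

/* Mock of waiting for the pagedaemon: some resources come back. */
static void
mock_wait_for_pagedaemon(void)
{
	free_pages += 4;
}

/* Mock fault path: retry the mapping until it succeeds; returns retry count. */
static int
mock_fault(void)
{
	int retries = 0;

	while (mock_pmap_enter(PMAP_CANFAIL) != 0) {
		/* the real handler unlocks everything before waiting */
		mock_wait_for_pagedaemon();
		retries++;
	}
	return retries;
}
```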
 1.29  25-Jul-1999  thorpej branches: 1.29.2; 1.29.4; 1.29.8;
Turn the proclist lock into a read/write spinlock. Update proclist locking
calls to reflect this. Also, block statclock rather than softclock
in the proclist locking functions, to address a problem reported on
current-users by Sean Doran.
 1.28  22-Jul-1999  thorpej Garbage collect thread_sleep()/thread_wakeup() left over from the old
Mach VM code. Also nuke iprintf(), which was no longer used anywhere.

Add proclist locking where appropriate.
 1.27  08-Jul-1999  thorpej Change the pmap_extract() interface to:
boolean_t pmap_extract(pmap_t, vaddr_t, paddr_t *);
This makes it possible for the pmap to map physical address 0.
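
A toy illustration of why a boolean result plus an out-parameter allows physical address 0, presumably because a returned address can then no longer be confused with a failure value. The one-entry struct toy_pmap and toy_pmap_extract() are hypothetical, and the typedefs are stand-ins for the kernel types.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

typedef uintptr_t vaddr_t;	/* stand-ins for the kernel typedefs */
typedef uint64_t paddr_t;

/* Toy "pmap" holding a single virtual-to-physical translation. */
struct toy_pmap {
	vaddr_t	va;
	paddr_t	pa;
	bool	valid;
};

/* Return true and store the physical address if va is mapped. */
static bool
toy_pmap_extract(const struct toy_pmap *pm, vaddr_t va, paddr_t *pap)
{
	if (!pm->valid || pm->va != va)
		return false;
	if (pap != NULL)
		*pap = pm->pa;	/* may legitimately be 0 */
	return true;
}
```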
 1.26  17-Jun-1999  thorpej Make uvm_vslock() return the error code from uvm_fault_wire(). All places
which use uvm_vslock() should now test the return value. If it's not
KERN_SUCCESS, wiring the pages failed, so the operation which is using
uvm_vslock() should error out.

XXX We currently just EFAULT a failed uvm_vslock(). We may want to do
more about translating error codes in the future.
 1.25  17-Jun-1999  thorpej In uvm_useracc(), make sure we have a read lock on the map before
calling uvm_map_checkprot().
 1.24  17-Jun-1999  thorpej The i386 and pc532 pmaps are officially fixed.
 1.23  28-May-1999  thorpej Make uvm_fault_unwire() take a vm_map_t, rather than a pmap_t, for
consistency. Use this opportunity for checking for intrsafe map use
in this routine (which is illegal).
 1.22  26-May-1999  thorpej Pass an access_type to uvm_vslock().
 1.21  26-May-1999  thorpej - uvm_fork()/uvm_swapin(): pass VM_PROT_READ|VM_PROT_WRITE access_type
to uvm_fault_wire(), to guarantee that the kernel stacks will not
cause even a mod/ref emulation fault.
- uvm_vslock(): pass VM_PROT_NONE until this function is updated.
 1.20  13-May-1999  thorpej Allow the caller to specify a stack for the child process. If NULL,
the child inherits the stack pointer from the parent (traditional
behavior). Like the signal stack, the stack area is specified as
a low address and a size; machine-dependent code accounts for stack
direction.

This is required for clone(2).
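
A minimal sketch of how machine-dependent code might turn a (low address, size) pair into an initial stack pointer. The helper initial_sp() and its grows_down parameter are hypothetical, and real ports may also apply alignment or reserve space, which is not shown here.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/*
 * Hypothetical helper: pick the starting stack pointer for a stack area
 * given as its lowest address and its size.
 */
static uintptr_t
initial_sp(uintptr_t base, size_t size, int grows_down)
{
	/* downward-growing stacks start at the top of the area */
	return grows_down ? base + size : base;
}
```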
 1.19  30-Apr-1999  thorpej Pull signal actions out of struct user, make them a separate proc
substructure, and allow them to be shared.

Required for clone(2).
 1.18  26-Mar-1999  mycroft branches: 1.18.4;
Add a new `access type' argument to pmap_enter(). This indicates what type of
memory access a mapping was caused by. This is passed through from uvm_fault()
and udv_fault(), and in most other cases is 0.
The pmap module may use this to preset R/M information. On MMUs which require
R/M emulation, the implementation may preset the bits and avoid taking another
fault. On MMUs which keep R/M information in hardware, the implementation may
preset its cached bits to speed up the next call to pmap_is_modified() or
pmap_is_referenced().
 1.17  25-Mar-1999  mrg remove now >1 year old pre-release message.
 1.16  15-Mar-1999  chs remove a debugging printf.
 1.15  19-Oct-1998  tron Defopt SYSVMSG, SYSVSEM and SYSVSHM.
 1.14  08-Sep-1998  thorpej Implement uvm_exit(), which frees VM resources when a process finishes
exiting.
 1.13  13-Aug-1998  eeh Merge paddr_t changes into the main branch.
 1.12  09-Aug-1998  perry bzero->memset, bcopy->memcpy, bcmp->memcmp
 1.11  09-May-1998  kleink branches: 1.11.2;
Use size_t to pass the length of the memory region to operate on to chgkprot(),
kernacc(), useracc(), vslock() and vsunlock(); (unsigned) ints are not
adequate on all platforms.
 1.10  08-May-1998  kleink Make uvm_vsunlock() actually use the proc * passed to it; per discussion
with Jason Thorpe.
 1.9  30-Apr-1998  thorpej Pass vslock() and vsunlock() a proc *, rather than implicitly operating
on curproc.
 1.8  09-Apr-1998  thorpej Oops, fix a typo.
 1.7  09-Apr-1998  thorpej Allocate kernel virtual address space for the U-area before allocating
the new proc structure when performing a fork. This makes it much
easier to abort a fork operation and return an error if we run out
of KVA space.

The U-area pages are still wired down in {,u}vm_fork(), as before.
 1.6  09-Mar-1998  mrg KNF.
 1.5  10-Feb-1998  mrg - add defopt's for UVM, UVMHIST and PMAP_NEW.
- remove unnecessary UVMHIST_DECL's.
 1.4  07-Feb-1998  mrg restore rcsids
 1.3  07-Feb-1998  chs add locking of kernel_map in uvm_kernacc().
check return value of uvm_fault_wire() in uvm_fork().
enable swappings.
 1.2  06-Feb-1998  thorpej RCS ID police.
 1.1  05-Feb-1998  mrg branches: 1.1.1;
Initial revision
 1.1.1.1  05-Feb-1998  mrg initial import of the new virtual memory system, UVM, into -current.

UVM was written by chuck cranor <chuck@maria.wustl.edu>, with some
minor portions derived from the old Mach code. i provided some help
getting swap and paging working, and other bug fixes/ideas. chuck
silvers <chuq@chuq.com> also provided some other fixes.

this is the UVM kernel code portion.


this will be KNF'd shortly. :-)
 1.11.2.1  30-Jul-1998  eeh Split vm_offset_t and vm_size_t into paddr_t, psize_t, vaddr_t, and vsize_t.
 1.18.4.3  11-Aug-1999  chs add casts for trunc_page() and round_page() args.
 1.18.4.2  02-Aug-1999  thorpej Update from trunk.
 1.18.4.1  21-Jun-1999  thorpej Sync w/ -current.
 1.29.8.1  27-Dec-1999  wrstuden Pull up to last week's -current.
 1.29.4.1  15-Nov-1999  fvdl Sync with -current
 1.29.2.5  23-Apr-2001  bouyer Sync with HEAD.
 1.29.2.4  27-Mar-2001  bouyer Sync with HEAD.
 1.29.2.3  11-Feb-2001  bouyer Sync with HEAD.
 1.29.2.2  08-Dec-2000  bouyer Sync with HEAD.
 1.29.2.1  20-Nov-2000  bouyer Update thorpej_scsipi to -current as of a month ago
 1.33.2.1  22-Jun-2000  minoura Sync w/ netbsd-1-5-base.
 1.44.2.22  15-Dec-2002  thorpej Add a mutex to the uarea cache.
 1.44.2.21  15-Dec-2002  thorpej Fix a comment.
 1.44.2.20  11-Dec-2002  thorpej Sync with HEAD.
 1.44.2.19  18-Oct-2002  nathanw L_INMEM, not P_INMEM.
 1.44.2.18  18-Oct-2002  nathanw Catch up to -current.
 1.44.2.17  01-Aug-2002  nathanw Catch up to -current.
 1.44.2.16  16-Jul-2002  nathanw Revert to curproc (in a comment).
 1.44.2.15  12-Jul-2002  nathanw No longer need to pull in lwp.h; proc.h pulls it in for us.
 1.44.2.14  24-Jun-2002  nathanw Curproc->curlwp renaming.

Change uses of "curproc->l_proc" back to "curproc", which is more like the
original use. Bare uses of "curproc" are now "curlwp".

"curproc" is now #defined in proc.h as ((curlwp) ? (curlwp)->l_proc : NULL)
so that it is always safe to reference curproc (*de*referencing curproc
is another story, but that's always been true).
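
A small userspace sketch of the null-safe macro described above. The struct layouts are cut down to one field each, and curlwp is a plain global here, which is an assumption made only so the example stands alone.

```c
#include <assert.h>
#include <stddef.h>

struct proc { int p_pid; };		/* reduced to one field */
struct lwp { struct proc *l_proc; };	/* reduced to one field */

static struct lwp *curlwp;		/* plain global in this sketch */

/* Referencing curproc is safe even when there is no current LWP. */
#define curproc ((curlwp) ? (curlwp)->l_proc : NULL)
```

Evaluating curproc with curlwp == NULL yields NULL; dereferencing that result is still the caller's problem, as the message says.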
 1.44.2.13  20-Jun-2002  nathanw Catch up to -current.
 1.44.2.12  28-Feb-2002  nathanw Add some LWP-specific swapout debugging.
 1.44.2.11  08-Jan-2002  nathanw Catch up to -current.
 1.44.2.10  16-Dec-2001  gmcgarry call cpu_proc_fork() from uvm_proc_fork()
 1.44.2.9  08-Dec-2001  thorpej cpu_fork() -> cpu_lwp_fork(). This logically forks an LWP, not a
complete process. As noted by Gregory McGarry on tech-kern.
 1.44.2.8  14-Nov-2001  nathanw Catch up to -current.
 1.44.2.7  26-Sep-2001  nathanw Catch up to -current.
Again.
 1.44.2.6  21-Sep-2001  nathanw Catch up to -current.
 1.44.2.5  03-Jul-2001  nathanw Correct merge lossage; lose the now-extraneous splstatclock().
 1.44.2.4  21-Jun-2001  nathanw Catch up to -current.
 1.44.2.3  09-Apr-2001  nathanw Catch up with -current.
 1.44.2.2  19-Mar-2001  nathanw Fix a very stupid and annoying bug: Don't try to uvm_fault_unwire() a
LWP's u-area twice.

Thirty lashes with a wet noodle for this one.
 1.44.2.1  05-Mar-2001  nathanw Initial commit of scheduler activations and lightweight process support.
 1.50.4.1  01-Oct-2001  fvdl Catch up with -current.
 1.50.2.5  10-Oct-2002  jdolecek sync kqueue with -current; this includes merge of gehenna-devsw branch,
merge of i386 MP branch, and part of autoconf rototil work
 1.50.2.4  06-Sep-2002  jdolecek sync kqueue branch with HEAD
 1.50.2.3  23-Jun-2002  jdolecek catch up with -current on kqueue branch
 1.50.2.2  10-Jan-2002  thorpej Sync kqueue branch with -current.
 1.50.2.1  13-Sep-2001  thorpej Update the kqueue branch to HEAD.
 1.53.2.1  12-Nov-2001  thorpej Sync the thorpej-mips-cache branch with -current.
 1.58.2.1  15-Jul-2002  gehenna catch up with -current.
 1.66.2.8  10-Nov-2005  skrll Sync with HEAD. Here we go again...
 1.66.2.7  01-Apr-2005  skrll Sync with HEAD.
 1.66.2.6  09-Feb-2005  skrll Sync with HEAD.
 1.66.2.5  24-Jan-2005  skrll Sync with HEAD.
 1.66.2.4  21-Sep-2004  skrll Fix the sync with head I botched.
 1.66.2.3  18-Sep-2004  skrll Sync with HEAD.
 1.66.2.2  03-Aug-2004  skrll Sync with HEAD
 1.66.2.1  02-Jul-2003  darrenr Apply the aborted ktrace-lwp changes to a specific branch. This is just for
others to review, I'm concerned that patch fuzziness may have resulted in some
errant code being generated but I'll look at that later by comparing the diff
from the base to the branch with the file I attempt to apply to it. This will,
at the very least, put the changes in a better context for others to review
them and attempt to tinker with removing passing of 'struct lwp' through
the kernel.
 1.78.6.1  06-Nov-2005  riz Pull up following revision(s) (requested by bouyer in ticket #5965):
sys/uvm/uvm_glue.c: revision 1.90
remove the assertion in uvm_swapout_threads() about LSONPROC lwps
not running on the same CPU as the swapper. l_stat is protected by
sched_lock, which isn't held here, so we can race with that lwp
starting to run and see its l_cpu not updated yet, as in PR 31870.
we check l_stat again in uvm_swapout() while holding sched_lock,
so the race itself is harmless.
 1.78.4.1  06-Nov-2005  riz Pull up following revision(s) (requested by bouyer in ticket #5965):
sys/uvm/uvm_glue.c: revision 1.90
remove the assertion in uvm_swapout_threads() about LSONPROC lwps
not running on the same CPU as the swapper. l_stat is protected by
sched_lock, which isn't held here, so we can race with that lwp
starting to run and see its l_cpu not updated yet, as in PR 31870.
we check l_stat again in uvm_swapout() while holding sched_lock,
so the race itself is harmless.
 1.78.2.1  06-Nov-2005  riz Pull up following revision(s) (requested by bouyer in ticket #5965):
sys/uvm/uvm_glue.c: revision 1.90
remove the assertion in uvm_swapout_threads() about LSONPROC lwps
not running on the same CPU as the swapper. l_stat is protected by
sched_lock, which isn't held here, so we can race with that lwp
starting to run and see its l_cpu not updated yet, as in PR 31870.
we check l_stat again in uvm_swapout() while holding sched_lock,
so the race itself is harmless.
 1.81.4.1  29-Apr-2005  kent sync with -current
 1.82.2.2  12-Feb-2005  yamt sync with head.
 1.82.2.1  25-Jan-2005  yamt - don't use uvm_object or managed mappings for wired allocations.
(eg. malloc(9))
- simplify uvm_km_* apis.
 1.83.4.3  06-Dec-2005  riz Apply patch (requested by yamt in ticket #1015):
sys/uvm/uvm_glue.c: patch
sys/uvm/uvm_km.c: patch
- correct a return value of uvm_km_valloc1 in the case of failure.
- do waitok allocation for uvm_uarea_alloc so that it won't fail on
temporary memory shortage.
 1.83.4.2  28-Oct-2005  jmc Pullup rev 1.90 (requested by chs in ticket #914)
Remove the assertion in uvm_swapout_threads() about LSONPROC lwps
not running on the same CPU as the swapper. l_stat is protected by
sched_lock, which isn't held here, so we can race with that lwp
starting to run and see its l_cpu not updated yet, as in PR 31870.
we check l_stat again in uvm_swapout() while holding sched_lock,
so the race itself is harmless.
 1.83.4.1  22-May-2005  snj Pull up revision 1.85 (requested by nathanw in ticket #322):
uvm_coredump_walkmap(): Set UVM_COREDUMP_NODUMP on regions whose
protection does not include VM_PROT_READ, so that the core dumping
doesn't error out with EFAULT when trying to write that region.
Addresses PR kern/30143; approach suggested by chs@.
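
A simplified sketch of the check described above. struct region, mark_nodump() and the flag values are hypothetical stand-ins chosen for the example, not the kernel's structures.

```c
#include <assert.h>
#include <stddef.h>

#define VM_PROT_READ		0x01	/* illustrative values only */
#define UVM_COREDUMP_NODUMP	0x02

/* Hypothetical, cut-down description of one region of the map. */
struct region {
	int prot;	/* protection bits */
	int flags;	/* coredump flags */
};

/* Flag every unreadable region so the dumper skips it; returns the count. */
static size_t
mark_nodump(struct region *r, size_t n)
{
	size_t i, skipped = 0;

	for (i = 0; i < n; i++) {
		if ((r[i].prot & VM_PROT_READ) == 0) {
			r[i].flags |= UVM_COREDUMP_NODUMP;
			skipped++;
		}
	}
	return skipped;
}
```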
 1.89.4.1  26-Oct-2005  yamt sync with head
 1.89.2.10  17-Mar-2008  yamt sync with head.
 1.89.2.9  11-Feb-2008  yamt sync with head.
 1.89.2.8  04-Feb-2008  yamt sync with head.
 1.89.2.7  21-Jan-2008  yamt sync with head
 1.89.2.6  15-Nov-2007  yamt sync with head.
 1.89.2.5  27-Oct-2007  yamt sync with head.
 1.89.2.4  03-Sep-2007  yamt sync with head.
 1.89.2.3  26-Feb-2007  yamt sync with head.
 1.89.2.2  30-Dec-2006  yamt sync with head.
 1.89.2.1  21-Jun-2006  yamt sync with head.
 1.92.10.1  19-Apr-2006  elad oops - *really* sync to head this time.
 1.92.8.4  03-Sep-2006  yamt sync with head.
 1.92.8.3  26-Jun-2006  yamt sync with head.
 1.92.8.2  24-May-2006  yamt sync with head.
 1.92.8.1  01-Apr-2006  yamt sync with head.
 1.92.6.2  01-Jun-2006  kardel Sync with head.
 1.92.6.1  22-Apr-2006  simonb Sync with head.
 1.92.4.1  09-Sep-2006  rpaulo sync with head
 1.93.4.1  19-Jun-2006  chap Sync with head.
 1.93.2.1  24-May-2006  tron Merge 2006-05-24 NetBSD-current into the "peter-altq" branch.
 1.96.4.1  22-Oct-2006  yamt sync with head
 1.96.2.6  29-Dec-2006  ad Checkpoint work in progress.
 1.96.2.5  18-Nov-2006  ad Sync with head.
 1.96.2.4  17-Nov-2006  ad Checkpoint work in progress.
 1.96.2.3  24-Oct-2006  ad - Redo LWP locking slightly and fix some races.
- Fix some locking botches.
- Make signal mask / stack per-proc for SA processes.
- Add _lwp_kill().
 1.96.2.2  21-Oct-2006  ad Checkpoint work in progress on locking and per-LWP signals. Very much
a work in progress and there is still a lot to do.
 1.96.2.1  11-Sep-2006  ad - Allocate and free turnstiles where needed.
- Split proclist_mutex and alllwp_mutex out of the proclist_lock,
and use in interrupt context.
- Fix an MP race in enterpgrp()/setsid().
- Acquire proclist_lock and p_crmutex in some obvious places.
 1.99.2.6  19-Apr-2007  ad Don't swap out threads blocked on a turnstile, to avoid deadlock.
 1.99.2.5  15-Apr-2007  yamt sync with head.
 1.99.2.4  17-Mar-2007  rmind Do not do an implicit enqueue in sched_switch(), move enqueueing back to
the dispatcher. Rename sched_switch() back to sched_nextlwp(). Add a new
argument to sched_enqueue() that indicates a call from mi_switch().

Requested by yamt@
 1.99.2.3  12-Mar-2007  rmind Sync with HEAD.
 1.99.2.2  27-Feb-2007  yamt - sync with head.
- move sched_changepri back to kern_synch.c as it doesn't know PPQ anymore.
 1.99.2.1  20-Feb-2007  rmind General Common Scheduler Framework (CSF) patch import. Huge thanks to
Daniel Sieger <dsieger at TechFak.Uni-Bielefeld de> for this work.

Short abstract: Split the dispatcher from the scheduler in order to
make the scheduler more modular. Introduce initial API for other
schedulers' implementations.

Discussed in tech-kern@
OK: yamt@, ad@

Note: further work will go soon.
 1.104.6.1  29-Mar-2007  reinoud Pullup to -current
 1.104.4.1  11-Jul-2007  mjf Sync with head.
 1.104.2.13  05-Nov-2007  ad uvm_scheduler: set curlwp->l_class = SCHED_FIFO so that the swapper does
not get its priority adjusted by the scheduler. This is a special case
since init inherits via fork() and we can only adjust the swapper after.
 1.104.2.12  01-Nov-2007  ad - Fix interactivity problems under high load. Because soft interrupts
are being stacked on top of regular LWPs, more often than not aston()
was being called on a soft interrupt thread instead of a user thread,
meaning that preemption was not happening on EOI.

- Don't use bool in a couple of data structures. Sub-word writes are not
always atomic and may clobber other fields in the containing word.

- For SCHED_4BSD, make p_estcpu per thread (l_estcpu). Rework how the
dynamic priority level is calculated - it's much better behaved now.

- Kill the l_usrpri/l_priority split now that priorities are no longer
directly assigned by tsleep(). There are three fields describing LWP
priority:

l_priority: Dynamic priority calculated by the scheduler.
This does not change for kernel/realtime threads,
and always stays within the correct band. Eg for
timeshared LWPs it never moves out of the user
priority range. This is basically what l_usrpri
was before.

l_inheritedprio: Lent to the LWP due to priority inheritance
(turnstiles).

l_kpriority: A boolean value set true the first time an LWP
sleeps within the kernel. This indicates that the LWP
should get a priority boost as compensation for blocking.
lwp_eprio() now does the equivalent of sched_kpri() if
the flag is set. The flag is cleared in userret().

- Keep track of scheduling class (OTHER, FIFO, RR) in struct lwp, and use
this to make decisions in a few places where we previously tested for a
kernel thread.

- Partially fix itimers and usr/sys/intr time accounting in the presence
of software interrupts.

- Use kthread_create() to create idle LWPs. Move priority definitions
from the various modules into sys/param.h.

- newlwp -> lwp_create
 1.104.2.11  27-Oct-2007  yamt fix priorities for some kernel threads. advised and ok'ed by Andrew Doran.
 1.104.2.10  18-Oct-2007  ad Free uareas back to the uarea cache on the CPU where they were last used.
 1.104.2.9  09-Oct-2007  ad Sync with head.
 1.104.2.8  20-Aug-2007  ad Sync with HEAD.
 1.104.2.7  08-Jun-2007  ad Sync with head.
 1.104.2.6  10-Apr-2007  ad Don't swap out threads blocked on a turnstile, to avoid deadlock.
It doesn't make a lot of sense, anyhow.
 1.104.2.5  10-Apr-2007  ad Sync with head.
 1.104.2.4  09-Apr-2007  ad - Add two new arguments to kthread_create1: pri_t pri, bool mpsafe.
- Fork kthreads off proc0 as new LWPs, not new processes.
 1.104.2.3  05-Apr-2007  ad - Put a per-LWP lock around swapin / swapout.
- Replace use of lockmgr().
- Minor locking fixes and assertions.
- uvm_map.h no longer pulls in proc.h, etc.
- Use kpause where appropriate.
 1.104.2.2  21-Mar-2007  ad GC the simplelock/spinlock debugging stuff.
 1.104.2.1  13-Mar-2007  ad Pull in the initial set of changes for the vmlocking branch.
 1.108.6.3  06-Nov-2007  joerg Sync with HEAD.
 1.108.6.2  02-Oct-2007  joerg Sync with HEAD.
 1.108.6.1  03-Sep-2007  jmcneill Sync with HEAD.
 1.108.2.1  03-Sep-2007  skrll Sync with HEAD.
 1.111.2.3  23-Mar-2008  matt sync with HEAD
 1.111.2.2  09-Jan-2008  matt sync with HEAD
 1.111.2.1  06-Nov-2007  matt sync with HEAD
 1.112.6.2  18-Feb-2008  mjf Sync with HEAD.
 1.112.6.1  19-Nov-2007  mjf Sync with HEAD.
 1.112.4.1  13-Nov-2007  bouyer Sync with HEAD
 1.113.6.1  02-Jan-2008  bouyer Sync with HEAD
 1.113.2.2  15-Dec-2007  ad uvm_lwp_hold, uvm_lwp_rele: use atomic ops to avoid lock order problems.
 1.113.2.1  04-Dec-2007  ad Pull the vmlocking changes into a new branch.
 1.117.6.5  17-Jan-2009  mjf Sync with HEAD.
 1.117.6.4  29-Jun-2008  mjf Sync with HEAD.
 1.117.6.3  05-Jun-2008  mjf Sync with HEAD.

Also fix build.
 1.117.6.2  02-Jun-2008  mjf Sync with HEAD.
 1.117.6.1  03-Apr-2008  mjf Sync with HEAD.
 1.117.2.1  24-Mar-2008  keiichi sync with head.
 1.124.2.3  17-Jun-2008  yamt sync with head.
 1.124.2.2  04-Jun-2008  yamt sync with head
 1.124.2.1  18-May-2008  yamt sync with head.
 1.126.4.2  18-Sep-2008  wrstuden Sync with wrstuden-revivesa-base-2.
 1.126.4.1  23-Jun-2008  wrstuden Sync w/ -current. 34 merge conflicts to follow.
 1.126.2.5  11-Aug-2010  yamt sync with head.
 1.126.2.4  11-Mar-2010  yamt sync with head
 1.126.2.3  19-Aug-2009  yamt sync with head.
 1.126.2.2  18-Jul-2009  yamt sync with head.
 1.126.2.1  04-May-2009  yamt sync with head.
 1.131.2.2  27-Jun-2008  simonb Sync with head.
 1.131.2.1  18-Jun-2008  simonb Sync with head.
 1.133.6.1  01-Apr-2009  snj Pull up following revision(s) (requested by mrg in ticket #622):
bin/csh/csh.1: revision 1.46
bin/csh/func.c: revision 1.37
bin/ps/print.c: revision 1.111
bin/ps/ps.c: revision 1.74
bin/sh/miscbltin.c: revision 1.38
bin/sh/sh.1: revision 1.92 via patch
external/bsd/top/dist/machine/m_netbsd.c: revision 1.7
lib/libkvm/kvm_proc.c: revision 1.82
sys/arch/mips/mips/cpu_exec.c: revision 1.55
sys/compat/darwin/darwin_exec.c: revision 1.57
sys/compat/ibcs2/ibcs2_exec.c: revision 1.73
sys/compat/irix/irix_resource.c: revision 1.15
sys/compat/linux/arch/amd64/linux_exec_machdep.c: revision 1.16
sys/compat/linux/arch/i386/linux_exec_machdep.c: revision 1.12
sys/compat/linux/common/linux_limit.h: revision 1.5
sys/compat/osf1/osf1_resource.c: revision 1.14
sys/compat/svr4/svr4_resource.c: revision 1.18
sys/compat/svr4_32/svr4_32_resource.c: revision 1.17
sys/kern/exec_subr.c: revision 1.62
sys/kern/init_sysctl.c: revision 1.160
sys/kern/kern_exec.c: revision 1.288
sys/kern/kern_resource.c: revision 1.151
sys/sys/param.h: patch
sys/sys/resource.h: revision 1.31
sys/sys/sysctl.h: revision 1.184
sys/uvm/uvm_extern.h: revision 1.153
sys/uvm/uvm_glue.c: revision 1.136
sys/uvm/uvm_mmap.c: revision 1.128
usr.bin/systat/ps.c: revision 1.32
- add new RLIMIT_AS (aka RLIMIT_VMEM) resource that limits the total
address space available to processes. this limit exists in most other
modern unix variants, and like most of them, our defaults are unlimited.
remove the old mmap / rlimit.datasize hack.
- adds the VMCMD_STACK flag to all the stack-creation vmcmd callers.
it is currently unused, but was added a few years ago.
- add a pair of new process size values to kinfo_proc2{}. one is the
total size of the process memory map, and the other is the total size
adjusted for unused stack space (since most processes have a lot of
this...)
- patch sh, and csh to notice RLIMIT_AS. (in some cases, the alias
RLIMIT_VMEM was already present and used if available.)
- patch ps, top and systat to notice the new k_vm_vsize member of
kinfo_proc2{}.
- update irix, svr4, svr4_32, linux and osf1 emulations to support
this information. (freebsd could be done, but that it's best left
as part of the full-update of compat/freebsd.)
this addresses PR 7897. it also gives correct memory usage values,
which have never been entirely correct (since mmap), and have been
very incorrect since jemalloc() was enabled.
tested on i386 and sparc64, build tested on several other platforms.
thanks to many folks for feedback and testing but most especially
chuq and yamt for critical suggestions that led to this patch not
having a special ugliness i wasn't happy with anyway :-)
 1.133.4.3  28-Apr-2009  skrll Sync with HEAD.
 1.133.4.2  03-Mar-2009  skrll Sync with HEAD.
 1.133.4.1  19-Jan-2009  skrll Sync with HEAD.
 1.133.2.1  13-Dec-2008  haad Update haad-dm branch to haad-dm-base2.
 1.135.2.2  23-Jul-2009  jym Sync with HEAD.
 1.135.2.1  13-May-2009  jym Sync with HEAD.

Commit is split, to avoid a "too many arguments" protocol error.
 1.143.2.1  30-Apr-2010  uebayasi Sync with HEAD.
 1.144.2.4  05-Mar-2011  rmind sync with head
 1.144.2.3  30-May-2010  rmind sync with head
 1.144.2.2  25-Apr-2010  rmind - Invent mm_md_getva() and mm_md_relva() routines, provided by MD and
indicated with __HAVE_MM_MD_PREFER_VA. It will be used to deal with
cache aliasing issues and thus fix little MIPS, ARM and friends.

- Convert dev_mem_readwrite() to use unmanaged mappings. Fix a missed
offset addition in a case of direct map. Sprinkle various comments in
the memory device driver.

- Add missing direct map handling on hp700 and vax. Make checks across
m68k ports more consistent, reduce the diffs. Fix kernacc check miss
on news68k. Minor off-by-one fix for alpha. Add MEMC_PHYS_BASE for
mmap() case check on acorn26. Misc clean-up.
 1.144.2.1  18-Mar-2010  rmind Unify /dev/{mem,kmem,zero,null} implementations in MI code. Based on patch
from Joerg Sonnenberger, proposed on tech-kern@, in February 2008.

Work and depression still in progress.
 1.146.4.2  05-Mar-2011  bouyer Sync with HEAD
 1.146.4.1  08-Feb-2011  bouyer Sync with HEAD
 1.146.2.1  06-Jun-2011  jruoho Sync with HEAD.
 1.149.2.1  23-Jun-2011  cherry Catchup with rmind-uvmplock merge.
 1.151.2.2  30-Oct-2012  yamt sync with head
 1.151.2.1  17-Apr-2012  yamt sync with head
 1.152.2.3  29-Apr-2012  mrg sync to latest -current.
 1.152.2.2  24-Feb-2012  mrg sync to -current.
 1.152.2.1  18-Feb-2012  mrg merge to -current.
 1.156.2.3  12-Apr-2012  riz branches: 1.156.2.3.2;
Pull up following revision(s) (requested by martin in ticket #175):
sys/kern/kern_exit.c: revision 1.238
tests/lib/libc/gen/posix_spawn/t_fileactions.c: revision 1.4
tests/lib/libc/gen/posix_spawn/t_fileactions.c: revision 1.5
sys/uvm/uvm_extern.h: revision 1.183
lib/libc/gen/posix_spawn_fileactions.c: revision 1.2
sys/kern/kern_exec.c: revision 1.348
sys/kern/kern_exec.c: revision 1.349
sys/compat/netbsd32/syscalls.master: revision 1.95
sys/uvm/uvm_glue.c: revision 1.159
sys/uvm/uvm_map.c: revision 1.317
sys/compat/netbsd32/netbsd32.h: revision 1.95
sys/kern/exec_elf.c: revision 1.38
sys/sys/spawn.h: revision 1.2
sys/sys/exec.h: revision 1.135
sys/compat/netbsd32/netbsd32_execve.c: revision 1.34
Rework posix_spawn locking and memory management:
- always provide a vmspace for the new proc, initially borrowing from proc0
(this part fixes PR 46286)
- increase parallelism between parent and child if arguments allow this,
avoiding a potential deadlock on exec_lock
- add a new flag for userland to request old (lockstepped) behaviour for
better error reporting
- adapt test cases to the previous two and add a new variant to test the
diagnostics flag
- fix a few memory (and lock) leaks
- provide netbsd32 compat
Fix asynchronous posix_spawn child exit status (and test for it).
 1.156.2.2  09-Apr-2012  riz Pull up following revision(s) (requested by chs in ticket #167):
sys/uvm/uvm_glue.c: revision 1.158
fix uarea_system_poolpage_free() to handle freeing a uarea
that was not allocated by cpu_uarea_alloc() (i.e. on platforms
where cpu_uarea_alloc() failing is not fatal).
fixes PR 46284.
 1.156.2.1  20-Feb-2012  sborrill Pull up the following revision(s) (requested by martin in ticket #14):
include/spawn.h: revision 1.2
sys/kern/kern_exec.c: revision 1.341
sys/uvm/uvm_glue.c: revision 1.157
tests/lib/libc/gen/posix_spawn/t_fileactions.c: revision 1.3

posix_spawn: fix kernel bug when passing empty fileactions (PR kern/46038)
and add a test case for this. Fix potential race condition, double-freeing
of memory and memory leaks in error cases.
 1.156.2.3.2.1  28-Nov-2012  matt Pull from HEAD:
Add a __HAVE_CPU_UAREA_IDLELWP hook so that the MD code can allocate
special UAREAs for idle lwp's.
 1.160.2.1  03-Dec-2017  jdolecek update from HEAD
 1.161.2.1  29-May-2016  skrll Sync with HEAD
 1.163.18.3  13-Apr-2020  martin Mostly merge changes from HEAD up to 20200411
 1.163.18.2  08-Apr-2020  martin Merge changes from current as of 20200406
 1.163.18.1  10-Jun-2019  christos Sync with HEAD
 1.163.16.3  26-Dec-2018  pgoyette Sync with HEAD, resolve a few conflicts
 1.163.16.2  26-Nov-2018  pgoyette Sync with HEAD, resolve a couple of conflicts
 1.163.16.1  06-Sep-2018  pgoyette Sync with HEAD

Resolve a couple of conflicts (result of the uimin/uimax changes)
 1.175.2.1  17-Jan-2020  ad Sync with head.
 1.177.2.1  25-Apr-2020  bouyer Sync with bouyer-xenpvh-base2 (HEAD)
