History log of /src/sys/kern/kern_resource.c
Revision  Date  Author  Comments
 1.195  04-Oct-2023  ad Eliminate l->l_ncsw and l->l_nivcsw. From memory, I think they were added
before we had per-LWP struct rusage; the same is now tracked there.
 1.194  23-Sep-2023  ad Reapply this change with a couple of bugs fixed:

- Do away with separate pool_cache for some kernel objects that have no special
requirements and use the general purpose allocator instead. On one of my
test systems this makes for a small (~1%) but repeatable reduction in system
time during builds presumably because it decreases the kernel's cache /
memory bandwidth footprint a little.
- vfs_lockf: cache a pointer to the uidinfo and put mutex in the data segment.
 1.193  12-Sep-2023  ad Back out recent change to replace pool_cache with the general allocator.
Will return to this when I have time again.
 1.192  10-Sep-2023  ad - Do away with separate pool_cache for some kernel objects that have no special
requirements and use the general purpose allocator instead. On one of my
test systems this makes for a small (~1%) but repeatable reduction in system
time during builds presumably because it decreases the kernel's cache /
memory bandwidth footprint a little.
- vfs_lockf: cache a pointer to the uidinfo and put mutex in the data segment.
 1.191  08-Jul-2023  riastradh clock_gettime(2): Fix CLOCK_PROCESS/THREAD_CPUTIME_ID.

Use same calculation as getrusage, not some ad-hoc arithmetic of
internal scheduler parameters that are periodically rewound.

PR kern/57512

XXX pullup-8
XXX pullup-9
XXX pullup-10
 1.190  08-Jul-2023  riastradh kern_resource.c: Fix brace placement.

No functional change intended.
 1.189  09-Apr-2022  riastradh sys: Use membar_release/acquire around reference drop.

This just goes through my recent reference count membar audit and
changes membar_exit to membar_release and membar_enter to
membar_acquire -- this should make everything cheaper on most CPUs
without hurting correctness, because membar_acquire is generally
cheaper than membar_enter.
 1.188  12-Mar-2022  riastradh sys: Membar audit around reference count releases.

If two threads are using an object that is freed when the reference
count goes to zero, we need to ensure that all memory operations
related to the object happen before freeing the object.

Using an atomic_dec_uint_nv(&refcnt) == 0 ensures that only one
thread takes responsibility for freeing, but it's not enough to
ensure that the other thread's memory operations happen before the
freeing.

Consider:

        Thread A                        Thread B
        obj->foo = 42;                  obj->baz = 73;
        mumble(&obj->bar);              grumble(&obj->quux);
        /* membar_exit(); */            /* membar_exit(); */
        atomic_dec -- not last          atomic_dec -- last
                                        /* membar_enter(); */
                                        KASSERT(invariant(obj->foo,
                                            obj->bar));
                                        free_stuff(obj);

The memory barriers ensure that

obj->foo = 42;
mumble(&obj->bar);

in thread A happens before

KASSERT(invariant(obj->foo, obj->bar));
free_stuff(obj);

in thread B. Without them, this ordering is not guaranteed.

So in general it is necessary to do

        membar_exit();
        if (atomic_dec_uint_nv(&obj->refcnt) != 0)
                return;
        membar_enter();

to release a reference, for the `last one out hit the lights' style
of reference counting. (This is in contrast to the style where one
thread blocks new references and then waits under a lock for existing
ones to drain with a condvar -- no membar needed thanks to mutex(9).)

I searched for atomic_dec to find all these. Obviously we ought to
have a better abstraction for this because there's so much copypasta.
This is a stop-gap measure to fix actual bugs until we have that. It
would be nice if an abstraction could gracefully handle the different
styles of reference counting in use -- some years ago I drafted an
API for this, but making it cover everything got a little out of hand
(particularly with struct vnode::v_usecount) and I ended up setting
it aside to work on psref/localcount instead for better scalability.

I got bored of adding #ifdef __HAVE_ATOMIC_AS_MEMBAR everywhere, so I
only put it on things that look performance-critical on 5sec review.
We should really adopt membar_enter_preatomic/membar_exit_postatomic
or something (except they are applicable only to atomic r/m/w, not to
atomic_load/store_*, making the naming annoying) and get rid of all
the ifdefs.
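The "last one out hit the lights" release pattern from revisions 1.188 and 1.189 can be sketched in portable C11. This is a userland analogue, not the kernel code: `struct obj` and the `obj_*` functions are hypothetical, and C11 release/acquire ordering stands in for the kernel's `membar_release()`/`membar_acquire()`. Note that `atomic_fetch_sub_explicit()` returns the old value, whereas `atomic_dec_uint_nv()` returns the new one.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stdlib.h>

/* Hypothetical object; only refcnt matters for the pattern. */
struct obj {
	atomic_uint refcnt;
	int foo;
};

static struct obj *
obj_create(void)
{
	struct obj *o = malloc(sizeof(*o));

	if (o == NULL)
		return NULL;
	atomic_init(&o->refcnt, 1);
	o->foo = 0;
	return o;
}

static void
obj_hold(struct obj *o)
{
	/* Taking a reference needs no ordering by itself. */
	atomic_fetch_add_explicit(&o->refcnt, 1, memory_order_relaxed);
}

/* Returns true if this call dropped the last reference and freed o. */
static bool
obj_rele(struct obj *o)
{
	/* Release: our earlier stores to *o happen before the decrement. */
	unsigned int old = atomic_fetch_sub_explicit(&o->refcnt, 1,
	    memory_order_release);

	assert(old > 0);
	if (old != 1)
		return false;
	/* Acquire: observe every other thread's stores before freeing. */
	atomic_thread_fence(memory_order_acquire);
	free(o);
	return true;
}
```

Only the thread that sees the count drop from 1 pays for the acquire fence; every other release is a single release-ordered decrement.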
 1.187  23-May-2020  ad Move proc_lock into the data segment. It was dynamically allocated because
at the time we had mutex_obj_alloc() but not __cacheline_aligned.
 1.186  21-Feb-2020  joerg Explicitly cast pointers to uintptr_t before casting to enums. They are
not necessarily the same size. Don't cast pointers to bool, check for
NULL instead.
 1.185  15-Feb-2020  ad - Move the LW_RUNNING flag back into l_pflag: updating l_flag without lock
in softint_dispatch() is risky. May help with the "softint screwup"
panic.

- Correct the memory barriers around zombies switching into oblivion.
 1.184  08-Jan-2020  ad Hopefully fix some problems seen with MP support on non-x86, in particular
where curcpu() is defined as curlwp->l_cpu:

- mi_switch(): undo the ~2007ish optimisation to unlock curlwp before
calling cpu_switchto(). It's not safe to let other actors mess with the
LWP (in particular l->l_cpu) while it's still context switching. This
removes l->l_ctxswtch.

- Move the LP_RUNNING flag into l->l_flag and rename to LW_RUNNING since
it's now covered by the LWP's lock.

- Ditch lwp_exit_switchaway() and just call mi_switch() instead. Everything
is in cache anyway so it wasn't buying much by trying to avoid saving old
state. This means cpu_switchto() will never be called with prevlwp ==
NULL.

- Remove some KERNEL_LOCK handling which hasn't been needed for years.
 1.183  21-Nov-2019  ad branches: 1.183.2;
calcru: ignore running softints, unless softint_timing is on.
Fixes crazy times reported for proc0.
 1.182  05-Apr-2019  mlelstv branches: 1.182.4;
avoid underflow in user/system time.
 1.181  13-May-2018  christos branches: 1.181.2;
correct the function name.
 1.180  09-May-2018  kre Cause a process's user and system times to become non-decreasing.

This alters the invented values (ie: statistically calculated)
that are returned - for small values, the values are likely going to
be different than they were, but that's largely nonsense anyway
(except that the sum of utime & stime does equal cpu time consumed
by the process). Once the values get large enough to be meaningful
the difference made by this change will be in the noise, and irrelevant.

This needs a couple of additions to struct proc, so we are now into 8.99.17
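One way to keep both reported values non-decreasing while preserving their sum is sketched below. This illustrates the idea only; it is not the code from this revision. `struct cpu_split` and `split_monotonic()` are hypothetical names, and the sketch assumes the total CPU time itself never decreases.

```c
#include <assert.h>
#include <stdint.h>

struct cpu_split {
	uint64_t utime;	/* microseconds reported as user time */
	uint64_t stime;	/* microseconds reported as system time */
};

/*
 * Clamp a fresh estimate `est` so that neither field drops below the
 * previously reported `prev`, keeping utime + stime unchanged.
 */
static struct cpu_split
split_monotonic(struct cpu_split prev, struct cpu_split est)
{
	uint64_t tot = est.utime + est.stime;

	/* Total CPU time never shrinks, so there is room to clamp. */
	assert(tot >= prev.utime + prev.stime);
	if (est.utime < prev.utime) {
		est.utime = prev.utime;
		est.stime = tot - est.utime;
	} else if (est.stime < prev.stime) {
		est.stime = prev.stime;
		est.utime = tot - est.stime;
	}
	return est;
}
```

The previous values have to be stored per process, which is consistent with the note above that struct proc gained new members.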
 1.179  08-May-2018  christos get the maxrss from the vmspace field, and handle platforms that don't
have pmap statistics here.
 1.178  07-May-2018  christos Load the struct rusage text, data, and stack fields from the vmspace struct.
Before they were all 0. We update them when we call getrusage() or on
process exit() so that the children rusage is accounted for.
 1.177  08-Apr-2018  mlelstv limits are bytes, vm sizes are clicks.
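The unit mismatch this revision refers to can be illustrated with a short sketch. The 4 KiB page size and both function names are assumptions made for the example; the kernel's own conversion macros are not reproduced here.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define SKETCH_PAGE_SHIFT 12	/* assumption: 4 KiB pages */

/* Convert a size in clicks (pages) to bytes. */
static uint64_t
clicks_to_bytes(uint64_t clicks)
{
	return clicks << SKETCH_PAGE_SHIFT;
}

/* Compare a VM size kept in clicks against a limit kept in bytes. */
static bool
over_limit(uint64_t vm_clicks, uint64_t limit_bytes)
{
	return clicks_to_bytes(vm_clicks) > limit_bytes;
}
```

Comparing the raw click count against a byte limit would understate usage by a factor of the page size.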
 1.176  24-Mar-2017  pgoyette branches: 1.176.12;
Add new sysctl variable proc.curproc.paxflags so a process can determine
which flags were set for it. Define some values for the variable:

CTL_PROC_PAXFLAGS_{ASLR,MPROTECT,GUARD}
 1.175  13-Jul-2016  njoly branches: 1.175.2; 1.175.4;
In dosetrlimit() round stack hard limit just like soft one.
Avoid cases where hard limit becomes smaller than soft limit.
 1.174  18-Oct-2014  snj branches: 1.174.2;
src is too big these days to tolerate superfluous apostrophes. It's
"its", people!
 1.173  25-Feb-2014  pooka Ensure that the top level sysctl nodes (kern, vfs, net, ...) exist before
the sysctl link sets are processed, and remove redundancy.

Shaves >13kB off of an amd64 GENERIC, not to mention >1k duplicate
lines of code.
 1.172  07-Jan-2013  chs branches: 1.172.2;
fix setrlimit(RLIMIT_STACK) for __MACHINE_STACK_GROWS_UP platforms.
 1.171  21-Dec-2012  njoly One semicolon is enough.
 1.170  03-Nov-2012  njoly Move rusage computation to a new getrusage1() function. Adjust all
compat/emulations to make use of it.
 1.169  09-Jun-2012  christos branches: 1.169.2;
Add a new resource to limit the number of lwps per user, RLIMIT_NTHR. There
is a global sysctl kern.maxlwp to control this, which is by default 2048.
The first lwp of each process or kernel threads are not counted against the
limit. To show the current resource usage per user, I added a new sysctl
that dumps the uidinfo structure fields.
 1.168  02-Dec-2011  yamt assertion
 1.167  03-Jun-2011  rmind branches: 1.167.2;
Revert maxdmap/maxsmap constification, as it causes problems on some
sparc models. Reported by tsutsui@.
 1.166  31-May-2011  rmind branches: 1.166.2;
sysctl_proc_corename: perform KAUTH_PROCESS_CORENAME check (for set case)
after the new name is copied into cnbuf. Spotted by enami@.
 1.165  24-May-2011  mrg fix proc.pid.corename:
- "oldp is not NULL" means the get case
- "newp is not NULL" means the set case
which may both happen at the same time.
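The get/set rule described in revision 1.165 can be sketched as a plain C function. This is a simplified userland model, not the kernel handler: there is no copyin/copyout or authorization check, and `corename_sysctl()`, `CORENAME_BUFLEN` and the initial string are placeholders chosen for the example.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <string.h>

#define CORENAME_BUFLEN 64	/* hypothetical buffer size */

static char corename[CORENAME_BUFLEN] = "%n.core";	/* placeholder value */

static int
corename_sysctl(void *oldp, size_t *oldlenp, const void *newp, size_t newlen)
{
	/* Validate the set request first so a failed call changes nothing. */
	if (newp != NULL && (newlen == 0 || newlen > CORENAME_BUFLEN ||
	    ((const char *)newp)[newlen - 1] != '\0'))
		return EINVAL;

	/* Get case: oldp is not NULL. Report the value before any update. */
	if (oldp != NULL) {
		size_t need = strlen(corename) + 1;

		assert(oldlenp != NULL);
		if (*oldlenp < need)
			return ENOMEM;
		memcpy(oldp, corename, need);
		*oldlenp = need;
	}

	/* Set case: newp is not NULL. Both cases may occur in one call. */
	if (newp != NULL)
		memcpy(corename, newp, newlen);
	return 0;
}
```

Testing the two pointers independently, rather than with an if/else, is what lets a single call return the old name and install a new one.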
 1.164  14-May-2011  rmind - Sprinkle __read_mostly, constify maxdmap and maxsmap.
- Prevent sys/resourcevar.h from inclusion in userland.
- sys_{set,get}priority: use id_t for 'who', not int.
- Make donice() routine static.
- Remove trailing spaces, KNF.
 1.163  14-May-2011  rmind Improve/fix comments, give more meaningful names for variables.
 1.162  01-May-2011  christos if donice fails, don't keep going with the next process.
 1.161  01-May-2011  rmind - Remove FORK_SHARELIMIT and PL_SHAREMOD, simplify lim_privatise().
- Use kmem(9) for struct plimit::pl_corename.
 1.160  01-May-2011  rmind Merge duplicate code fragments into a new lim_setcorename() routine.
 1.159  01-May-2011  rmind Rename limfree() to lim_free(), misc clean up. No functional change.
 1.158  30-Apr-2011  rmind sysctl_proc_corename: improve comments, clean up, move a check for
KAUTH_REQ_PROCESS_CORENAME_SET earlier, do not bother to strcmp().
 1.157  01-Jul-2010  rmind branches: 1.157.2;
Remove pfind() and pgfind(), fix locking in various broken uses of these.
Rename real routines to proc_find() and pgrp_find(), remove PFIND_* flags
and have consistent behaviour. Provide proc_find_raw() for special cases.
Fix memory leak in sysctl_proc_corename().

COMPAT_LINUX: rework ptrace() locking, minimise differences between
different versions per-arch.

Note: while this change adds some formal cosmetics for COMPAT_DARWIN and
COMPAT_IRIX - locking there is utterly broken (for ages).

Fixes PR/43176.
 1.156  26-May-2010  pooka Feed dust to a few linkset uses and explicitly call the constructor.
 1.155  03-Mar-2010  yamt branches: 1.155.2;
remove redundant checks of PK_MARKER.
 1.154  02-Oct-2009  elad branches: 1.154.2;
Stick nice policy in its own subsystem and call the listener "resource"
rather than "rlimit"...
 1.153  02-Oct-2009  elad Move rlimit policy back to the subsystem.

For this we needed proc_uidmatch() exposed, which makes a lot of sense,
so put it back in sys_process.c for use in other places as well.
 1.152  26-May-2009  elad PR/41489: Stathis Kamperis: setpriority(2) returns EACCES instead of EPERM

Per discussion on the PR's audit trail, put back original checks for now.
 1.151  29-Mar-2009  mrg - add new RLIMIT_AS (aka RLIMIT_VMEM) resource that limits the total
address space available to processes. this limit exists in most other
modern unix variants, and like most of them, our defaults are unlimited.
remove the old mmap / rlimit.datasize hack.

- adds the VMCMD_STACK flag to all the stack-creation vmcmd callers.
it is currently unused, but was added a few years ago.

- add a pair of new process size values to kinfo_proc2{}. one is the
total size of the process memory map, and the other is the total size
adjusted for unused stack space (since most processes have a lot of
this...)

- patch sh, and csh to notice RLIMIT_AS. (in some cases, the alias
RLIMIT_VMEM was already present and used if available.)

- patch ps, top and systat to notice the new k_vm_vsize member of
kinfo_proc2{}.

- update irix, svr4, svr4_32, linux and osf1 emulations to support
this information. (freebsd could be done, but that it's best left
as part of the full-update of compat/freebsd.)


this addresses PR 7897. it also gives correct memory usage values,
which have never been entirely correct (since mmap), and have been
very incorrect since jemalloc() was enabled.

tested on i386 and sparc64, build tested on several other platforms.

thanks to many folks for feedback and testing but most especially
chuq and yamt for critical suggestions that led to this patch not
having a special ugliness i wasn't happy with anyway :-)
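From userland, the address-space limit added in revision 1.151 is read like any other resource limit. The sketch below assumes a system whose `<sys/resource.h>` defines `RLIMIT_AS`; `print_as_limit()` is a hypothetical helper.

```c
#include <assert.h>
#include <stdint.h>
#include <stdio.h>
#include <sys/resource.h>

/* Print the address-space soft limit; returns 0 on success, -1 on error. */
static int
print_as_limit(void)
{
	struct rlimit rl;

	if (getrlimit(RLIMIT_AS, &rl) == -1)
		return -1;
	if (rl.rlim_cur == RLIM_INFINITY)
		printf("RLIMIT_AS soft limit: unlimited\n");
	else
		printf("RLIMIT_AS soft limit: %ju bytes\n",
		    (uintmax_t)rl.rlim_cur);
	return 0;
}
```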
 1.150  09-Feb-2009  rmind dosetrlimit: remove the checks which are no longer needed since rlim_t
is unsigned again. Hi <christos>!
 1.149  29-Jan-2009  drochner branches: 1.149.2;
put back a range check in setrlimit() for now
(thanks to Andrew Doran for remembering)
rlim_t _should_ be unsigned, but this needs more work
 1.148  11-Jan-2009  christos merge christos-time_t
 1.147  11-Oct-2008  pooka branches: 1.147.2; 1.147.4;
Move uidinfo to its own module in kern_uidinfo.c and include in rump.
No functional change to uidinfo.
 1.146  11-Oct-2008  pooka Put ui_lock back and use it to modify the socket buffer size.
Typecasting quad_t * to long * and using atomic_add_long can't
possibly be expected to work!

Another fine error caught by the gcc type-punning warning. That
really really should be on by default in the kernel.
 1.145  30-Sep-2008  njoly Small fix to make setpriority(2) with PRIO_PROCESS return ESRCH when
no valid process can be found.
 1.144  29-Sep-2008  njoly Make setpriority(2) return EINVAL for incorrect which values.
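The error returns described in revisions 1.145 and 1.144 can be probed with a small wrapper. This is a sketch: `try_setpriority()` is a hypothetical helper, and the `which` value and process ID a caller passes to trigger each error are arbitrary choices, not values taken from the source.

```c
#include <assert.h>
#include <errno.h>
#include <sys/resource.h>
#include <sys/types.h>

/* Returns 0 on success, otherwise the errno value setpriority(2) set. */
static int
try_setpriority(int which, id_t who, int prio)
{
	errno = 0;
	if (setpriority(which, who, prio) == -1)
		return errno;
	return 0;
}
```

Passing a `which` value that is none of `PRIO_PROCESS`, `PRIO_PGRP` or `PRIO_USER` should yield `EINVAL`, and `PRIO_PROCESS` with an ID that matches no process should yield `ESRCH`.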
 1.143  23-Jun-2008  rmind branches: 1.143.2;
sysctl_proc_stop: fix a lock-leak when kauth returns an error.
From <kefren>.
 1.142  31-May-2008  ad branches: 1.142.2;
PR kern/38812 race between lwp_exit_switchaway and exit1/coredump

Move the LWP RUNNING and TIMEINTR flags into the thread-private flag word.
 1.141  05-May-2008  ad branches: 1.141.2;
- Convert hashinit() to use kmem_alloc(). The hash tables can be large
and it's better to not have them in kmem_map.
- Convert a couple of minor items along the way to kmem_alloc().
- Fix some memory leaks.
 1.140  29-Apr-2008  ad Ignore processes with PK_MARKER set.
 1.139  24-Apr-2008  ad branches: 1.139.2;
Merge proc::p_mutex and proc::p_smutex into a single adaptive mutex, since
we no longer need to guard against access from hardware interrupt handlers.

Additionally, if cloning a process with CLONE_SIGHAND, arrange to have the
child process share the parent's lock so that signal state may be kept in
sync. Partially addresses PR kern/37437.
 1.138  24-Apr-2008  ad Network protocol interrupts can now block on locks, so merge the globals
proclist_mutex and proclist_lock into a single adaptive mutex (proc_lock).
Implications:

- Inspecting process state requires thread context, so signals can no longer
be sent from a hardware interrupt handler. Signal activity must be
deferred to a soft interrupt or kthread.

- As the proc state locking is simplified, it's now safe to take exit()
and wait() out from under kernel_lock.

- The system spends less time at IPL_SCHED, and there is less lock activity.
 1.137  27-Mar-2008  ad branches: 1.137.2; 1.137.4;
Make rusage collection per-LWP and collate in the appropriate places.
cloned threads need a little bit more work but the locking needs to
be fixed first.
 1.136  18-Mar-2008  ad uid_find:

- Issue membar_producer() before inserting the new uidinfo.
- Optimize slightly and fix a couple of KNF nits.
- Need sys/atomic.h.
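The ordering requirement behind issuing `membar_producer()` before the insert is that a new entry must be fully initialized before its pointer becomes visible to lock-free readers. A C11 sketch of that publish pattern follows. `struct uinfo`, the single `bucket` and both functions are hypothetical stand-ins, a release-ordered compare-and-swap replaces the kernel barrier, and the recheck for a concurrent insert of the same uid is left out for brevity.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stddef.h>
#include <stdlib.h>

struct uinfo {			/* hypothetical stand-in for struct uidinfo */
	unsigned int uid;
	long proccnt;
	struct uinfo *next;
};

static _Atomic(struct uinfo *) bucket;	/* one hash bucket, for brevity */

static struct uinfo *
uinfo_lookup(unsigned int uid)
{
	struct uinfo *ui;

	/* Acquire pairs with the release in uinfo_insert(). */
	for (ui = atomic_load_explicit(&bucket, memory_order_acquire);
	    ui != NULL; ui = ui->next)
		if (ui->uid == uid)
			return ui;
	return NULL;
}

static struct uinfo *
uinfo_insert(unsigned int uid)
{
	struct uinfo *ui = calloc(1, sizeof(*ui));

	if (ui == NULL)
		return NULL;
	ui->uid = uid;
	ui->next = atomic_load_explicit(&bucket, memory_order_relaxed);
	/* Release: the fields above become visible before the pointer does. */
	while (!atomic_compare_exchange_weak_explicit(&bucket, &ui->next, ui,
	    memory_order_release, memory_order_relaxed))
		continue;
	return ui;
}
```

On a failed exchange, `ui->next` is refreshed with the current list head, so the loop simply retries with up-to-date linkage.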
 1.135  17-Mar-2008  rmind - Replace uihashtbl_lock and struct uidinfo::ui_lock with atomic operations.
This makes uid_find(), chgproccnt(), chgsbsize() and lf_alloc(), lf_free()
functions lock-less.
- Increase the size of uihashtbl in case of MP system, as suggested by <ad>.
- Add HASH_SLIST type for hashinit().

Reviewed by <ad>.
 1.134  17-Mar-2008  rmind - Initialize uihashtbl in resource_init();
- Make some variables static, remove the externs from header;
- Wrap few long lines, misc;

No functional changes are intended.
 1.133  24-Feb-2008  christos Don't return 0 if the count is not changed in chgproccnt()!
 1.132  29-Jan-2008  yamt branches: 1.132.2; 1.132.6;
uid_find: use kmem_alloc rather than malloc.
 1.131  23-Jan-2008  elad Tons of process scope changes.

- Add a KAUTH_PROCESS_SCHEDULER action, to handle scheduler related
requests, and add specific requests for set/get scheduler policy and
set/get scheduler parameters.

- Add a KAUTH_PROCESS_KEVENT_FILTER action, to handle kevent(2) related
requests.

- Add a KAUTH_DEVICE_TTY_STI action to handle requests to TIOCSTI.

- Add requests for the KAUTH_PROCESS_CANSEE action, indicating what
process information is being looked at (entry itself, args, env,
open files).

- Add requests for the KAUTH_PROCESS_RLIMIT action indicating set/get.

- Add requests for the KAUTH_PROCESS_CORENAME action indicating set/get.

- Make bsd44 secmodel code handle the newly added requests appropriately.

All of the above make it possible to issue finer-grained kauth(9) calls in
many places, removing some KAUTH_GENERIC_ISSUSER requests.

- Remove the "CAN" from KAUTH_PROCESS_CAN{KTRACE,PROCFS,PTRACE,SIGNAL}.

Discussed with christos@ and yamt@.
 1.130  26-Dec-2007  ad Merge more changes from vmlocking2, mainly:

- Locking improvements.
- Use pool_cache for more items.
 1.129  22-Dec-2007  yamt use binuptime for l_stime/l_rtime.
 1.128  20-Dec-2007  dsl Convert all the system call entry points from:
int foo(struct lwp *l, void *v, register_t *retval)
to:
int foo(struct lwp *l, const struct foo_args *uap, register_t *retval)
Fixup compat code to not write into 'uap' and (in some cases) to actually
pass a correctly formatted 'uap' structure with the right name to the
next routine.
A few 'compat' routines that just call standard ones have been deleted.
All the 'compat' code compiles (along with the kernels required to test
build it).
98% done by automated scripts.
 1.127  05-Dec-2007  ad branches: 1.127.4;
Match the docs: MUTEX_DRIVER/SPIN are now only for porting code written
for Solaris.
 1.126  29-Nov-2007  ad branches: 1.126.2;
Fix DIAGNOSTIC build.
 1.125  29-Nov-2007  ad Use atomics to adjust lim->pl_refcnt.
 1.124  06-Nov-2007  ad Merge scheduler changes from the vmlocking branch. All discussed on
tech-kern:

- Invert priority space so that zero is the lowest priority. Rearrange
number and type of priority levels into bands. Add new bands like
'kernel real time'.
- Ignore the priority level passed to tsleep. Compute priority for
sleep dynamically.
- For SCHED_4BSD, make priority adjustment per-LWP, not per-process.
 1.123  08-Oct-2007  ad branches: 1.123.2; 1.123.4;
Merge run time accounting changes from the vmlocking branch. These make
the LWP "start time" per-thread instead of per-CPU.
 1.122  29-Sep-2007  dsl Change the way p->p_limit (and hence p->p_rlimit) is locked.
Should fix PR/36939 and make the rlimit code MP safe.
Posted for comment to tech-kern (none received!)

The p_limit field (for a process) is only changed once (on the first
write), and a reference to the old structure is kept (for code paths
that have cached the pointer).
Only p->p_limit is now locked by p->p_mutex, and since the referenced memory
will not go away, is only needed if the pointer is to be changed.
The contents of 'struct plimit' are all locked by pl_mutex, except that the
code doesn't bother to acquire it for reads (which are basically atomic).
Add FORK_SHARELIMIT that causes fork1() to share the limits between parent
and child, use it for the IRIX_PR_SULIMIT.
Fix borked test for both IRIX_PR_SUMASK and IRIX_PR_SDIR being set.
 1.121  21-Sep-2007  dsl branches: 1.121.2;
Rename members of 'struct plimit' so that the fields are 'pl_xxx' and
no longer have the same names as members of 'struct proc'.
 1.120  06-Sep-2007  rmind uid_find: Destroy mutex before free.
From CID: 4555
 1.119  08-Aug-2007  ad branches: 1.119.2;
Grab locks in getrusage/getrlimit.
 1.118  09-Jul-2007  ad branches: 1.118.2; 1.118.6;
Merge some of the less invasive changes from the vmlocking branch:

- kthread, callout, devsw API changes
- select()/poll() improvements
- miscellaneous MT safety improvements
 1.117  17-May-2007  yamt merge yamt-idlelwp branch. asked by core@. some ports still need work.

from doc/BRANCHES:

idle lwp, and some changes depending on it.

1. separate context switching and thread scheduling.
(cf. gmcgarry_ctxsw)
2. implement idle lwp.
3. clean up related MD/MI interfaces.
4. make scheduler(s) modular.
 1.116  09-Mar-2007  ad branches: 1.116.2; 1.116.4;
- Make the proclist_lock a mutex. The write:read ratio is unfavourable,
and mutexes are cheaper to use than RW locks.
- LOCK_ASSERT -> KASSERT in some places.
- Hold proclist_lock/kernel_lock longer in a couple of places.
 1.115  04-Mar-2007  christos Kill caddr_t; there will be some MI fallout, but it will be fixed shortly.
 1.114  22-Feb-2007  thorpej TRUE -> true, FALSE -> false
 1.113  09-Feb-2007  ad branches: 1.113.2;
Merge newlock2 to head.
 1.112  20-Jan-2007  elad Kill KAUTH_PROCESS_RESOURCE and just replace it with two actions for
nice and rlimit.
 1.111  14-Dec-2006  elad - moves 'nice' access semantics to secmodel code,
- makes sysctl_proc_find() just lookup the process,
- use KAUTH_PROCESS_CANSEE requests to determine if the caller is
allowed to view the target process' corename, stop flags, and
rlimits,
- use explicit kauth(9) calls with KAUTH_PROCESS_CORENAME,
KAUTH_REQ_PROCESS_RESOURCE_NICE, KAUTH_REQ_PROCESS_RESOURCE_RLIMIT,
and KAUTH_PROCESS_STOPFLAG when modifying the aforementioned.
- sync man-page and example skeleton secmodel with reality.

okay yamt@

this is a pullup candidate.
 1.110  07-Dec-2006  ad sysctl_proc_corename(): do the second auth check against the correct
process.
 1.109  05-Dec-2006  elad PR/35021: Brian de Alwis: root cannot get/set rlimit information of user
processes through sysctl

Fix inverted logic in boolean assignment. This is why these tests should
not be done outside the secmodel code.

Thanks for the report.
 1.108  01-Nov-2006  yamt branches: 1.108.2;
remove some __unused from function parameters.
 1.107  12-Oct-2006  christos - sprinkle __unused on function decls.
- fix a couple of unused bugs
- no more -Wno-unused for i386
 1.106  10-Oct-2006  elad Use KAUTH_PROCESS_CORENAME instead of checking securelevel.
 1.105  13-Sep-2006  elad branches: 1.105.2;
Don't use KAUTH_RESULT_* where it's not applicable.
Prompted by yamt@.
 1.104  08-Sep-2006  elad First take at security model abstraction.

- Add a few scopes to the kernel: system, network, and machdep.

- Add a few more actions/sub-actions (requests), and start using them as
opposed to the KAUTH_GENERIC_ISSUSER place-holders.

- Introduce a basic set of listeners that implement our "traditional"
security model, called "bsd44". This is the default (and only) model we
have at the moment.

- Update all relevant documentation.

- Add some code and docs to help folks who want to actually use this stuff:

* There's a sample overlay model, sitting on-top of "bsd44", for
fast experimenting with tweaking just a subset of an existing model.

This is pretty cool because it's *really* straightforward to do stuff
you had to use ugly hacks for until now...

* And of course, documentation describing how to do the above for quick
reference, including code samples.

All of these changes were tested for regressions using a Python-based
testsuite that will be (I hope) available soon via pkgsrc. Information
about the tests, and how to write new ones, can be found on:

http://kauth.linbsd.org/kauthwiki

NOTE FOR DEVELOPERS: *PLEASE* don't add any code that does any of the
following:

- Uses a KAUTH_GENERIC_ISSUSER kauth(9) request,
- Checks 'securelevel' directly,
- Checks a uid/gid directly.

(or if you feel you have to, contact me first)

This is still work in progress; It's far from being done, but now it'll
be a lot easier.

Relevant mailing list threads:

http://mail-index.netbsd.org/tech-security/2006/01/25/0011.html
http://mail-index.netbsd.org/tech-security/2006/03/24/0001.html
http://mail-index.netbsd.org/tech-security/2006/04/18/0000.html
http://mail-index.netbsd.org/tech-security/2006/05/15/0000.html
http://mail-index.netbsd.org/tech-security/2006/08/01/0000.html
http://mail-index.netbsd.org/tech-security/2006/08/25/0000.html

Many thanks to YAMAMOTO Takashi, Matt Thomas, and Christos Zoulas for help
stabilizing kauth(9).

Full credit for the regression tests, making sure these changes didn't break
anything, goes to Matt Fleming and Jaime Fournier.

Happy birthday Randi! :)
 1.103  30-Jul-2006  elad branches: 1.103.4;
ugh.. more stuff that's overdue and should not be in 4.0: remove the
sysctl(9) flags CTLFLAG_READONLY[12]. luckily they're not documented
so it's only half regression.

only two knobs used them; proc.curproc.corename (check added in the
existing handler; it's CTLFLAG_ANYWRITE, yay) and net.inet.ip.forwsrcrt,
that got its own handler now too.
 1.102  23-Jul-2006  ad Use the LWP cached credentials where sane.
 1.101  14-May-2006  elad integrate kauth.
 1.100  04-Feb-2006  yamt branches: 1.100.2; 1.100.4; 1.100.6;
for some random places, use PNBUF_GET/PUT rather than
- on-stack buffer
- malloc(MAXPATHLEN)
 1.99  11-Dec-2005  christos branches: 1.99.2; 1.99.4; 1.99.6;
merge ktrace-lwp.
 1.98  23-Jun-2005  thorpej branches: 1.98.2;
Use ANSI function decls. Apply some static.
 1.97  29-May-2005  christos - add const.
- remove unnecessary casts.
- add __UNCONST casts and mark them with XXXUNCONST as necessary.
 1.96  09-May-2005  christos lock all uses of uidhash. provide macros to lock and unlock. based on more
discussions with yamt.
 1.95  09-May-2005  christos Protect chgsbsize() with splsoftnet(). As discussed with yamt.
 1.94  07-May-2005  christos PR/30154: YAMAMOTO Takashi: tcp_close locking botch
chgsbsize() as mentioned in the PR can be called from an interrupt context
via tcp_close(). Avoid calling uid_find() in chgsbsize().
- Instead of storing so_uid in struct socketvar, store *so_uidinfo
- Add a simple lock to struct uidinfo.
 1.93  29-Mar-2005  christos Re-enable chgsbsize. It should work now.
 1.92  29-Mar-2005  he Properly disable the bulk of chgsbsize(), completing revision 1.84.
This does an #if 0 / #endif, so that no code (or declarations!) are
left after the first "return 1", making this compilable for vax and
playstation2 again, both of which use gcc 2.95.3 or similar.
 1.91  26-Mar-2005  fvdl Fix some things regarding COMPAT_NETBSD32 and limits/VM addresses.

* For sparc64 and amd64, define *SIZ32 VM constants.
* Add a new function pointer to struct emul, pointing at a function
that will return the default VM map address. The default function
is uvm_map_defaultaddr, which just uses the VM_DEFAULT_ADDRESS
macro. This gives emulations control over the default map address,
and allows things to be mapped at the right address (in 32bit range)
for COMPAT_NETBSD32.
* Add code to adjust the data and stack limits when a COMPAT_NETBSD32
or COMPAT_SVR4_32 binary is executed.
* Don't use USRSTACK in kern_resource.c, use p_vmspace->vm_minsaddr
instead (emulations might have set it differently)
* Since this changes struct emul, bump kernel version to 3.99.2

Tested on amd64, compile-tested on sparc64.
 1.90  23-Mar-2005  christos Avoid a possible race during the time we give up our lock in order to
allocate memory. (From yamt)
 1.89  23-Mar-2005  christos Don't call malloc with a simple_lock held. Thanks to Greg Oster for pointing
out my stupid mistake.
 1.88  20-Mar-2005  christos It does not make sense to free the uidinfo struct since it is used now
for multiple things (proccnt,lockcnt,sbsize) and it adds too much code
complexity. Instead add a uid_find() routine that returns the existing
struct or allocates a new one.

Re-enable the sbsize limit code.
 1.87  26-Feb-2005  perry branches: 1.87.2;
nuke trailing whitespace
 1.86  01-Oct-2004  yamt branches: 1.86.4; 1.86.6;
introduce a function, proclist_foreach_call, to iterate all procs on
a proclist and call the specified function for each of them.
primarily to fix a procfs locking problem, but i think that it's useful for
others as well.

while i'm here, introduce PROCLIST_FOREACH macro, which is similar to
LIST_FOREACH but skips marker entries which are used by proclist_foreach_call.
 1.85  13-May-2004  kleink KNF previous.
 1.84  13-May-2004  christos Disable chgsbsize. It is not MPSAFE
 1.83  06-May-2004  pk Provide a mutex for the process limits data structure.
 1.82  01-May-2004  matt Commons are not allowed in header files. extern them and declare them in
the appropriate .c file.
 1.81  25-Apr-2004  kleink POSIX-2001: Change the `who' argument to [gs]etpriority(2) from int
to id_t. Partially addressing PR standards/25216 from Murray Armfield.
 1.80  23-Apr-2004  yamt chgsbsize: correct limit check and ui_sbsize calculation.
ok'ed by Christos Zoulas.
 1.79  17-Apr-2004  christos PR/9347: Eric E. Fair: socket buffer pool exhaustion leads to system deadlock
and unkillable processes.
1. Introduce new SBSIZE resource limit from FreeBSD to limit socket buffer
size resource.
2. make sokvareserve interruptible, so processes ltsleeping on it can be
killed.
 1.78  08-Apr-2004  atatat Lots of sysctl descriptions (if someone wants to help out here, that
would be good) mostly copied from sysctl(3). This takes care of the
top-level, most of kern.* and hw.* (modulo the ath and bge stuff), and
all of proc.*.

If you don't want the added rodata in your kernel, use "options
SYSCTL_NO_DESCR" in your kernel config.
 1.77  04-Apr-2004  pk We use maxdmap and maxsmap, so remove comment questioning that.
 1.76  24-Mar-2004  atatat branches: 1.76.2;
Tango on sysctl_createv() and flags. The flags have all been renamed,
and sysctl_createv() now uses more arguments.
 1.75  06-Dec-2003  atatat Don't need those any more
 1.74  04-Dec-2003  atatat Dynamic sysctl.

Gone are the old kern_sysctl(), cpu_sysctl(), hw_sysctl(),
vfs_sysctl(), etc, routines, along with sysctl_int() et al. Now all
nodes are registered with the tree, and nodes can be added (or
removed) easily, and I/O to and from the tree is handled generically.

Since the nodes are registered with the tree, the mapping from name to
number (and back again) can now be discovered, instead of having to be
hard coded. Adding new nodes to the tree is likewise much simpler --
the new infrastructure handles almost all the work for simple types,
and just about anything else can be done with a small helper function.

All existing nodes are where they were before (numerically speaking),
so all existing consumers of sysctl information should notice no
difference.

PS - I'm sorry, but there's a distinct lack of documentation at the
moment. I'm working on sysctl(3/8/9) right now, and I promise to
watch out for buses.
 1.73  24-Aug-2003  chs add support for non-executable mappings (where the hardware allows this)
and make the stack and heap non-executable by default. the changes
fall into two basic categories:

- pmap and trap-handler changes. these are all MD:
= alpha: we already track per-page execute permission with the (software)
PG_EXEC bit, so just have the trap handler pay attention to it.
= i386: use a new GDT segment for %cs for processes that have no
executable mappings above a certain threshold (currently the
bottom of the stack). track per-page execute permission with
the last unused PTE bit.
= powerpc/ibm4xx: just use the hardware exec bit.
= powerpc/oea: we already track per-page exec bits, but the hardware only
implements non-exec mappings at the segment level. so track the
number of executable mappings in each segment and turn on the no-exec
segment bit iff the count is 0. adjust the trap handler to deal.
= sparc (sun4m): fix our use of the hardware protection bits.
fix the trap handler to recognize text faults.
= sparc64: split the existing unified TSB into data and instruction TSBs,
and only load TTEs into the appropriate TSB(s) for the permissions.
fix the trap handler to check for execute permission.
= not yet implemented: amd64, hppa, sh5

- changes in all the emulations that put a signal trampoline on the stack.
instead, we now put the trampoline into a uvm_aobj and map that into
the process separately.

originally from openbsd, adapted for netbsd by me.
 1.72  07-Aug-2003  agc Move UCB-licensed code from 4-clause to 3-clause licence.

Patches provided by Joel Baker in PR 22364, verified by myself.
 1.71  16-May-2003  itojun branches: 1.71.2;
use strlcpy. [fixed off-by-one in subr_prop.c]
 1.70  14-Mar-2003  dsl cpu times were miscalculated because 'usecs' could go -ve...
There is still a problem that 'st = (u * st) / tot;' can overflow,
but that is harder to fix, and requires cpu times of ~5days.
(approved by christos)
 1.69  05-Mar-2003  dsl Apportion execution time evenly between stime and utime when the process
hasn't been interrupted by any profiling interrupts.
Collect time from all active LWPs.
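
A minimal sketch of the apportioning rule described in 1.69, not the kernel's calcru() itself. The names split_runtime and struct cpu_split and the tick arguments are hypothetical, and interrupt ticks are left out for brevity:

```c
#include <stdint.h>

/* Hypothetical result type: run time divided into user and system parts. */
struct cpu_split {
	uint64_t user_us;
	uint64_t sys_us;
};

/*
 * Divide run_us microseconds of run time between user and system time
 * in proportion to the profiling ticks seen in each mode.  With no
 * ticks at all there is nothing to go on, so split evenly.
 */
static struct cpu_split
split_runtime(uint64_t run_us, uint64_t uticks, uint64_t sticks)
{
	struct cpu_split out;
	uint64_t tot = uticks + sticks;

	if (tot == 0)
		out.sys_us = run_us / 2;		/* no samples: half each */
	else
		out.sys_us = run_us * sticks / tot;	/* proportional share */
	out.user_us = run_us - out.sys_us;
	return out;
}
```

The product run_us * sticks has the same shape as the expression quoted in 1.70, so a sketch like this would share its overflow risk for very long run times.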
 1.68  18-Jan-2003  thorpej Merge the nathanw_sa branch.
 1.67  03-Oct-2002  itojun branches: 1.67.2;
backout previous; the (u_int) cast already catches the negative case too
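
A sketch of why a separate negative check is redundant once the argument is cast to unsigned. EXAMPLE_NLIMITS and which_in_range are hypothetical stand-ins for the kernel's limit count and range test:

```c
#include <stdbool.h>

#define EXAMPLE_NLIMITS 12	/* hypothetical number of resource limits */

/*
 * Converting a negative int to unsigned int yields a very large value,
 * so a single unsigned comparison rejects both negative and too-large
 * indices.
 */
static bool
which_in_range(int which)
{
	return (unsigned int)which < EXAMPLE_NLIMITS;
}
```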
 1.66  03-Oct-2002  itojun check negative arg. from openbsd
 1.65  03-Oct-2002  itojun check negative arg. from openbsd
 1.64  04-Sep-2002  matt Use the queue macros from <sys/queue.h> instead of referring to the queue
members directly. Use *_FOREACH whenever possible.
 1.63  25-Aug-2002  thorpej Fix signed/unsigned comparison warning from GCC 3.3.
 1.62  23-Nov-2001  jdolecek branches: 1.62.8;
Two changes to setrlimit(2):
* return EINVAL if specified current limit exceeds specified hard limit.
This behaviour is required by SUSv2 (noted by Giles Lean on tech-kern)
* return EINVAL if an attempt is made to lower stack size limit below
current usage; this addresses bin/3045 by Jason Thorpe, and conforms to SUSv2
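
The first rule can be observed from userland roughly as follows. This is only a sketch: soft_above_hard_errno is a hypothetical helper, and RLIMIT_CORE is chosen so that an unexpected success would do little harm:

```c
#include <errno.h>
#include <sys/resource.h>

/*
 * Ask for a soft limit above the hard limit and report the resulting
 * errno, or 0 if the call unexpectedly succeeds.
 */
static int
soft_above_hard_errno(void)
{
	struct rlimit rl;

	rl.rlim_cur = 1;	/* soft limit ... */
	rl.rlim_max = 0;	/* ... deliberately above the hard limit */
	if (setrlimit(RLIMIT_CORE, &rl) == -1)
		return errno;
	return 0;
}
```

On a system that follows the SUSv2 behaviour described above, the helper should return EINVAL.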
 1.61  12-Nov-2001  lukem add RCSIDs
 1.60  06-Feb-2001  eeh branches: 1.60.2; 1.60.4; 1.60.8;
Move maxdmap and maxsmap where they belong and make them big enough.
 1.59  20-Aug-2000  thorpej Add a lock around the scheduler, and use it as necessary, including
in the non-MULTIPROCESSOR case (LOCKDEBUG requires it). Scheduler
lock is held upon entry to mi_switch() and cpu_switch(), and
cpu_switch() releases the lock before returning.

Largely from Bill Sommerfeld, with some minor bug fixes and
machine-dependent code hacking from me.
 1.58  27-Jun-2000  mrg remove include of <vm/vm.h>
 1.57  31-May-2000  thorpej branches: 1.57.2;
Track which process a CPU is running/has last run on by adding a
p_cpu member to struct proc. Use this in certain places when
accessing scheduler state, etc. For the single-processor case,
just initialize p_cpu in fork1() to avoid having to set it in the
low-level context switch code on platforms which will never have
multiprocessing.

While I'm here, comment a few places where there are known issues
for the SMP implementation.
 1.56  26-May-2000  thorpej branches: 1.56.2;
First sweep at scheduler state cleanup. Collect MI scheduler
state into global and per-CPU scheduler state:

- Global state: sched_qs (run queues), sched_whichqs (bitmap
of non-empty run queues), sched_slpque (sleep queues).
NOTE: These may collectively move into a struct schedstate
at some point in the future.

- Per-CPU state, struct schedstate_percpu: spc_runtime
(time process on this CPU started running), spc_flags
(replaces struct proc's p_schedflags), and
spc_curpriority (usrpri of processes on this CPU).

- Every platform must now supply a struct cpu_info and
a curcpu() macro. Simplify existing cpu_info declarations
where appropriate.

- All references to per-CPU scheduler state now made through
curcpu(). NOTE: this will likely be adjusted in the future
after further changes to struct proc are made.

Tested on i386 and Alpha. Changes are mostly mechanical, but apologies
in advance if it doesn't compile on a particular platform.
 1.55  26-May-2000  thorpej Introduce a new process state distinct from SRUN called SONPROC
which indicates that the process is actually running on a
processor. Test against SONPROC as appropriate rather than
combinations of SRUN and curproc. Update all context switch code
to properly set SONPROC when the process becomes the current
process on the CPU.
 1.54  30-Mar-2000  augustss Get rid of register declarations.
 1.53  28-Sep-1999  bouyer branches: 1.53.2;
Replace kern.shortcorename sysctl with a more flexible scheme,
core filename format, which makes it possible to change the name of the core dump,
and to relocate it in a directory. Credits to Bill Sommerfeld for giving me
the idea :)
The default core filename format can be changed by options DEFCORENAME and/or
kern.defcorename
Create a new sysctl tree, proc, which holds per-process values (for now
the corename format, and resource limits). A process is designated by its pid
at the second level name. These values are inherited on fork, and the corename
format is reset to defcorename on suid/sgid exec.
Create a p_sugid() function, to take appropriate actions on suid/sgid
exec (for now set the P_SUGID flag and reset the per-proc corename).
Adjust dosetrlimit() to allow changing limits of one proc by another, with
credential controls.
 1.52  25-Jul-1999  thorpej Turn the proclist lock into a read/write spinlock. Update proclist locking
calls to reflect this. Also, block statclock rather than softclock
in the proclist locking functions, to address a problem reported on
current-users by Sean Doran.
 1.51  22-Jul-1999  thorpej Add a read/write lock to the proclists and PID hash table. Use the
write lock when doing PID allocation, and during the process exit path.
Use a read lock everywhere else, including within schedcpu() (interrupt
context). Note that holding the write lock implies blocking schedcpu()
from running (blocks softclock).

PID allocation is now MP-safe.

Note this actually fixes a bug on single processor systems that was probably
extremely difficult to tickle; it was possible that schedcpu() would run
off a bad pointer if the right clock interrupt happened to come in the
middle of a LIST_INSERT_HEAD() or LIST_REMOVE() to/from allproc.
 1.50  24-Mar-1999  mrg branches: 1.50.4;
completely remove Mach VM support. all that is left is all the
header files as UVM still uses (most of) these.
 1.49  31-Aug-1998  thorpej Use the pool allocator and "nointr" pool page allocator for pcred and
plimit structures.
 1.48  13-Aug-1998  eeh Merge paddr_t changes into the main branch.
 1.47  04-Aug-1998  perry Abolition of bcopy, ovbcopy, bcmp, and bzero, phase one.
bcopy(x, y, z) -> memcpy(y, x, z)
ovbcopy(x, y, z) -> memmove(y, x, z)
bcmp(x, y, z) -> memcmp(x, y, z)
bzero(x, y) -> memset(x, 0, y)
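
The main trap in this conversion is that source and destination swap places. A small sketch, with hypothetical wrapper names that keep the old bcopy-style argument order:

```c
#include <stddef.h>
#include <string.h>

/* bcopy(src, dst, len) -> memcpy(dst, src, len): regions must not overlap. */
static void
example_copy(const void *src, void *dst, size_t len)
{
	memcpy(dst, src, len);
}

/* ovbcopy(src, dst, len) -> memmove(dst, src, len): overlap is allowed. */
static void
example_copy_overlap(const void *src, void *dst, size_t len)
{
	memmove(dst, src, len);
}

/* bzero(buf, len) -> memset(buf, 0, len). */
static void
example_zero(void *buf, size_t len)
{
	memset(buf, 0, len);
}
```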
 1.46  31-Jul-1998  perry fix sizeofs so they comply with the KNF style guide. yes, it is pedantic.
 1.45  01-Mar-1998  fvdl branches: 1.45.2;
Merge with Lite2 + local changes
 1.44  10-Feb-1998  mrg - add defopt's for UVM, UVMHIST and PMAP_NEW.
- remove unnecessary UVMHIST_DECL's.
 1.43  05-Feb-1998  mrg initial import of the new virtual memory system, UVM, into -current.

UVM was written by chuck cranor <chuck@maria.wustl.edu>, with some
minor portions derived from the old Mach code. i provided some help
getting swap and paging working, and other bug fixes/ideas. chuck
silvers <chuq@chuq.com> also provided some other fixes.

this is the rest of the MI portion changes.

this will be KNF'd shortly. :-)
 1.42  15-Oct-1997  mycroft Adjust u_int arguments of some system calls to int, to match user-level
prototypes.
 1.41  09-Oct-1997  enami Cosmetic changes;

- indent continuation line by four columns.
- delete whitespace after cast.
 1.40  09-Oct-1997  enami - round up requested soft stack limit by vm page size.
- don't round up size and truncate addr.
 1.39  22-Dec-1996  cgd branches: 1.39.10;
* catch up with system call argument type fixups/const poisoning.
* Fix arguments to various copyin()/copyout() invocations, to avoid
gratuitous casts.
* Some KNF formatting fixes
 1.38  23-Oct-1996  matthias * In dosetrlimit ensure that rlim_cur and rlim_max are >0. Otherwise
the kernel might crash due to invalid values passed to setrlimit.
 1.37  02-Oct-1996  ws Fix p_nice vs. NZERO code.
Change NZERO to 20 to always make p_nice positive.
On Christos' suggestion make p_nice explicitly u_char.
 1.36  11-Jul-1996  jtc Used signed instead of unsigned longs for sec and usec variables to
handle cases which would otherwise yield wildly wrong results.
 1.35  13-Jun-1996  jtc Cast `sec' to a u_quad_t in `sec * 1000000 + usec' so the expression
is computed with quad integer arithmetic (so it won't overflow after
4294 seconds).
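
The 4294-second figure follows from 2^32 / 1000000 = 4294.967296. A sketch of the two computations, using uint32_t to stand in for a 32-bit unsigned long and uint64_t for u_quad_t; the function names are hypothetical:

```c
#include <stdint.h>

/* Narrow arithmetic: the product wraps modulo 2^32 once sec exceeds 4294. */
static uint32_t
usecs_narrow(uint32_t sec, uint32_t usec)
{
	return (uint32_t)(sec * UINT32_C(1000000) + usec);
}

/* Widening sec first makes the whole expression 64-bit, so it cannot wrap here. */
static uint64_t
usecs_wide(uint32_t sec, uint32_t usec)
{
	return (uint64_t)sec * 1000000u + usec;
}
```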
 1.34  09-Feb-1996  christos branches: 1.34.4;
More proto fixes
 1.33  04-Feb-1996  christos First pass at prototyping
 1.32  09-Dec-1995  mycroft Add a limfree(), and use it.
 1.31  07-Oct-1995  mycroft Prefix names of system call implementation functions with `sys_'.
 1.30  19-Sep-1995  thorpej Make system calls conform to a standard prototype and bring those
prototypes into scope.
 1.29  24-Jun-1995  christos Extracted all of the compat_xxx routines, and created a library [libcompat]
for them. There are a few #ifdef COMPAT_XX remaining, but they are not easy
or worth eliminating (yet).
 1.28  10-May-1995  christos tty_tb.c: need to include ioctl_compat.h in order to compile.
sysv_shm.c: make shm_find_segment_by_shmid global so it can be used by
COMPAT_HPUX. There should be a better way...
rest: Add #ifdef COMPAT_HPUX where needed
 1.27  21-Mar-1995  mycroft Update to use timer{add,sub}().
 1.26  05-Mar-1995  fvdl Two more "|| defined(COMPAT_LINUX)" that I somehow missed first time around.
 1.25  24-Dec-1994  cgd various cleanups for -Wall. some inspired by James Jegers.
 1.24  11-Dec-1994  mycroft Use __timer{add,sub}(), not timeval{add,sub}(). Remove the latter completely.
 1.23  17-Nov-1994  christos Added ifdef COMPAT_SVR4 to the kernel compat code needed.
 1.22  20-Oct-1994  cgd update for new syscall args description mechanism
 1.21  30-Aug-1994  mycroft Convert process, file, and namei lists and hash tables to use queue.h.
 1.20  29-Jun-1994  cgd New RCS ID's, take two. they're more aesthetically pleasant, and use 'NetBSD'
 1.19  19-May-1994  cgd update to 4.4-Lite, with some local changes
 1.18  18-May-1994  cgd mostly-machine-independent switch, and changes to match. also, hack init_main
 1.17  17-May-1994  cgd copyright foo
 1.16  05-May-1994  mycroft Now setpri() is really toast.
 1.15  05-May-1994  cgd lots of changes: prototype migration, move lots of variables, definitions,
and structure elements around. kill some unnecessary type and macro
definitions. standardize clock handling. More changes than you'd want.
 1.14  04-May-1994  cgd expand the rlimit struct, kill last vestiges of off_t bogosity.
 1.13  25-Apr-1994  cgd minor cleanup
 1.12  18-Dec-1993  mycroft Canonicalize all #includes.
 1.11  10-Dec-1993  cgd dtrt with 'error' in setpriority()
 1.10  15-Sep-1993  cgd make allproc be volatile, and cast things accordingly.
suggested by torek, because CSRG had problems with reordering
of assignments to allproc leading to strange panics from kernels
compiled with gcc2...
 1.9  04-Sep-1993  cgd branches: 1.9.2;
get rid of maxdmap, and separate MAXDSIZ and MAXSSIZ in rlimit checking.
 1.8  23-Aug-1993  mycroft RLIMIT_OFILE --> RLIMIT_NOFILE
 1.7  13-Jul-1993  cgd break args structs out, into syscallname_args structs, so gcc2 doesn't
whine so much.
 1.6  27-Jun-1993  andrew ANSIfications - removed all implicit function return types and argument
definitions. Ensured that all files include "systm.h" to gain access to
general prototypes. Casts where necessary.
 1.5  02-Jun-1993  cgd two fixes from ws:
if resource cur/max limits hosed, fix
copy the correct amount from the rusage struct
 1.4  01-Jun-1993  cgd break before letting child run, if tracing, and do the right
thing with stack limits
 1.3  20-May-1993  cgd add $Id$ strings, and clean up file headers where necessary
 1.2  04-Apr-1993  cgd now uses `maxfdescs' to bound `openfiles' resource limit.
 1.1  21-Mar-1993  cgd branches: 1.1.1;
Initial revision
 1.1.1.3  01-Mar-1998  fvdl Import 4.4BSD-Lite2
 1.1.1.2  01-Mar-1998  fvdl Import 4.4BSD-Lite for reference
 1.1.1.1  21-Mar-1993  cgd initial import of 386bsd-0.1 sources
 1.9.2.4  10-Dec-1993  cgd update from trunk
 1.9.2.3  14-Nov-1993  mycroft Canonicalize all #includes.
 1.9.2.2  30-Sep-1993  deraadt calcru() calculates times from ticks.
 1.9.2.1  24-Sep-1993  mycroft Make all files using spl*() #include cpu.h. Changes from trunk.
init_main.c: New method of pseudo-device initialization.
kern_clock.c: hardclock() and softclock() now take a pointer to a clockframe.
softclock() only does callouts.
kern_synch.c: Remove spurious declaration of endtsleep(). Adjust uses of
averunnable for new struct loadav.
subr_prf.c: Allow printf() formats in panic().
tty.c: averunnable changes.
vfs_subr.c: va_size and va_bytes are now quads.
 1.34.4.3  11-Dec-1996  mycroft From trunk:
Don't allow negative limits
 1.34.4.2  11-Jul-1996  jtc Pulled up from rev 1.36
 1.34.4.1  13-Jun-1996  jtc Pulled up from revision 1.35.
Cast `sec' to a u_quad_t in `sec * 1000000 + usec' so the expression
is computed with quad integer arithmetic (so it won't overflow after
4294 seconds).
 1.39.10.1  14-Oct-1997  thorpej Update marc-pcmcia branch from trunk.
 1.45.2.2  08-Aug-1998  eeh Revert cdevsw mmap routines to return int.
 1.45.2.1  30-Jul-1998  eeh Split vm_offset_t and vm_size_t into paddr_t, psize_t, vaddr_t, and vsize_t.
 1.50.4.1  02-Aug-1999  thorpej Update from trunk.
 1.53.2.2  11-Feb-2001  bouyer Sync with HEAD.
 1.53.2.1  20-Nov-2000  bouyer Update thorpej_scsipi to -current as of a month ago
 1.56.2.1  22-Jun-2000  minoura Sync w/ netbsd-1-5-base.
 1.57.2.2  03-Oct-2002  itojun backout previous.
 1.57.2.1  03-Oct-2002  itojun sys/kern/kern_resource.c 1.65-1.66

Check negative args to set/getrlimit.
 1.60.8.1  12-Nov-2001  thorpej Sync the thorpej-mips-cache branch with -current.
 1.60.4.3  10-Oct-2002  jdolecek sync kqueue with -current; this includes merge of gehenna-devsw branch,
merge of i386 MP branch, and part of autoconf rototil work
 1.60.4.2  06-Sep-2002  jdolecek sync kqueue branch with HEAD
 1.60.4.1  10-Jan-2002  thorpej Sync kqueue branch with -current.
 1.60.2.7  17-Sep-2002  nathanw Catch up to -current.
 1.60.2.6  27-Aug-2002  nathanw Catch up to -current.
 1.60.2.5  12-Jul-2002  nathanw No longer need to pull in lwp.h; proc.h pulls it in for us.
 1.60.2.4  29-May-2002  nathanw #include <sys/sa.h> before <sys/syscallargs.h>, to provide sa_upcall_t
now that <sys/param.h> doesn't include <sys/sa.h>.

(Behold the Power of Ed)
 1.60.2.3  08-Jan-2002  nathanw Catch up to -current.
 1.60.2.2  14-Nov-2001  nathanw Catch up to -current.
 1.60.2.1  05-Mar-2001  nathanw Initial commit of scheduler activations and lightweight process support.
 1.62.8.1  29-Aug-2002  gehenna catch up with -current.
 1.67.2.1  18-Dec-2002  gmcgarry Merge pcred and ucred, and poolify. TBD: check backward compatibility
and factor-out some higher-level functionality.
 1.71.2.7  10-Nov-2005  skrll Sync with HEAD. Here we go again...
 1.71.2.6  01-Apr-2005  skrll Sync with HEAD.
 1.71.2.5  04-Mar-2005  skrll Sync with HEAD.

Hi Perry!
 1.71.2.4  19-Oct-2004  skrll Sync with HEAD
 1.71.2.3  21-Sep-2004  skrll Fix the sync with head I botched.
 1.71.2.2  18-Sep-2004  skrll Sync with HEAD.
 1.71.2.1  03-Aug-2004  skrll Sync with HEAD
 1.76.2.1  21-Apr-2004  jmc Pullup rev 1.78 (requested by atatat in ticket #93)

Lots of sysctl descriptions mostly copied from sysctl(3).
 1.86.6.2  26-Mar-2005  yamt sync with head.
 1.86.6.1  19-Mar-2005  yamt sync with head. xen and whitespace. xen part is not finished.
 1.86.4.1  29-Apr-2005  kent sync with -current
 1.87.2.1  18-Sep-2005  tron Pull up following revision(s) (requested by fvdl in ticket #798):
sys/compat/sunos/sunos_exec.c: revision 1.47
sys/compat/pecoff/pecoff_emul.c: revision 1.11
sys/arch/sparc64/sparc64/netbsd32_machdep.c: revision 1.45
sys/arch/amd64/amd64/netbsd32_machdep.c: revision 1.12
sys/sys/proc.h: revision 1.198
sys/compat/mach/mach_exec.c: revision 1.56
sys/compat/freebsd/freebsd_exec.c: revision 1.27
sys/arch/sparc64/include/vmparam.h: revision 1.27
sys/kern/kern_resource.c: revision 1.91
sys/compat/netbsd32/netbsd32_netbsd.c: revision 1.88
sys/compat/osf1/osf1_exec.c: revision 1.39
sys/compat/svr4_32/svr4_32_resource.c: revision 1.5
sys/compat/ultrix/ultrix_misc.c: revision 1.99
sys/compat/svr4_32/svr4_32_exec.h: revision 1.9
sys/kern/exec_elf32.c: revision 1.103
sys/compat/aoutm68k/aoutm68k_exec.c: revision 1.19
sys/compat/sunos32/sunos32_exec.c: revision 1.20
sys/compat/hpux/hpux_exec.c: revision 1.46
sys/compat/darwin/darwin_exec.c: revision 1.40
sys/kern/sysv_shm.c: revision 1.83
sys/uvm/uvm_extern.h: revision 1.99
sys/uvm/uvm_mmap.c: revision 1.89
sys/kern/kern_exec.c: revision 1.195
sys/compat/netbsd32/netbsd32.h: revision 1.31
sys/arch/sparc64/sparc64/svr4_32_machdep.c: revision 1.20
sys/compat/svr4/svr4_exec.c: revision 1.56
sys/compat/irix/irix_exec.c: revision 1.41
sys/compat/ibcs2/ibcs2_exec.c: revision 1.63
sys/compat/svr4_32/svr4_32_exec.c: revision 1.16
sys/arch/amd64/include/vmparam.h: revision 1.8
sys/compat/linux/common/linux_exec.c: revision 1.73
Fix some things regarding COMPAT_NETBSD32 and limits/VM addresses.
* For sparc64 and amd64, define *SIZ32 VM constants.
* Add a new function pointer to struct emul, pointing at a function
that will return the default VM map address. The default function
is uvm_map_defaultaddr, which just uses the VM_DEFAULT_ADDRESS
macro. This gives emulations control over the default map address,
and allows things to be mapped at the right address (in 32bit range)
for COMPAT_NETBSD32.
* Add code to adjust the data and stack limits when a COMPAT_NETBSD32
or COMPAT_SVR4_32 binary is executed.
* Don't use USRSTACK in kern_resource.c, use p_vmspace->vm_minsaddr
instead (emulations might have set it differently)
* Since this changes struct emul, bump kernel version to 3.99.2
Tested on amd64, compile-tested on sparc64.
 1.98.2.12  24-Mar-2008  yamt sync with head.
 1.98.2.11  17-Mar-2008  yamt sync with head.
 1.98.2.10  27-Feb-2008  yamt sync with head.
 1.98.2.9  04-Feb-2008  yamt sync with head.
 1.98.2.8  21-Jan-2008  yamt sync with head
 1.98.2.7  07-Dec-2007  yamt sync with head
 1.98.2.6  15-Nov-2007  yamt sync with head.
 1.98.2.5  27-Oct-2007  yamt sync with head.
 1.98.2.4  03-Sep-2007  yamt sync with head.
 1.98.2.3  26-Feb-2007  yamt sync with head.
 1.98.2.2  30-Dec-2006  yamt sync with head.
 1.98.2.1  21-Jun-2006  yamt sync with head.
 1.99.6.2  01-Jun-2006  kardel Sync with head.
 1.99.6.1  22-Apr-2006  simonb Sync with head.
 1.99.4.1  09-Sep-2006  rpaulo sync with head
 1.99.2.1  18-Feb-2006  yamt sync with head.
 1.100.6.1  24-May-2006  tron Merge 2006-05-24 NetBSD-current into the "peter-altq" branch.
 1.100.4.4  06-May-2006  christos - Move kauth_cred_t declaration to <sys/types.h>
- Cleanup struct ucred; forward declarations that are unused.
- Don't include <sys/kauth.h> in any header, but include it in the c files
that need it.

Approved by core.
 1.100.4.3  12-Mar-2006  elad Use kauth_cred_ismember_gid() instead of rolling our own.
 1.100.4.2  10-Mar-2006  elad generic_authorize() -> kauth_authorize_generic().
 1.100.4.1  08-Mar-2006  elad Adapt to kernel authorization KPI.
 1.100.2.3  14-Sep-2006  yamt sync with head.
 1.100.2.2  11-Aug-2006  yamt sync with head
 1.100.2.1  24-May-2006  yamt sync with head.
 1.103.4.12  05-Feb-2007  ad IPL_STATCLOCK needs to be >= IPL_CLOCK, so assume that proc::p_stmutex is
always a spinlock.
 1.103.4.11  05-Feb-2007  ad - When clearing signals dequeue siginfo first and free later, once
outside the lock perimeter.
- Push kernel_lock back in a couple of places.
- Adjust limcopy() to be MP safe (this needs redoing).
- Fix a couple of bugs noticed along the way.
- Catch up with condvar changes.
 1.103.4.10  01-Feb-2007  ad Sync with head.
 1.103.4.9  30-Jan-2007  ad Remove support for SA. Ok core@.
 1.103.4.8  27-Jan-2007  ad Drop proclist_mutex and proc::p_smutex back to IPL_VM.
 1.103.4.7  12-Jan-2007  ad Sync with head.
 1.103.4.6  29-Dec-2006  ad Checkpoint work in progress.
 1.103.4.5  18-Nov-2006  ad Sync with head.
 1.103.4.4  17-Nov-2006  ad Checkpoint work in progress.
 1.103.4.3  24-Oct-2006  ad - Redo LWP locking slightly and fix some races.
- Fix some locking botches.
- Make signal mask / stack per-proc for SA processes.
- Add _lwp_kill().
 1.103.4.2  20-Oct-2006  ad - Update for need_proftick() change.
- Make run time per-LWP and have calcru() compute the whole-process value.
- Minor locking changes.
 1.103.4.1  11-Sep-2006  ad - Convert some lockmgr() locks to mutexes and RW locks.
- Acquire proclist_lock and p_crmutex in some obvious places.
 1.105.2.3  18-Dec-2006  yamt sync with head.
 1.105.2.2  10-Dec-2006  yamt sync with head.
 1.105.2.1  22-Oct-2006  yamt sync with head
 1.108.2.2  21-Jan-2007  bouyer Pull up following revision(s) (requested by elad in ticket #379):
sys/secmodel/bsd44/secmodel_bsd44_suser.c: revision 1.33 via patch
share/examples/secmodel/secmodel_example.c: revision 1.14 via patch
sys/sys/kauth.h: revision 1.35 via patch
sys/kern/kern_resource.c: revision 1.112 via patch
share/man/man9/kauth.9: revision 1.48 via patch
Kill KAUTH_PROCESS_RESOURCE and just replace it with two actions for
nice and rlimit.
 1.108.2.1  09-Dec-2006  bouyer Pull up following revision(s) (requested by elad in ticket #257):
sys/kern/kern_resource.c: revision 1.109
PR/35021: Brian de Alwis: root cannot get/set rlimit information of user
processes through sysctl
Fix inverted logic in boolean assignment. This is why these tests should
not be done outside the secmodel code.
Thanks for the report.
 1.113.2.3  12-Mar-2007  rmind Sync with HEAD.
 1.113.2.2  27-Feb-2007  yamt - sync with head.
- move sched_changepri back to kern_synch.c as it doesn't know PPQ anymore.
 1.113.2.1  20-Feb-2007  rmind General Common Scheduler Framework (CSF) patch import. Huge thanks to
Daniel Sieger <dsieger at TechFak.Uni-Bielefeld de> for this work.

Short abstract: Split the dispatcher from the scheduler in order to
make the scheduler more modular. Introduce initial API for other
schedulers' implementations.

Discussed in tech-kern@
OK: yamt@, ad@

Note: further work will follow soon.
 1.116.4.1  11-Jul-2007  mjf Sync with head.
 1.116.2.8  05-Nov-2007  ad - Locking tweaks for estcpu/nice. XXX The schedclock mustn't run above
IPL_SCHED.
- Hide most references to l_estcpu.
- l_policy was here first, but l_class is referenced in more places now.
 1.116.2.7  09-Oct-2007  ad Sync with head.
 1.116.2.6  28-Aug-2007  yamt uid_find: IPL_SOFTNET -> IPL_VM for ui_lock for now and add an XXX comment,
to avoid locking order problems with kernel_lock.
 1.116.2.5  20-Aug-2007  ad Sync with HEAD.
 1.116.2.4  14-Jul-2007  ad Make it possible to track time spent by soft interrupts as is done for
normal LWPs, and provide a sysctl to switch it on/off. Not enabled by
default because microtime() is not free. XXX Not happy with this but
I want to get it out of my local tree for the time being.
 1.116.2.3  08-Jun-2007  ad Sync with head.
 1.116.2.2  13-May-2007  ad - Pass the error number and residual count to biodone(), and let it handle
setting error indicators. Prepare to eliminate B_ERROR.
- Add a flag argument to brelse() to be set into the buf's flags, instead
of doing it directly. Typically used to set B_INVAL.
- Add a "struct cpu_info *" argument to kthread_create(), to be used to
create bound threads. Change "bool mpsafe" to "int flags".
- Allow exit of LWPs in the IDL state when (l != curlwp).
- More locking fixes & conversion to the new API.
 1.116.2.1  21-Mar-2007  ad - Put a lock around the proc's CWD info (work in progress).
- Replace some more simplelocks.
- Make lbolt a condvar.
 1.118.6.6  09-Dec-2007  jmcneill Sync with HEAD.
 1.118.6.5  03-Dec-2007  joerg Sync with HEAD.
 1.118.6.4  06-Nov-2007  joerg Sync with HEAD.
 1.118.6.3  26-Oct-2007  joerg Sync with HEAD.

Follow the merge of pmap.c on i386 and amd64 and move
pmap_init_tmp_pgtbl into arch/x86/x86/pmap.c. Modify the ACPI wakeup
code to restore CR4 before jumping back into kernel space as the large
page option might cover that.
 1.118.6.2  02-Oct-2007  joerg Sync with HEAD.
 1.118.6.1  09-Aug-2007  jmcneill Sync with HEAD.
 1.118.2.2  10-Sep-2007  skrll Sync with HEAD.
 1.118.2.1  15-Aug-2007  skrll Sync with HEAD.
 1.119.2.3  23-Mar-2008  matt sync with HEAD
 1.119.2.2  09-Jan-2008  matt sync with HEAD
 1.119.2.1  06-Nov-2007  matt sync with HEAD
 1.121.2.2  14-Oct-2007  yamt sync with head.
 1.121.2.1  06-Oct-2007  yamt sync with head.
 1.123.4.4  18-Feb-2008  mjf Sync with HEAD.
 1.123.4.3  27-Dec-2007  mjf Sync with HEAD.
 1.123.4.2  08-Dec-2007  mjf Sync with HEAD.
 1.123.4.1  19-Nov-2007  mjf Sync with HEAD.
 1.123.2.1  13-Nov-2007  bouyer Sync with HEAD
 1.126.2.3  26-Dec-2007  ad Sync with head.
 1.126.2.2  15-Dec-2007  ad - Use pool_cache for a few more items and make those caches static.
- Mark another 10 syscalls MPSAFE including execve(). A small bit of
work is required to fix a couple of issues (tty, kqueue).
 1.126.2.1  08-Dec-2007  ad Sync with head.
 1.127.4.2  23-Jan-2008  bouyer Sync with HEAD.
 1.127.4.1  02-Jan-2008  bouyer Sync with HEAD
 1.132.6.5  17-Jan-2009  mjf Sync with HEAD.
 1.132.6.4  05-Oct-2008  mjf Sync with HEAD.
 1.132.6.3  29-Jun-2008  mjf Sync with HEAD.
 1.132.6.2  02-Jun-2008  mjf Sync with HEAD.
 1.132.6.1  03-Apr-2008  mjf Sync with HEAD.
 1.132.2.1  24-Mar-2008  keiichi sync with head.
 1.137.4.2  04-Jun-2008  yamt sync with head
 1.137.4.1  18-May-2008  yamt sync with head.
 1.137.2.3  01-Nov-2008  christos Sync with head.
 1.137.2.2  03-Apr-2008  christos rlim_t cannot be negative.
 1.137.2.1  29-Mar-2008  christos Welcome to the time_t=long long dev_t=uint64_t branch.
 1.139.2.5  11-Aug-2010  yamt sync with head.
 1.139.2.4  11-Mar-2010  yamt sync with head
 1.139.2.3  20-Jun-2009  yamt sync with head
 1.139.2.2  04-May-2009  yamt sync with head.
 1.139.2.1  16-May-2008  yamt sync with head.
 1.141.2.5  10-Oct-2008  skrll Sync with HEAD.
 1.141.2.4  18-Sep-2008  wrstuden Sync with wrstuden-revivesa-base-2.
 1.141.2.3  23-Jun-2008  wrstuden Sync w/ -current. 34 merge conflicts to follow.
 1.141.2.2  14-May-2008  wrstuden Per discussion with ad, remove most of the #include <sys/sa.h> lines
as they were including sa.h just for the type(s) needed for syscallargs.h.

Instead, create a new file, sys/satypes.h, which contains just the
types needed for syscallargs.h. Yes, there's only one now, but that
may change and it's probably more likely to change if it'd be difficult
to handle. :-)

Per discussion with matt at n dot o, add an include of satypes.h to
sigtypes.h. Upcall handlers are kinda signal handlers, and signalling
is the header file that's already included for syscallargs.h that
most closely matches SA.

This shaves about 3000 lines off of the diff of the branch relative
to the base. That also represents about 18% of the total before this
checkin.

I think this reduction is a very good thing.
 1.141.2.1  10-May-2008  wrstuden Initial checkin of re-adding SA. Everything except kern_sa.c
compiles in GENERIC for i386. This is still a work-in-progress, but
this checkin covers most of the mechanical work (changing signalling
to be able to accommodate SA's process-wide signalling and re-adding
includes of sys/sa.h and savar.h). Subsequent changes will be much
more interesting.

Also, kern_sa.c has received partial cleanup. There's still more
to do, though.
 1.142.2.1  27-Jun-2008  simonb Sync with head.
 1.143.2.1  19-Oct-2008  haad Sync with HEAD.
 1.147.4.2  14-Aug-2009  snj Pull up following revision(s) (requested by dsl in ticket #893):
sys/kern/kern_resource.c: revision 1.152
PR/41489: Stathis Kamperis: setpriority(2) returns EACCES instead of EPERM
Per discussion on the PR's audit trail, put back original checks for now.
 1.147.4.1  01-Apr-2009  snj branches: 1.147.4.1.2; 1.147.4.1.4;
Pull up following revision(s) (requested by mrg in ticket #622):
bin/csh/csh.1: revision 1.46
bin/csh/func.c: revision 1.37
bin/ps/print.c: revision 1.111
bin/ps/ps.c: revision 1.74
bin/sh/miscbltin.c: revision 1.38
bin/sh/sh.1: revision 1.92 via patch
external/bsd/top/dist/machine/m_netbsd.c: revision 1.7
lib/libkvm/kvm_proc.c: revision 1.82
sys/arch/mips/mips/cpu_exec.c: revision 1.55
sys/compat/darwin/darwin_exec.c: revision 1.57
sys/compat/ibcs2/ibcs2_exec.c: revision 1.73
sys/compat/irix/irix_resource.c: revision 1.15
sys/compat/linux/arch/amd64/linux_exec_machdep.c: revision 1.16
sys/compat/linux/arch/i386/linux_exec_machdep.c: revision 1.12
sys/compat/linux/common/linux_limit.h: revision 1.5
sys/compat/osf1/osf1_resource.c: revision 1.14
sys/compat/svr4/svr4_resource.c: revision 1.18
sys/compat/svr4_32/svr4_32_resource.c: revision 1.17
sys/kern/exec_subr.c: revision 1.62
sys/kern/init_sysctl.c: revision 1.160
sys/kern/kern_exec.c: revision 1.288
sys/kern/kern_resource.c: revision 1.151
sys/sys/param.h: patch
sys/sys/resource.h: revision 1.31
sys/sys/sysctl.h: revision 1.184
sys/uvm/uvm_extern.h: revision 1.153
sys/uvm/uvm_glue.c: revision 1.136
sys/uvm/uvm_mmap.c: revision 1.128
usr.bin/systat/ps.c: revision 1.32
- add new RLIMIT_AS (aka RLIMIT_VMEM) resource that limits the total
address space available to processes. this limit exists in most other
modern unix variants, and like most of them, our defaults are unlimited.
remove the old mmap / rlimit.datasize hack.
- adds the VMCMD_STACK flag to all the stack-creation vmcmd callers.
it is currently unused, but was added a few years ago.
- add a pair of new process size values to kinfo_proc2{}. one is the
total size of the process memory map, and the other is the total size
adjusted for unused stack space (since most processes have a lot of
this...)
- patch sh, and csh to notice RLIMIT_AS. (in some cases, the alias
RLIMIT_VMEM was already present and used if available.)
- patch ps, top and systat to notice the new k_vm_vsize member of
kinfo_proc2{}.
- update irix, svr4, svr4_32, linux and osf1 emulations to support
this information. (freebsd could be done, but it's best left
as part of the full-update of compat/freebsd.)
this addresses PR 7897. it also gives correct memory usage values,
which have never been entirely correct (since mmap), and have been
very incorrect since jemalloc() was enabled.
tested on i386 and sparc64, build tested on several other platforms.
thanks to many folks for feedback and testing but most especially
chuq and yamt for critical suggestions that led to this patch not
having a special ugliness i wasn't happy with anyway :-)
 1.147.4.1.4.1  21-Apr-2010  matt sync to netbsd-5
 1.147.4.1.2.1  14-Aug-2009  snj Pull up following revision(s) (requested by dsl in ticket #893):
sys/kern/kern_resource.c: revision 1.152
PR/41489: Stathis Kamperis: setpriority(2) returns EACCES instead of EPERM
Per discussion on the PR's audit trail, put back original checks for now.
 1.147.2.2  28-Apr-2009  skrll Sync with HEAD.
 1.147.2.1  19-Jan-2009  skrll Sync with HEAD.
 1.149.2.2  23-Jul-2009  jym Sync with HEAD.
 1.149.2.1  13-May-2009  jym Sync with HEAD.

Commit is split, to avoid a "too many arguments" protocol error.
 1.154.2.2  17-Aug-2010  uebayasi Sync with HEAD.
 1.154.2.1  30-Apr-2010  uebayasi Sync with HEAD.
 1.155.2.4  12-Jun-2011  rmind sync with head
 1.155.2.3  31-May-2011  rmind sync with head
 1.155.2.2  03-Jul-2010  rmind sync with head
 1.155.2.1  30-May-2010  rmind sync with head
 1.157.2.1  06-Jun-2011  jruoho Sync with HEAD.
 1.166.2.1  23-Jun-2011  cherry Catchup with rmind-uvmplock merge.
 1.167.2.5  22-May-2014  yamt sync with head.

for a reference, the tree before this commit was tagged
as yamt-pagecache-tag8.

this commit was split into small chunks to avoid
a limitation of cvs. ("Protocol error: too many arguments")
 1.167.2.4  23-Jan-2013  yamt sync with head
 1.167.2.3  16-Jan-2013  yamt sync with (a bit old) head
 1.167.2.2  30-Oct-2012  yamt sync with head
 1.167.2.1  17-Apr-2012  yamt sync with head
 1.169.2.4  03-Dec-2017  jdolecek update from HEAD
 1.169.2.3  20-Aug-2014  tls Rebase to HEAD as of a few days ago.
 1.169.2.2  25-Feb-2013  tls resync with head
 1.169.2.1  20-Nov-2012  tls Resync to 2012-11-19 00:00:00 UTC
 1.172.2.1  18-May-2014  rmind sync with head
 1.174.2.2  28-Aug-2017  skrll Sync with HEAD
 1.174.2.1  05-Oct-2016  skrll Sync with HEAD
 1.175.4.1  21-Apr-2017  bouyer Sync with HEAD
 1.175.2.1  26-Apr-2017  pgoyette Sync with HEAD
 1.176.12.2  21-May-2018  pgoyette Sync with HEAD
 1.176.12.1  16-Apr-2018  pgoyette Sync with HEAD, resolve some conflicts
 1.181.2.2  08-Apr-2020  martin Merge changes from current as of 20200406
 1.181.2.1  10-Jun-2019  christos Sync with HEAD
 1.182.4.1  12-Dec-2019  martin Pull up following revision(s) (requested by ad in ticket #546):

sys/kern/kern_resource.c: revision 1.183
sys/kern/kern_softint.c: revision 1.49

calcru: ignore running softints, unless softint_timing is on.
Fixes crazy times reported for proc0.
 1.183.2.2  29-Feb-2020  ad Sync with head.
 1.183.2.1  17-Jan-2020  ad Sync with head.
