History log of /src/sys/kern/kern_time.c
Revision  Date  Author  Comments
 1.228  19-Mar-2025  pho clock_getres(2): Support CLOCK_{PROCESS,THREAD}_CPUTIME_ID

The syscall previously returned EINVAL for these two clocks. It still has
no support for CLOCK_VIRTUAL and CLOCK_PROF but clock_gettime(2) doesn't
either.

Fixes PR kern/59127
 1.227  22-Dec-2024  riastradh kern: Move some purely arithmetic routines to subr_time_arith.c.

Preparation for testing and fixing:

PR kern/58922: itimer(9): arithmetic overflow
PR kern/58925: itimer(9) responds erratically to clock wound back
PR kern/58926: itimer(9) integer overflow in overrun counting
PR kern/58927: itimer(9): overrun accounting is broken
 1.226  22-Dec-2024  riastradh limits.h: Define DELAYTIMER_MAX.

This is the maximum value of timer_getoverrun(), and was introduced
in IEEE Std 1003.1b-1993.

Prompted by:

PR kern/58926: itimer(9) integer overflow in overrun counting
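As a rough illustration of why a cap such as DELAYTIMER_MAX matters for overrun counting, here is a minimal sketch of a saturating counter. It is not the kernel's code: `add_overruns` and `SKETCH_DELAYTIMER_MAX` are hypothetical names, and the value 32 is only the POSIX minimum (`_POSIX_DELAYTIMER_MAX`), not necessarily what NetBSD defines.

```c
/* Hypothetical cap for this sketch; 32 is _POSIX_DELAYTIMER_MAX. */
#define SKETCH_DELAYTIMER_MAX 32

/*
 * Add `extra` missed expirations to `current` without overflowing.
 * Assumes 0 <= current <= SKETCH_DELAYTIMER_MAX on entry.
 */
static int
add_overruns(int current, unsigned long long extra)
{
	/* Compare against the remaining headroom instead of adding first. */
	if (extra >= (unsigned long long)(SKETCH_DELAYTIMER_MAX - current))
		return SKETCH_DELAYTIMER_MAX;
	return current + (int)extra;
}
```

The comparison is done before the addition so that no intermediate value can exceed the cap, however large `extra` is.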
 1.225  22-Dec-2024  riastradh kern_time.c: Sort includes. Add missing includes.

No functional change intended.

Preparation for factoring out arithmetic to add tests for and fix:

PR kern/58922: itimer(9): arithmetic overflow
PR kern/58925: itimer(9) responds erratically to clock wound back
PR kern/58926: itimer(9) integer overflow in overrun counting
PR kern/58927: itimer(9): overrun accounting is broken
 1.224  22-Dec-2024  riastradh itimer_settime(9): Assert input is sane.

Caller is responsible for validating/sanitizing.

Prompted by:

PR kern/58914: timerfd_settime(2) is missing itimespecfix
 1.223  19-Dec-2024  riastradh timer_settime(2): Return relative duration remaining.

Not absolute time of next event.

PR kern/58917: timer_settime and timerfd_settime return absolute time
of next event
 1.222  19-Dec-2024  riastradh timer_settime(2): Fix error code for negative it_interval.

PR kern/58920: timer_settime fails ETIMEDOUT on negative interval,
not EINVAL
 1.221  23-Feb-2023  riastradh itimer(9): Sprinkle some more assertions.
 1.220  23-Feb-2023  riastradh itimer(9): Use callout_setfunc/schedule instead of callout_reset.

No semantic change intended.
 1.219  18-Feb-2023  thorpej In itimer_arm_real(), KASSERT that it->it_dying is false. This was
already implicitly assumed, but make it explicit in hopes of tracking
down kern/57226.
 1.218  26-Oct-2022  riastradh branches: 1.218.2;
sys: Put externs for time_adjtime and time_adjusted in .h files.

time_adjtime: sys/timex.h (defined in ntp code)
time_adjusted: sys/timevar.h (defined in non-ntp code)

(Not really sure this is a valuable distinction to maintain; there's
non-ntp code that uses time_adjtime too.)
 1.217  01-Jul-2022  riastradh kern: KNF in kern_time.c: Omit needless return parentheses.

Also nix trailing whitespace while here.

No functional change intended.
 1.216  27-Jun-2022  riastradh setitimer(2): Avoid arithmetic overflow in periodic bookkeeping.

Reported-by: syzbot+93cef6090844ec304cde@syzkaller.appspotmail.com
 1.215  26-Jun-2022  riastradh setitimer(2): Guard against overflow in arithmetic.

Reported-by: syzbot+6036bc8b6d2b963e3ba6@syzkaller.appspotmail.com
 1.214  15-May-2022  riastradh adjtime(2): Handle negative tv_sec and tv_usec.

Previously I clamped these to avoid dangerous arithmetic overflow.
But I assumed sensible values should be nonnegative.

For tv_sec, this assumption was just wrong -- the adjustment may be
negative.

For tv_usec, this assumption is...not wrong, but also not right.
tv_usec is not _supposed_ to be negative (by POSIX, the type need
only represent values in [-1, 1000000]; semantically the member is
supposed to be a nonnegative number of microseconds below 1000000),
but ntp abuses it to hold negative values, for reasons unclear -- the
same effect could be had by subtracting one from tv_sec, and adding
1000000 to the negative tv_usec. However, let's not break existing
ntp userlands...
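The equivalence mentioned above (subtract one from tv_sec, add 1000000 to the negative tv_usec) can be sketched as follows. This is only an illustration, not what adjtime(2) does internally: `normalize_delta` is a hypothetical helper, and it assumes -1000000 < tv_usec < 1000000 and that tv_sec is not already at its minimum value.

```c
#include <sys/time.h>

/*
 * Hypothetical helper: fold a negative tv_usec into tv_sec so that
 * 0 <= tv_usec < 1000000 on return. The represented duration is
 * unchanged. Assumes -1000000 < tv_usec < 1000000 on entry.
 */
static struct timeval
normalize_delta(struct timeval tv)
{
	if (tv.tv_usec < 0) {
		tv.tv_sec -= 1;		/* borrow one second ... */
		tv.tv_usec += 1000000;	/* ... and repay it in microseconds */
	}
	return tv;
}
```

For example, a delta of { 0, -250000 } (minus a quarter of a second) becomes { -1, 750000 }.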
 1.213  13-Mar-2022  riastradh kern: Handle clock winding back in nanosleep1 without overflow.

Reported-by: syzbot+3bdd260582424a611946@syzkaller.appspotmail.com
 1.212  12-Mar-2022  riastradh kern: Clamp time_adjtime to avoid overflow.

Reported-by: syzbot+7edce1a31dfd2a5eaa18@syzkaller.appspotmail.com
 1.211  03-Apr-2021  simonb Centralise the setitimer() timer type validation in dosetitimer() as is
done with dogetitimer().
 1.210  08-Dec-2020  thorpej branches: 1.210.2;
A couple of tweaks to the previous re-factor:

- Some of what was defined as "generic itimer" behavior turned out to be
ptimer-specific. As such, everything related to the "fired timer queue"
is now specific to ptimers, and the queue and softint handle fields of
itimer_ops are not needed.

- Split itimer_fini() into 2 parts: itimer_poison() marks the timer as
dead and attempts to cancel it. itimer_fini() is then just responsible
for freeing itimer resources and releasing the lock. They are split
into two parts, as ptimers require an additional processing step between
those two operations, but other kinds of itimers do not necessarily require
that.

- Export a few more itimer-related symbols that other itimer types will
need.

Riding previous kernel version bump since there are no external uses of
this code since the version bump that accompanied the original change.
 1.209  07-Dec-2020  christos fix the build; gcc does not always see that it can't happen.
 1.208  06-Dec-2020  thorpej Fix an uninitialized pointer deref introduced in rev 1.207.

Reported-by: syzbot+6d69101d5f2fd954c4e2@syzkaller.appspotmail.com
 1.207  05-Dec-2020  thorpej Refactor interval timers to make it possible to support types other than
the BSD/POSIX per-process timers:

- "struct ptimer" is split into "struct itimer" (common interval timer
data) and "struct ptimer" (per-process timer data, which contains a
"struct itimer").

- Introduce a new "struct itimer_ops" that supplies information about
the specific kind of interval timer, including its processing
queue, the softint handle used to schedule processing, the function
to call when the timer fires (which adds it to the queue), and an
optional function to call when the CLOCK_REALTIME clock is changed by
a call to clock_settime() or settimeofday().

- Rename some functions to clearly identify what they're operating on
(ptimer vs itimer).

- Use kmem(9) to allocate ptimer-related structures, rather than having
dedicated pools for them.

Welcome to NetBSD 9.99.77.
 1.206  27-Oct-2020  nia branches: 1.206.2;
kern_time: prevent the system clock from being set too low or high

currently doing this will drive KUBSAN haywire and possibly cause
system lock-ups, so more testing should probably be performed before
we let the clock be set too many thousands of years into the future.

ditto for negative values, which were being passed by chrony for
some reason while my internet connection was being unreliable.
this also triggered some interesting KUBSAN reports.
 1.205  23-May-2020  ad Move proc_lock into the data segment. It was dynamically allocated because
at the time we had mutex_obj_alloc() but not __cacheline_aligned.
 1.204  14-May-2020  maxv Fix uninitialized memory access. Found by KMSAN.

Reported-by: syzbot+9f2a173d29d66c88f9ac@syzkaller.appspotmail.com
 1.203  02-Jan-2020  thorpej - Eliminate the global "boottime" variable, which was being accessed
without any synchronization against changes by e.g. clock_settime().
- Replace with new getbinboottime() / getnanoboottime() / getmicroboottime()
functions (naming mirrors that of other time access functions in kern_tc.c).
It returns the (maybe-converted) value of timebasebin, which also tracks
our estimate of when the system was booted (i.e. the legacy "boottime" was
redundant).

XXX There needs to be a lockless synchronization mechanism for reading
timebasebin, but this is a problem in kern_tc.c that pre-existed these
"boottime" changes. At least now the problem is centralized in one location.
 1.202  01-Jan-2020  thorpej Remove superfluous splclock()/splx() pair around tc_setclock().
 1.201  05-Oct-2019  kamil Check for valid timespec in clock_settime1()

An alternative approach would be to check the value in settime1(), but
it would result in multiple checks for valid tv_nsec, as there are
settime1() users that need to check the ranges earlier.

Reported-by: syzbot+96e5ce2c2c704d96c2f0@syzkaller.appspotmail.com
 1.200  20-Sep-2019  kamil Validate usec ranges in settimeofday1()
 1.199  07-Aug-2019  mrg mark a variable __diagused to fix this problem affecting many builds:

kern/kern_time.c:1413:6: error: variable 'error' set but not used [-Werror=unused-but-set-variable]
 1.198  06-Aug-2019  riastradh Fix race in timer destruction.

Anything we confirmed about the world before callout_halt may cease
to be true afterward, so make sure to start over in that case.

Add some comments explaining what's going on.

Reported-by: syzbot+d58da99969f58c1a024a@syzkaller.appspotmail.com
 1.197  10-Mar-2019  kre branches: 1.197.4;
Fix the code that deals with very long sleeps (> 248 days) which
go beyond the maximum that the callout mechanism can handle.
[See the comments in tvtohz() in subr_sleep.c for the details.]

When that happens the timeout is clamped to MAX_INT (ticks), and the
code in nanosleep1() looped (or tried to) repeating the sleep (aka
kpause()) until the requested end time for the sleep was reached.

Unfortunately, the code assumed that kpause() would return 0 when
it returned after the timeout expired. But it doesn't, it returns
EWOULDBLOCK instead (why is incomprehensible to me, but I assume
there is a reason.) [That comes from sleepq_block() which returns
EWOULDBLOCK when callout_halt() indicates that the callout had fired,
which is exactly what has happened when the time has elapsed.]

There was already code to deal with that EWOULDBLOCK and return 0
instead of an error in that case - but it was placed after the
error code was tested against 0 for the purposes of the loop.

Simply move the EWOULDBLOCK->0 mapping earlier, so the code which
is expecting "error == 0" to mean "nothing went wrong" actually
gets to see that happen, and the loop can actually loop.

(Someday the loop should probably be rewritten as a loop, instead of
as a bunch of code followed by a "goto again"!)
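The ordering problem described above can be sketched in user space. This is not the nanosleep1() code: `fake_kpause`, `fake_now` and `sleep_until` are hypothetical stand-ins, with `fake_kpause` mimicking only the one property the message relies on, namely that a sleep ending by timeout reports EWOULDBLOCK rather than 0.

```c
#include <errno.h>

/* Hypothetical fake clock, in ticks. */
static long long fake_now;

/* Hypothetical stand-in for kpause(): "sleeps" timo ticks, then reports
 * the timeout expiry as EWOULDBLOCK. */
static int
fake_kpause(int timo)
{
	fake_now += timo;
	return EWOULDBLOCK;
}

/* Sleep until fake_now reaches deadline, at most max_chunk ticks at a time. */
static int
sleep_until(long long deadline, int max_chunk)
{
	for (;;) {
		long long left = deadline - fake_now;
		if (left <= 0)
			return 0;
		int timo = left > max_chunk ? max_chunk : (int)left;
		int error = fake_kpause(timo);
		/* Map "timeout expired" to success BEFORE testing error. */
		if (error == EWOULDBLOCK)
			error = 0;
		if (error != 0)
			return error;	/* a real failure ends the sleep */
	}
}
```

If the EWOULDBLOCK mapping were moved below the `error != 0` test, the first clamped chunk would end the sleep early, which is the behaviour the commit message describes.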
 1.196  24-Feb-2019  mlelstv The callout is used by any nonvirtual timer including CLOCK_MONOTONIC
and needs to be initialized.

Detected by [syzkaller].
 1.195  10-Feb-2019  christos Introduce PR_ZERO to avoid open-coding memset()s everywhere. OK riastradh@.
 1.194  31-Jan-2019  maxv Fix kernel info leaks.
 1.193  29-Nov-2018  maxv Improve my kern_time.c::rev1.192, systematically clear the buffers we get
from 'ptimer_pool' to prevent more leaks.
 1.192  28-Nov-2018  maxv Fix kernel info leak.

+ Possible info leak: [len=32, leaked=16]
| #0 0xffffffff80baf3a7 in kleak_copyout
| #1 0xffffffff80b940f8 in sys___timer_settime50
| #2 0xffffffff80259c42 in syscall
 1.191  13-Nov-2018  maxv Fix kernel info leak. There are 2x4 bytes of padding in struct itimerval.

[ 738.451860] kleak: Possible leak in copyout: [len=32, leaked=8]
[ 738.481840] #0 0xffffffff80b7c42a in kleak_note <netbsd>
[ 738.491821] #1 0xffffffff80b7c4aa in kleak_copyout <netbsd>
[ 738.501806] #2 0xffffffff80b6154e in sys___getitimer50 <netbsd>
[ 738.511778] #3 0xffffffff80b61e39 in sys___setitimer50 <netbsd>
[ 738.521781] #4 0xffffffff8025ab3c in sy_call <netbsd>
[ 738.521781] #5 0xffffffff8025ad6e in sy_invoke <netbsd>
[ 738.531808] #6 0xffffffff8025adf4 in syscall <netbsd>
 1.190  11-Nov-2018  maxv Fix stack info leak. There are 4 bytes of padding in struct timeval. Looks
like there are other leaks related to timeval in this file.

[ 133.414352] kleak: Possible leak in copyout: [len=16, leaked=4]
[ 133.414352] #0 0xffffffff80224d0a in kleak_note <netbsd>
[ 133.424360] #1 0xffffffff80224d8a in kleak_copyout <netbsd>
[ 133.434361] #2 0xffffffff80b5fd79 in sys___gettimeofday50 <netbsd>
[ 133.434361] #3 0xffffffff8025a89c in sy_call <netbsd>
[ 133.444351] #4 0xffffffff8025aace in sy_invoke <netbsd>
[ 133.454365] #5 0xffffffff8025ab54 in syscall <netbsd>
 1.189  11-Nov-2016  njoly branches: 1.189.8; 1.189.14; 1.189.16;
Adjust clock_nanosleep(2) to not copyout remaining time struct if
TIMER_ABSTIME flag is set.

Ok Christos.
 1.188  07-Jul-2016  msaitoh branches: 1.188.2;
KNF. Remove extra spaces. No functional change.
 1.187  10-Jun-2016  christos GSoC 2016: Charles Cui: Add timer related macros
_POSIX_CPUTIME
_POSIX_THREAD_CPUTIME
_POSIX_DELAYTIMER_MAX
 1.186  23-Apr-2016  christos Add clock_getcpuclockid2(2) as well as CLOCK_{PROCESS,THREAD}_CPUTIME_ID.
 1.185  08-Mar-2016  christos - GC pts_fired, and fix the comment about MAX_TIMERS
- Bump MAX_TIMERS to 36 so that we have 32 POSIX user timers which is the
minimum required.
 1.184  03-Mar-2016  uwe Don't leak garbage from the kernel stack on sleep(0) and equivalents.
Hat tip to perl's ext/POSIX/t/wrappers.t
 1.183  26-Feb-2016  christos Make comments and code match reality; there are 4 reserved timers.
 1.182  06-Oct-2015  christos CID/1325753: remove extra return.
 1.181  02-Oct-2015  christos PR/50295: clock_nanotime() should not set errno, but return the error.
 1.180  24-Jul-2015  maxv Unused inits (harmless).

Found by Brainy.
 1.179  22-May-2013  christos branches: 1.179.8; 1.179.10; 1.179.12;
Make ts2timo(9) always return the absolute start time if the start argument
is present, and handle the TIMER_ABSTIME case in nanosleep1(9).
 1.178  31-Mar-2013  christos always return immediately on error, and if we passed negative seconds,
return with 0.
 1.177  29-Mar-2013  martin Move clock_gettime1() to subr_time.c (which is included in rump kernels)
 1.176  29-Mar-2013  christos Centralize the computation of struct timespec to the int timo.
Make lwp_park take the regular arguments for specifying what kind
of timeout we supply like clock_nanosleep(), namely clockid_t and flags.
 1.175  02-Oct-2012  christos kernel portion of clock_nanosleep()
 1.174  22-Mar-2012  dholland branches: 1.174.2;
Misplaced parenthesis; fixes PR 44927
 1.173  20-Feb-2012  rmind itimerfire: fix a regression, check if timer is already queued.
 1.172  19-Feb-2012  rmind Remove COMPAT_SA / KERN_SA. Welcome to 6.99.3!
Approved by core@.
 1.171  18-Dec-2011  christos Fix monotonic interval timers.
 1.170  27-Oct-2011  christos branches: 1.170.2; 1.170.6;
There is no reason not to support CLOCK_MONOTONIC in {g,s}etitimer() since
the underlying implementation already supports it, so add it.
 1.169  27-Jul-2011  uebayasi These don't need uvm/uvm_extern.h.
 1.168  08-Apr-2011  yamt implement timer_create of CLOCK_MONOTONIC
 1.167  05-Apr-2011  yamt fix assertion failure in timer_intr. CLOCK_REALTIME timers can be on
timer_queue.
 1.166  17-Dec-2010  yamt branches: 1.166.2;
realtimerexpire: rename a confusing variable. no functional change.
(now_ms -> now_ns as it holds a nanosecond value)
 1.165  08-Apr-2010  njoly Add a new clock_gettime1() function that holds most of the
clock_gettime syscall code (except for the copyout). Adjust all
corresponding syscalls to make use of it.
 1.164  03-Apr-2010  njoly Move most clock_getres syscall code, except for copyout call, to a new
clock_getres1() function which can be used by emulations. Adjust all
clock_getres syscalls to now make use of it.
 1.163  10-Dec-2009  drochner branches: 1.163.2; 1.163.4;
If a struct sigevent with SIGEV_SIGNAL is passed to timer_create(2),
check the signal number to be in the allowed range. An invalid
signal number could crash the kernel by overflowing the sigset_t
array.
More checks would be good, and SIGEV_THREAD shouldn't be dropped
silently, but this fixes at least the local DOS vulnerability.
 1.162  03-Oct-2009  elad Introduce time_wraps() to check if setting the time will wrap it (or
close to it). Useful for secmodels.

Replace open-coded form with it in secmodel code (securelevel, keylock).

Note: I need to find a way to make secmodel_keylock.c ~<100 lines.
 1.161  13-Sep-2009  pooka Wipe out the last vestiges of POOL_INIT with one swift stroke. In
most cases, use a proper constructor. For proplib, give a local
equivalent of POOL_INIT for the kernel object implementation. This
way the code structure can be preserved, and a local link set is
not hazardous anyway (unless proplib is split to several modules,
but that'll be the day).

tested by booting a kernel in qemu and compile-testing i386/ALL
 1.160  29-Mar-2009  christos Move the internal poll/select related API's to use timespec instead
of timeval (rides the uvm bump).
 1.159  31-Jan-2009  yamt branches: 1.159.2;
settime1: fix a bug i introduced when i made l_stime use monotonic time.
from Matthias Drochner on tech-kern@. PR/40511 from Martin Husemann.
 1.158  30-Jan-2009  ad timer_intr: hold proc_lock across the loop, otherwise the process we are
about to signal could disappear.
 1.157  11-Jan-2009  christos - fix leaked lock, thanks ad@ for noticing.
- remove unneeded cast.
 1.156  11-Jan-2009  christos merge christos-time_t
 1.155  16-Oct-2008  wrstuden branches: 1.155.2; 1.155.4;
Adjust locking on the sadata::sa_vps list. The main time we
walk the list, we're looking for a vp to do something with. We do
this in the signal code and in the timer code. The signal code already
runs with proc::p_lock held, so it's a very natural lock to use. The
timer code, however, calls into the sa timer code with a spinlock held.
Since proc::p_lock is an adaptable mutex, we can sleep to get it. Sleeping
with a spinlock is BAD. So proc::p_lock is _not_ the right lock there,
and something like sadata::sa_mutex would be best.

Address this difficulty by noting that both uses actually just read
the list. Changing the list of VPs is rare - once one's added, it stays
until the process ends. So make the locking protocol that to write the
list you have to hold both proc::p_lock and sadata::sa_mutex (taken
in that order). Thus holding either one individually grants read access.

This removes a case where we could sleep with timer_lock, a spinlock at
IPL_SCHED (!!), while trying to get p_lock. If that ever happened, we'd
pretty much be dead. So don't do that!

This fixes a merge botch from how I handled our gaining p_lock - p_lock
should not have simply replaced p_smutex.

While here, tweak the sa_unblock_userret() code for the case
when the blessed vp is actually running (on another CPU). Make its
resched RESCHED_IMMED so we whack the CPU. Addresses a hang I've
observed in starting firefox on occasion when I see one thread running
in userland and another thread sitting in lwpublk, which means it's on
the list of threads for which we need an unblocked upcall. This list is
one on which things should NOT linger.
 1.154  15-Oct-2008  wrstuden Merge wrstuden-revivesa into HEAD.
 1.153  25-Sep-2008  pooka Split rate-checking routines into their own module for easier reuse.
 1.152  23-Sep-2008  christos fix half-assed change usec -> nsec that broke non-real timers.
 1.151  08-Aug-2008  christos Fix broken setitimer(). (Sverre Froyen)
 1.150  15-Jul-2008  christos Use more timespecs internally. From Alexander Shishkin and me.
Welcome to 4.99.70, 30 more to go for 100.
 1.149  08-Jul-2008  christos Fix to bug reported and tested by Alexander Shishkin. struct ptimer has
a union that contains either a callout [for CLOCK_REALTIME] or a flag
and a list [for other clock types]. Make sure we perform the right actions
on the right union member depending on the clock type. Otherwise this would
result in crashes.
 1.148  29-May-2008  joerg branches: 1.148.2; 1.148.4;
Explicitly compute the next interval using 64bit arithmetic, if the time
was either stepped backwards or the timer has overflown. This fixes
PR 26470.
 1.147  08-May-2008  ad - Add tc_gonebad(): allows timecounter to be flagged as bad and removed at
the next clock tick.
- Remove time_lock, which is no longer required.
 1.146  28-Apr-2008  martin branches: 1.146.2;
Remove clause 3 and 4 from TNF licenses
 1.145  24-Apr-2008  ad branches: 1.145.2;
Network protocol interrupts can now block on locks, so merge the globals
proclist_mutex and proclist_lock into a single adaptive mutex (proc_lock).
Implications:

- Inspecting process state requires thread context, so signals can no longer
be sent from a hardware interrupt handler. Signal activity must be
deferred to a soft interrupt or kthread.

- As the proc state locking is simplified, it's now safe to take exit()
and wait() out from under kernel_lock.

- The system spends less time at IPL_SCHED, and there is less lock activity.
 1.144  22-Apr-2008  ad Give callout_halt() an additional 'kmutex_t *interlock' argument. If there
is a need to block and wait for the callout to complete, and there is an
interlock, it will be dropped while waiting and reacquired before return.
 1.143  21-Apr-2008  ad Make ntp, pmc, reboot, sysarch, time syscalls MPSAFE.
 1.142  21-Apr-2008  ad timer fixes for PR 37093:

- Fix serious concurrency problems, making the code MT and MP safe in
the process.
- Don't allocate memory or inspect process state from hardclock().
 1.141  25-Feb-2008  yamt branches: 1.141.2; 1.141.4;
nanosleep1: handle kpause spontaneous wakeups.
 1.140  19-Feb-2008  yamt branches: 1.140.2; 1.140.6;
wrap long lines. no functional change.
 1.139  19-Feb-2008  yamt nanosleep1: whitespace. no functional change.
 1.138  20-Jan-2008  joerg Now that __HAVE_TIMECOUNTER and __HAVE_GENERIC_TODR are invariants,
remove the conditionals and the code associated with the undef case.
 1.137  22-Dec-2007  yamt use binuptime for l_stime/l_rtime.
 1.136  22-Dec-2007  yamt reduce #ifdef __HAVE_TIMECOUNTER.
 1.135  20-Dec-2007  dsl Convert all the system call entry points from:
int foo(struct lwp *l, void *v, register_t *retval)
to:
int foo(struct lwp *l, const struct foo_args *uap, register_t *retval)
Fixup compat code to not write into 'uap' and (in some cases) to actually
pass a correctly formatted 'uap' structure with the right name to the
next routine.
A few 'compat' routines that just call standard ones have been deleted.
All the 'compat' code compiles (along with the kernels required to test
build it).
98% done by automated scripts.
 1.134  08-Dec-2007  elad branches: 1.134.4;
Replace usage of p_cred in kauth(9) call with kauth_cred_get().

okay yamt@.
 1.133  25-Nov-2007  elad branches: 1.133.2;
Kill a KAUTH_REQ_SYSTEM_TIME_SYSTEM request that's no longer needed.
 1.132  25-Nov-2007  elad Refactor time modification checks and place them in the secmodel code.

okay christos@
 1.131  15-Nov-2007  ad Add a bit of locking around timecounter attachment / selection.
 1.130  19-Oct-2007  ad branches: 1.130.2;
machine/{bus,cpu,intr}.h -> sys/{bus,cpu,intr}.h
 1.129  08-Oct-2007  ad branches: 1.129.2;
Merge run time accounting changes from the vmlocking branch. These make
the LWP "start time" per-thread instead of per-CPU.
 1.128  09-Aug-2007  pooka branches: 1.128.2; 1.128.4;
Shuffle routines which just roll values around from kern_clock.c
and kern_time.c to subr_time.c.
 1.127  07-Aug-2007  ad No reason not to make itimespecfix() generally available..
 1.126  07-Aug-2007  ad Export itimespecfix() until itimerfix() dies.
 1.125  09-Jul-2007  ad branches: 1.125.2; 1.125.6;
Merge some of the less invasive changes from the vmlocking branch:

- kthread, callout, devsw API changes
- select()/poll() improvements
- miscellaneous MT safety improvements
 1.124  21-May-2007  christos rename si_sigval -> si_value to match POSIX RTS.
 1.123  13-May-2007  dsl nanosleep1() shouldn't try to get the current time into a NULL address.
 1.122  13-May-2007  dsl Instead of the #define versions of tc_getfrequency() and nanouptime(), use
the function ones in kern_kern_clock.c (adding tc_getfrequency).
Adjust includes so this builds.
 1.121  13-May-2007  dsl Add a #define for nanouptime() in the !__HAVE_TIMECOUNTERS case.
 1.120  13-May-2007  dsl Split sys_nanosleep().
 1.119  12-May-2007  dsl Change interface to settimeofday1() so that it can also be used from
compat code in order to avoid the stackgap.
 1.118  12-Mar-2007  ad branches: 1.118.2; 1.118.4;
Pass an ipl argument to pool_init/POOL_INIT to be used when initializing
the pool's lock.
 1.117  09-Mar-2007  ad branches: 1.117.2;
- Make the proclist_lock a mutex. The write:read ratio is unfavourable,
and mutexes are cheaper to use than RW locks.
- LOCK_ASSERT -> KASSERT in some places.
- Hold proclist_lock/kernel_lock longer in a couple of places.
 1.116  04-Mar-2007  christos Kill caddr_t; there will be some MI fallout, but it will be fixed shortly.
 1.115  22-Feb-2007  thorpej TRUE -> true, FALSE -> false
 1.114  16-Feb-2007  ad branches: 1.114.2;
Remove spllowersoftclock() and CLKF_BASEPRI(), and always dispatch callouts
via a soft interrupt. In the near future, softclock will be run from process
context.
 1.113  09-Feb-2007  ad Merge newlock2 to head.
 1.112  27-Dec-2006  yamt remove nqnfs.
 1.111  06-Dec-2006  yamt use KSI_INIT rather than memset. no functional changes.
 1.110  01-Nov-2006  yamt remove some __unused from function parameters.
 1.109  20-Oct-2006  elad Add an XXX to remind me why it's there when grepping. (securelevel ref)
 1.108  12-Oct-2006  christos - sprinkle __unused on function decls.
- fix a couple of unused bugs
- no more -Wno-unused for i386
 1.107  25-Sep-2006  christos PR/34612: Bucky Katz: SA returns from sleep do not set the signal flags
Patch applied, many thanks for the example!
 1.106  08-Sep-2006  elad branches: 1.106.2;
First take at security model abstraction.

- Add a few scopes to the kernel: system, network, and machdep.

- Add a few more actions/sub-actions (requests), and start using them as
opposed to the KAUTH_GENERIC_ISSUSER place-holders.

- Introduce a basic set of listeners that implement our "traditional"
security model, called "bsd44". This is the default (and only) model we
have at the moment.

- Update all relevant documentation.

- Add some code and docs to help folks who want to actually use this stuff:

* There's a sample overlay model, sitting on-top of "bsd44", for
fast experimenting with tweaking just a subset of an existing model.

This is pretty cool because it's *really* straightforward to do stuff
you had to use ugly hacks for until now...

* And of course, documentation describing how to do the above for quick
reference, including code samples.

All of these changes were tested for regressions using a Python-based
testsuite that will be (I hope) available soon via pkgsrc. Information
about the tests, and how to write new ones, can be found on:

http://kauth.linbsd.org/kauthwiki

NOTE FOR DEVELOPERS: *PLEASE* don't add any code that does any of the
following:

- Uses a KAUTH_GENERIC_ISSUSER kauth(9) request,
- Checks 'securelevel' directly,
- Checks a uid/gid directly.

(or if you feel you have to, contact me first)

This is still work in progress; It's far from being done, but now it'll
be a lot easier.

Relevant mailing list threads:

http://mail-index.netbsd.org/tech-security/2006/01/25/0011.html
http://mail-index.netbsd.org/tech-security/2006/03/24/0001.html
http://mail-index.netbsd.org/tech-security/2006/04/18/0000.html
http://mail-index.netbsd.org/tech-security/2006/05/15/0000.html
http://mail-index.netbsd.org/tech-security/2006/08/01/0000.html
http://mail-index.netbsd.org/tech-security/2006/08/25/0000.html

Many thanks to YAMAMOTO Takashi, Matt Thomas, and Christos Zoulas for help
stabilizing kauth(9).

Full credit for the regression tests, making sure these changes didn't break
anything, goes to Matt Fleming and Jaime Fournier.

Happy birthday Randi! :)
 1.105  23-Jul-2006  ad branches: 1.105.4;
Use the LWP cached credentials where sane.
 1.104  16-Jul-2006  kardel fix another t{s,v}tohz() fallout (invalid remaining time)
now passes regression/sys/kern/sleeping
 1.103  14-Jul-2006  kardel keep NetBSD boottime semantics:
- only set at boot
- only tracking delta of set-time operations
-> will keep boottime stable across ACPI sleeps
uptime(1) will report the time since last boot
 1.102  08-Jul-2006  kardel report true clock resolution based on the frequency information
from the underlying counter in clock_getres(). For frequencies
above 1GHz report a resolution of 1 nsec.
 1.101  07-Jun-2006  kardel branches: 1.101.2;
merge FreeBSD timecounters from branch simonb-timecounters
- struct timeval time is gone
time.tv_sec -> time_second
- struct timeval mono_time is gone
mono_time.tv_sec -> time_uptime
- access to time via
{get,}{micro,nano,bin}time()
get* versions are fast but less precise
- support NTP nanokernel implementation (NTP API 4)
- further reading:
Timecounter Paper: http://phk.freebsd.dk/pubs/timecounter.pdf
NTP Nanokernel: http://www.eecis.udel.edu/~mills/ntp/html/kern.html
 1.100  18-May-2006  yamt branches: 1.100.2;
timers_alloc: use PR_WAITOK.
 1.99  14-May-2006  elad integrate kauth.
 1.98  05-Dec-2005  christos branches: 1.98.4; 1.98.6; 1.98.8; 1.98.10; 1.98.12;
- make settime take timespec.
- avoid wrapping of time in settime (from OpenBSD)
- pass struct proc down so that we can log a detailed message.
 1.97  26-Nov-2005  simonb Convert malloc/free of struct ptimers to pools.
Move the ptimer pool to kern_time.c to keep like pools together,
and it wasn't used in kern_proc.c
 1.96  11-Nov-2005  simonb branches: 1.96.2;
Call nanotime() directly, instead of doing the
microtime()/TIMEVAL_TO_TIMESPEC() dance.
 1.95  23-Oct-2005  cube Implement a few changes needed to properly resolve PR#30924, as
discussed in the PR.

- introduce sys/timevar.h to hold kernel-specific stuff relevant to
sys/time.h. Ideally, timevar.h would contain all (or almost) of the
#ifdef _KERNEL part of time.h, but that's a pretty big and tedious
change to make. For now, it will contain only the prototypes I
introduced when working on COMPAT_NETBSD32.

- split copyinout_t into copyin_t and copyout_t, it makes prototypes more
explicit about the meaning of a given argument. Suggested by yamt@.

- move copyinout_t definition in sys/time.h to systm.h as copyin_t and
copyout_t

- make everything use the new types and include the proper headers at
the proper places.
 1.94  02-Oct-2005  chs branches: 1.94.2;
avoid calling into the pool code while holding sched_lock
since both pool_get() and pool_put() can call wakeup().
instead, allocate the struct sadata_upcall before taking
sched_lock in mi_switch() and free it after releasing sched_lock.

clean up some modularity warts by adding a callback to
struct sadata_upcall for freeing sa_arg.
 1.93  23-Sep-2005  jmmv Apply the NFS exports list rototill patch:

- Remove all NFS related stuff from file system specific code.
- Drop the vfs_checkexp hook and generalize it in the new nfs_check_export
function, thus removing redundancy from all file systems.
- Move all NFS export-related stuff from kern/vfs_subr.c to the new
file sys/nfs/nfs_export.c. The former was becoming large and its code
is always compiled, regardless of the build options. Using the latter,
the code is only compiled in when NFSSERVER is enabled. While doing this,
also make some functions in nfs_subs.c conditional to NFSSERVER.
- Add a new command in nfssvc(2), called NFSSVC_SETEXPORTSLIST, that takes a
path and a set of export entries. At the moment it can only clear the
exports list or append entries, one by one, but it is done in a way that
allows setting the whole set of entries atomically in the future (see the
comment in mountd_set_exports_list or in doc/TODO).
- Change mountd(8) to use the nfssvc(2) system call instead of mount(2) so
that it becomes file system agnostic. In fact, all this whole thing was
done to remove a 'XXX' block from this utility!
- Change the mount*, newfs and fsck* userland utilities to not deal with NFS
exports initialization; done internally by the kernel when initializing
the NFS support for each file system.
- Implement an interface for VFS (called VFS hooks) so that several kernel
subsystems can run arbitrary code upon receipt of specific VFS events.
At the moment, this only provides support for unmount and is used to
destroy NFS exports lists from the file systems being unmounted, though it
has room for extension.

Thanks go to yamt@, chs@, thorpej@, wrstuden@ and others for their comments
and advice in the development of this patch.
 1.92  23-Jul-2005  cube Split sys_timer_create, sys_timer_gettime and sys_timer_settime so they
can be easily used by netbsd32 code.

XXX Meanwhile, introduce a copyinout_t type that matches the prototype of
XXX copyin(9) and copyout(9). Its logical place would be in systm.h, near
XXX the definition of copyin, but, well, see the comment.
 1.91  11-Jul-2005  cube Split sys_getitimer and sys_setitimer to make it possible to share the
relevant code with the COMPAT_NETBSD32 version, and make the latter use
the new functions.

This fixes netbsd32_setitimer() which had drifted from the native syscall
and did not work properly anymore.
 1.90  23-Jun-2005  thorpej branches: 1.90.2;
Use ANSI function decls. Apply some static.
 1.89  29-May-2005  christos - add const.
- remove unnecessary casts.
- add __UNCONST casts and mark them with XXXUNCONST as necessary.
 1.88  02-Mar-2005  mycroft branches: 1.88.2;
Copyright maintenance.
 1.87  26-Feb-2005  perry nuke trailing whitespace
 1.86  06-Jan-2005  mycroft branches: 1.86.2; 1.86.4;
If sa_upcall() fails (which is always going to be due to resource exhaustion),
do not leak siginfo structures.

Note that in the cases of trap signals and timer events, losing this
information could be very bad; right now it will cause us to spin until the
process is SIGKILLed.

"Needs work."
 1.85  14-Nov-2004  atatat Wrap TIMEVAL_TO_TIMESPEC and TIMESPEC_TO_TIMEVAL macros in

do { ... } while(/*CONSTCOND*/0)

so that they can be used unadorned in if/else blocks, etc. This means
that you now *have* to put a ; at the end of the "call" to these
macros.
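
A minimal userland sketch of why the do { ... } while (0) wrapper matters.
SKETCH_TIMEVAL_TO_TIMESPEC and convert_if_valid are hypothetical names used
only for illustration; they are not the definitions in sys/time.h. Without
the wrapper, the two-statement macro body would break the if/else below.

```c
#include <assert.h>
#include <sys/time.h>
#include <time.h>

/*
 * Hypothetical stand-in for the real macro. The do/while(0) wrapper turns
 * the two assignments into a single statement that needs a trailing ';'.
 */
#define SKETCH_TIMEVAL_TO_TIMESPEC(tv, ts) do {		\
	(ts)->tv_sec = (tv)->tv_sec;			\
	(ts)->tv_nsec = (tv)->tv_usec * 1000;		\
} while (/*CONSTCOND*/0)

/* Convert only when tv_usec is in range; returns 1 on success, 0 otherwise. */
static int
convert_if_valid(const struct timeval *tv, struct timespec *ts)
{
	if (tv->tv_usec >= 0 && tv->tv_usec < 1000000)
		SKETCH_TIMEVAL_TO_TIMESPEC(tv, ts);	/* unadorned in if/else */
	else
		return 0;
	return 1;
}
```

If the macro body were a bare { ... } block, the ';' after the "call" would
end the if statement and leave the else without a matching if.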
 1.84  27-Apr-2004  simonb Fix "comments within comments" problem pointed out by Geoff Wing on
source-changes.
 1.83  27-Apr-2004  kleink POSIX-2001: Add restrict keywords to gettimeofday(2) and setitimer(2);
further deprecate struct timezone usage by changing `tzp' argument to
gettimeofday() to void *; align utimes(2) declaration by changing `times`
argument from struct timeval * to struct timeval[2]. From Murray
Armfield in PR standards/25331.

In due course, reflect these changes in futimes(2), lutimes(2), and
settimeofday(2).
 1.82  14-Mar-2004  cl branches: 1.82.2; 1.82.4; 1.82.6;
add kernel part of concurrency support for SA on MP systems
- move per VP data into struct sadata_vp referenced from l->l_savp
* VP id
* lock on VP data
* LWP on VP
* recently blocked LWP on VP
* queue of LWPs woken which ran on this VP before sleep
* faultaddr
* LWP cache for upcalls
* upcall queue
- add current concurrency and requested concurrency variables
- make process exit run LWP on all VPs
- make signal delivery consider all VPs
- make timer events consider all VPs
- add sa_newsavp to allocate new sadata_vp structure
- add sa_increaseconcurrency to prepare new VP
- make sys_sa_setconcurrency request new VP or wakeup idle VP
- make sa_yield lower current concurrency
- set sa_cpu = VP id in upcalls
- maintain cached LWPs per VP
 1.81  02-Jan-2004  cl kernel part of no-syscall upcall stack return: libpthread registers
an offset between ss_sp and struct sa_stackinfo_t (located in struct
__pthread_st) when calling sa_register. The kernel increments the
sast_gen counter in struct sastack when an upcall stack is used.
libpthread increments the sasi_stackgen counter in struct
sa_stackinfo_t when an upcall stack is freed. The kernel compares the
two counters to decide if a stack is free or in use.

- add struct sa_stackinfo_t with sasi_stackgen to count stack use in
userland
- add sast_gen to struct sastack to count stack use in kernel
- add SA_FLAG_STACKINFO to enable the stackinfo_offset argument in the
sa_register syscall
- add sa_stackinfo_offset to struct sadata for offset between ss_sp
and struct sa_stackinfo_t
- add ssize_t stackinfo_offset argument to sa_register, initialize
struct sadata's sa_stackinfo_offset from it if SA_FLAG_STACKINFO is
set
- add sa_getstack, sa_getstack0, sa_stackused and sa_setstackfree
functions to find/use/free upcall stacks and use these where
appropriate
- don't record stack for upcall in sa_upcall0
- pass sau to sa_switchcall instead of l2 (l2 = curlwp in sa_switchcall)
- add sa_vp_blocker to struct sadata to pass recently blocked lwp to
sa_switchcall
- delay finding a stack for blocked upcalls to sa_switchcall
- add sa_stacknext to struct sadata pointing to next most likely free
upcall stack; also g/c sa_stackslist in struct sadata and sast_list
in struct sastack
- add L_SA_WOKEN flag: LWP is on sa_woken queue
- add L_SA_RECYCLE flag: LWP should be recycled in sa_setwoken
- replace l_upcallstack with L_SA_WOKEN/L_SA_RECYCLE/L_SA_BLOCKING
flags
- g/c now unused sast_blocker in struct sastack
- make sa_switchcall, sa_upcall0 and sa_upcall_getstate static in
kern_sa.c
- call sa_upcall_userret only once in userret
- split sa_makeupcalls out of sa_upcall_userret and use to process
the sa_upcalls queue
- on process exit: mark LWPs sleeping in saunblock interruptible; also
there are no LWPs sleeping on l->l_upcallstack anymore; also clear
sa_wokenq_head to prevent unblocked upcalls

additional changes:
- cleanup timerupcall sa_vp == curlwp check
- add check in sa_yield if we didn't block on our way here and we
wouldn't any longer be the LWP on the VP
- invalidate sa_vp_ofaultaddr after resolving pagefault
 1.80  02-Dec-2003  christos PR/23613: Christian Biere: Bogus bounds check in nanosleep.
 1.79  13-Nov-2003  chs eliminate uvm_useracc() in favor of checking the return value of
copyin() or copyout().

uvm_useracc() tells us whether the mapping permissions allow access to
the desired part of an address space, and many callers assume that
this is the same as knowing whether an attempt to access that part of
the address space will succeed. however, access to user space can
fail for reasons other than insufficient permission, most notably that
paging in any non-resident data can fail due to i/o errors. most of
the callers of uvm_useracc() make the above incorrect assumption. the
rest are all misguided optimizations, which optimize for the case
where an operation will fail. we'd rather optimize for operations
succeeding, in which case we should just attempt the access and handle
failures due to insufficient permissions the same way we handle i/o
errors. since there appear to be no good uses of uvm_useracc(), we'll
just remove it.
 1.78  02-Nov-2003  cl Cleanup signal delivery for SA processes:
General idea: only consider the LWP on the VP for signal delivery, all
other LWPs are either asleep or running from waking up until repossessing
the VP.

- in kern_sig.c:kpsignal2: handle all states the LWP on the VP can be in
- in kern_sig.c:proc_stop: only try to stop the LWP on the VP. All other
LWPs will suspend in sa_vp_repossess() until the VP-LWP donates the VP.
Restore original behaviour (before SA-specific hacks were added) for
non-SA processes.
- in kern_sig.c:proc_unstop: only return the LWP on the VP
- handle sa_yield as case 0 in sa_switch instead of clearing L_SA, add an
L_SA_YIELD flag
- replace sa_idle by L_SA_IDLE flag since it was either NULL or == sa_vp

Also don't output itimerfire overrun warning if the process is already
exiting.
Also g/c sa_woken because it's not used.
Also g/c some #if 0 code.
 1.77  08-Oct-2003  thorpej * Shuffle some data structures so, and add a flags word to ksiginfo_t.
Right now the only flag is used to indicate if a ksiginfo_t is a
result of a trap. Add a predicate macro to test for this flag.
* Add initialization macros for ksiginfo_t's.
* Add accessor macro for ksi_trap. Expands to 0 if the ksiginfo_t was
not the result of a trap. This matches the sigcontext trapcode semantics.
* In kpsendsig(), use KSI_TRAP_P() to select the lwp that gets the signal.
Inspired by Matthias Drochner's fix to kpsendsig(), but correctly handles
the case of non-trap-generated signals that have a > 0 si_code.

This patch fixes a signal delivery problem with threaded programs noted by
Matthias Drochner on tech-kern.

As discussed on tech-kern. Reviewed and OK'd by Christos.
 1.76  14-Sep-2003  christos set the sigval in the setitimer case.
 1.75  13-Sep-2003  christos enable SI_TIMER notification.
 1.74  09-Sep-2003  cl fix timerupcall breakage after SA_SIGINFO changes:
- sa_upcall only stores a pointer to the `arg'
 1.73  06-Sep-2003  christos SA_SIGINFO changes.
 1.72  07-Aug-2003  agc Move UCB-licensed code from 4-clause to 3-clause licence.

Patches provided by Joel Baker in PR 22364, verified by myself.
 1.71  17-Jul-2003  fvdl Changes from Stephan Uphoff to patch problems with LWPs blocking when they
shouldn't, and MP.
 1.70  28-May-2003  nathanw branches: 1.70.2;
Expand the test in itimerfire() to only wake up an idle SA LWP if the
process isn't stopped.
 1.69  19-May-2003  dyoung Make ppsratecheck conform with its man page, which says, "If maxpps
is set to 0, the function will always return 0 (no packets/events
are permitted)." Before this patch, ppsratecheck returned 1 once
a second when maxpps was 0.
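
A rough sketch of the documented semantics, assuming a one-second counting
window and an explicit "now" argument in microseconds. pps_sketch and
struct pps_state are hypothetical; the real ppsratecheck(9) takes a
struct timeval and reads the kernel clock itself. The negative-maxpps case
follows revision 1.53.

```c
#include <assert.h>
#include <limits.h>
#include <stdint.h>

/* Hypothetical state: start of the current window and events seen in it. */
struct pps_state {
	int64_t window_start_us;	/* 0 means "not started yet" */
	int count;
};

/* Returns 1 if the event is permitted, 0 if it should be suppressed. */
static int
pps_sketch(struct pps_state *st, int maxpps, int64_t now_us)
{
	int allowed;

	/* Start a new window on first use or once a full second has passed. */
	if (st->window_start_us == 0 ||
	    now_us - st->window_start_us >= 1000000) {
		st->window_start_us = now_us;
		st->count = 0;
	}

	if (maxpps < 0)
		allowed = 1;			/* rate limiting disabled */
	else
		allowed = st->count < maxpps;	/* maxpps == 0: never allowed */

	if (st->count < INT_MAX)		/* avoid signed overflow */
		st->count++;
	return allowed;
}
```

With maxpps set to 0 the comparison can never succeed, so the function
returns 0 even right after the window resets.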
 1.68  16-Apr-2003  dsl Flag that adjtime has happened, hardware RTC might be wrong
 1.67  10-Mar-2003  nathanw Change the contract for timer_settime() (the internal routine, not the
syscall sys_timer_settime()) to take an absolute value for realtime
timers. This avoids a pair of gratuitous conversions with the
possibility that the timer's intermediate value would be 0.0, which
would signal timer_settime() to cancel the timer.

Adjust callers of timer_settime() to compensate; catch the case where
sys_timer_settime() with an absolute time value of now and a virtual
timer would also be subtracted down to a timer-cancelling 0.0.

This should fix the bug seen in libpthread's nanosleep() where certain
applications, such as xmms, would wedge with unexpired userlevel
alarms.
 1.66  04-Feb-2003  jdolecek itimerfire(): fix bug in previous - if two or more timers would
fire close together, the second (and every other) timer would be
added to mask incorrectly - timerid value would be shifted twice,
and sa_upcall() would later kill process with SIGILL
 1.65  04-Feb-2003  jdolecek cosmetic - use type 'timer_t' for timerid local in sys_timer_create()
and sys_timer_delete()
 1.64  03-Feb-2003  nathanw Prevent one timer from overrunning another with the current userret
mechanism by keeping a list (bitset) of which timers have fired and using
that list in the upcall (Does this sound familiar? SEND HELP NEED SIGINFO).

Provoke the idle LWP into running again with setrunnable(sa->sa_idle)
instead of a wakeup() call, since we know what it is.
 1.63  18-Jan-2003  thorpej Merge the nathanw_sa branch.
 1.62  22-Oct-2002  simonb "oatv" in adjtime1() isn't used after being set; remove it.
 1.61  31-Jan-2002  simonb branches: 1.61.10;
Implement the CLOCK_MONOTONIC clock for the posix/opengroup realtime
clock_() functions. This simply returns the kernel mono_time variable.
As discussed on tech-kern.
 1.60  09-Dec-2001  manu Changed clockctl interface to use syscallargs structures
 1.59  13-Nov-2001  christos PR/8657: z@rentaboat.se: alarm takes more seconds that it can handle.

This is a followup to PR/14558.

- itimerfix(9) limited the number of seconds to 100M, before I changed
it to 1000M for PR/14558.
- nanosleep(2) documents a limit of 1000M seconds.
- setitimer(2), select(2), and other library functions that indirectly
use setitimer(2) for example alarm(3) don't specify a limit.

So it only seems appropriate that any positive number of seconds in
struct timeval should be accepted by any code that uses itimerfix(9)
directly, except nanosleep(2) which should check for 1000M seconds
manually. This change makes the manual pages of select(2), nanosleep(2),
setitimer(2), and alarm(3) consistent with the code.
 1.58  12-Nov-2001  lukem add RCSIDs
 1.57  12-Nov-2001  christos PR/14558: Tero Kivinen: There is no point in limiting the number of seconds
to 100 million. Use 1000 million like the man page for nanosleep suggests.
This is much closer to MAXINT, and it conforms to POSIX.
 1.56  16-Sep-2001  manu branches: 1.56.2;
Split root-only time-related system calls so that we have an upper part, that
checks root privs, and a lower part that does the actual job. The lower part
will be called by the upcoming clockctl driver. Approved by Christos
Also fixed a few cosmetic things
 1.55  11-Jun-2001  tron branches: 1.55.2; 1.55.4;
Lower interrupt priority properly if setting the kernel time
is denied in a securelevel above 1. This fixes PR kern/13158.
 1.54  19-Sep-2000  bjh21 branches: 1.54.2;
Extend NFS_V2_ONLY to remove NQNFS lease support as well. Saves another 10k.
 1.53  02-Aug-2000  itojun allow admins to disable pps rate limitation, by setting "maxpps"
parameter to negative value.
 1.52  13-Jul-2000  thorpej Add a comment about the hzto() return value.
 1.51  09-Jul-2000  jhawk Comment police. s/DIAGNOSTICS/DIAGNOSTIC/
 1.50  09-Jul-2000  itojun add ppsratelimit(9), which does event-per-sec rate limitation.
use it from icmp6 error rate limitation code.
XXX better name for the function?
 1.49  09-Jul-2000  itojun shorten splclock() period in ratelimit().
From: onoe
 1.48  27-Jun-2000  mrg remove include of <vm/vm.h>
 1.47  31-May-2000  thorpej branches: 1.47.2;
Track which process a CPU is running/has last run on by adding a
p_cpu member to struct proc. Use this in certain places when
accessing scheduler state, etc. For the single-processor case,
just initialize p_cpu in fork1() to avoid having to set it in the
low-level context switch code on platforms which will never have
multiprocessing.

While I'm here, comment a few places where there are known issues
for the SMP implementation.
 1.46  26-May-2000  thorpej branches: 1.46.2;
First sweep at scheduler state cleanup. Collect MI scheduler
state into global and per-CPU scheduler state:

- Global state: sched_qs (run queues), sched_whichqs (bitmap
of non-empty run queues), sched_slpque (sleep queues).
NOTE: These may collectively move into a struct schedstate
at some point in the future.

- Per-CPU state, struct schedstate_percpu: spc_runtime
(time process on this CPU started running), spc_flags
(replaces struct proc's p_schedflags), and
spc_curpriority (usrpri of processes on this CPU).

- Every platform must now supply a struct cpu_info and
a curcpu() macro. Simplify existing cpu_info declarations
where appropriate.

- All references to per-CPU scheduler state now made through
curcpu(). NOTE: this will likely be adjusted in the future
after further changes to struct proc are made.

Tested on i386 and Alpha. Changes are mostly mechanical, but apologies
in advance if it doesn't compile on a particular platform.
 1.45  30-Mar-2000  augustss Get rid of register declarations.
 1.44  23-Mar-2000  thorpej New callout mechanism with two major improvements over the old
timeout()/untimeout() API:
- Clients supply callout handle storage, thus eliminating problems of
resource allocation.
- Insertion and removal of callouts is constant time, important as
this facility is used quite a lot in the kernel.

The old timeout()/untimeout() API has been removed from the kernel.
 1.43  16-Feb-2000  itojun correct ratecheck() signedness. without this fix, ratecheck() will never
succeed again after first success with lasttime=(0,0).
 1.42  03-Feb-2000  cgd Implement ratecheck(), a function which can help programmers implement
rate-limited actions. See ratecheck(9) for details of its use.
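
A minimal sketch of the idea, assuming time is passed in as microseconds
and that a stored value of 0 means "never fired" (mirroring the
lasttime=(0,0) case mentioned in revision 1.43). ratecheck_sketch is a
hypothetical name; the real ratecheck(9) operates on struct timeval and
reads the kernel clock.

```c
#include <assert.h>
#include <stdint.h>

/*
 * Returns 1 if the action may run now, 0 if it is being rate-limited.
 * On success the caller-owned timestamp is updated to "now".
 */
static int
ratecheck_sketch(int64_t *last_us, int64_t min_interval_us, int64_t now_us)
{
	assert(min_interval_us >= 0);

	/* Always allow the first event, then require the minimum interval. */
	if (*last_us == 0 || now_us - *last_us >= min_interval_us) {
		*last_us = now_us;
		return 1;
	}
	return 0;
}
```

A caller would keep one timestamp per rate-limited message and only emit
the message when the function returns 1.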
 1.41  10-Oct-1999  hwr branches: 1.41.2;
If time delta is larger than thresh. Use 10* adj factor. Make this
work for negative deltas too. From NAKAJIMA Yoshihiro <nakayosh@kcn.ne.jp>
in kern/8589.
 1.40  16-Aug-1999  tron branches: 1.40.2;
Remove the prototype for settime(), it is in "sys/time.h" now.
 1.39  16-Aug-1999  tron Make settime() public because we need to use it for the Linux emulation.
 1.38  05-Aug-1999  thorpej Change the semantics of splsoftclock() to be like other spl*() functions,
that is, priority is raised. Add a new spllowersoftclock() to provide the
atomic drop-to-softclock semantics that the old splsoftclock() provided,
and update calls accordingly.

This fixes a problem with using the "rnd" pseudo-device from within
interrupt context to extract random data (e.g. from within the softnet
interrupt) where doing so would incorrectly unblock interrupts (causing
all sorts of lossage).

XXX 4 platforms do not have priority-raising capability: newsmips, sparc,
XXX sparc64, and VAX. These platforms still have this bug until their
XXX spl*() functions are fixed.
 1.37  07-Jun-1999  thorpej Make sure `olddelta' is a valid pointer before performing the guts of
the adjtime(2) system call. Fixes PR #7721, Darren Reed.
 1.36  18-Aug-1998  thorpej branches: 1.36.6; 1.36.8; 1.36.10;
Add some braces to make egcs happy (ambiguous else warning).
 1.35  31-Jul-1998  perry fix sizeofs so they comply with the KNF style guide. yes, it is pedantic.
 1.34  25-Jun-1998  thorpej branches: 1.34.2;
defopt NFSSERVER
 1.33  01-Mar-1998  fvdl Merge with Lite2 + local changes
 1.32  20-Feb-1998  mycroft Fix missing newline in time zone warning.
 1.31  19-Feb-1998  thorpej Include the NFS option header.
 1.30  15-Oct-1997  mycroft Adjust u_int arguments of some system calls to int, to match user-level
prototypes.
 1.29  26-Apr-1997  tls Don't allow the time to be set backwards if in highly secure mode, since this would allow inode change times to be manipulated.
 1.28  21-Apr-1997  jtc In nanosleep, use local error variable when storing rqtp in so that
an error from a failed tsleep will still be returned to the caller.
 1.27  16-Apr-1997  jtc Add POSIX.1b nanosleep().
 1.26  31-Jan-1997  thorpej NFSCLIENT -> NFS
 1.25  15-Jan-1997  perry Eliminate obsolete TIMEZONE and DST options.
Eliminate obsolete global kernel variable "struct timezone tz"
Add RTC_OFFSET option
Add global kernel variable rtc_offset, which is initialized by
RTC_OFFSET at kernel compile time.
on i386, x68k, mac68k, pc532 and arm32, RTC_OFFSET indicates how many
minutes west (east) of GMT the hardware RTC runs. Defaults to 0.
Places where tz variable was used to indicate this in the past have
been replaced with rtc_offset.
Add sysctl interface to rtc_offset.
Kill obsolete DST_* macros in sys/time.h
gettimeofday now always returns zeroed timezone if zone is requested.
settimeofday now ignores and logs attempts to set non-existent kernel
timezone.
 1.24  22-Dec-1996  cgd branches: 1.24.2;
* catch up with system call argument type fixups/const poisoning.
* Fix arguments to various copyin()/copyout() invocations, to avoid
gratuitous casts.
* Some KNF formatting fixes
 1.23  15-Nov-1996  cgd clean up a few spaces vs. tabs and KNF bogons. Make this compile
cleanly with -Wall -Wstrict-prototypes -Wmissing-prototypes -Wcast-qual.
 1.22  15-Nov-1996  jtc Add clock_gettime, clock_settime, and clock_getres
 1.21  24-Oct-1996  cgd replace a construction in sys_setitimer() that was too tricky for its
(and my!) own good with a more straightforward one that is equally (and
more apparently) correct.
 1.20  18-Feb-1996  fvdl branches: 1.20.4;
Changes for NFSv3 code: pull in more NFS include files into kern_time.c
to get types right (overkill for just one function call, but oh well).
Clear B_NEEDCOMMIT in bdwrite().
 1.19  13-Feb-1996  christos uipc_proto.c: No need for the forward decls anymore; everything is prototyped.
kern_time.c: add header to get the NFS prototypes if needed.
 1.18  09-Feb-1996  christos More proto fixes
 1.17  04-Feb-1996  christos First pass at prototyping
 1.16  07-Oct-1995  mycroft Prefix names of system call implementation functions with `sys_'.
 1.15  19-Sep-1995  thorpej Make system calls conform to a standard prototype and bring those
prototypes into scope.
 1.14  21-Mar-1995  mycroft Update to use timer{add,sub}().
 1.13  13-Dec-1994  mycroft LEASE_UPDATETIME -> lease_updatetime
 1.12  11-Dec-1994  mycroft Use __timer{add,sub}(), not timeval{add,sub}(). Remove the latter completely.
 1.11  20-Oct-1994  cgd update for new syscall args description mechanism
 1.10  18-Sep-1994  mycroft Remove extern of tickadj.
 1.9  29-Jun-1994  cgd New RCS ID's, take two. they're more aesthetically pleasant, and use 'NetBSD'
 1.8  20-May-1994  cgd update to Lite
 1.7  05-May-1994  mycroft Remove now-bogus casts.
 1.6  05-May-1994  cgd lots of changes: prototype migration, move lots of variables, definitions,
and structure elements around. kill some unnecessary type and macro
definitions. standardize clock handling. More changes than you'd want.
 1.5  18-Dec-1993  mycroft Canonicalize all #includes.
 1.4  13-Jul-1993  cgd branches: 1.4.4;
break args structs out, into syscallname_args structs, so gcc2 doesn't
whine so much.
 1.3  27-Jun-1993  andrew ANSIfications - removed all implicit function return types and argument
definitions. Ensured that all files include "systm.h" to gain access to
general prototypes. Casts where necessary.
 1.2  20-May-1993  cgd add $Id$ strings, and clean up file headers where necessary
 1.1  21-Mar-1993  cgd branches: 1.1.1;
Initial revision
 1.1.1.3  01-Mar-1998  fvdl Import 4.4BSD-Lite2
 1.1.1.2  01-Mar-1998  fvdl Import 4.4BSD-Lite for reference
 1.1.1.1  21-Mar-1993  cgd initial import of 386bsd-0.1 sources
 1.4.4.1  14-Nov-1993  mycroft Canonicalize all #includes.
 1.20.4.1  26-Jan-1997  rat Pullup 1.20 -> 1.21 (code clarification)
 1.24.2.2  18-Jan-1997  thorpej Update from trunk.
 1.24.2.1  14-Jan-1997  thorpej Snapshot of work-in-progress, committed to private branch.

These changes implement machine-independent root device and file system
selection. Notable features:

- All ports behave in a consistent manner regarding root
device selection.
- No more "options GENERIC"; all kernels have the ability
to boot with RB_ASKNAME to select root device and file system
type.
- Root file system type can be wildcarded; a machine-independent
function will try all possible file systems for the selected
root device until one succeeds.
- If the root file system fails to mount, the operator will
be given the chance to select a new root device and file
system type, rather than having the machine simply panic.
- nfs_mountroot() no longer panics if any part of the NFS
mount process fails; it now returns an error, giving the
operator a chance to recover.
- New, more consistent, config(8) grammar. The constructs:

config netbsd swap generic
config netbsd root on nfs

have been replaced with:

config netbsd root on ? type ?
config netbsd root on ? type nfs

Additionally, the operator may select or wildcard root file
system type in the kernel configuration file:

config netbsd root on cd0a type cd9660

config(8) now requires that a "root" specification be
made. "root" may be wired down or wildcarded. "swap" and
"dump" specifications are optional, and follow previous
semantics.

- config(8) has a new "file-system" keyword, used to configure
file systems into the kernel. Eventually, this will be used
to generate the default vfssw[].

- "options NFSCLIENT" is obsolete, and is replaced by
"file-system NFS". "options NFSSERVER" still exists, since
NFS server support is independent of the NFS file system
client.

- sys/arch/<foo>/<foo>/swapgeneric.c is no longer used, and
will be removed; all information is now generated by config(8).

As of this commit, all ports except arm32 have been updated to use
the new setroot(). Only SPARC, i386, and Alpha ports have been
tested at this time. Port masters should test these changes on their
ports, and report any problems back to me.

More changes are on their way, including RB_ASKNAME support in
nfs_mountroot() (to prompt for server address and path) and, potentially,
the ability to select rarp/bootparam or bootp in nfs_mountroot().
 1.34.2.1  08-Aug-1998  eeh Revert cdevsw mmap routines to return int.
 1.36.10.1  30-Nov-1999  itojun bring in latest KAME (as of 19991130, KAME/NetBSD141) into kame branch
just for reference purposes.
This commit includes 1.4 -> 1.4.1 sync for kame branch.

The branch does not compile at all (due to the lack of ALTQ and some other
source code). Please do not try to modify the branch, this is just for
reference purposes.

synchronization to latest KAME will take place on HEAD branch soon.
 1.36.8.1  21-Jun-1999  thorpej Sync w/ -current.
 1.36.6.3  18-Feb-2000  he Pull up revisions 1.42-1.43 (requested by thorpej):
Implement ratecheck(), a function which can help kernel programmers
implement rate-limited actions.
 1.36.6.2  20-Oct-1999  he Pull up revision 1.41 (requested by hwr):
If time delta is larger than thresh, use 10* adjustment factor.
Make this work for negative deltas too. Fixes PR#8589.
 1.36.6.1  18-Jun-1999  perry pullup 1.36->1.37 (thorpej): Make sure "olddelta" is a valid pointer
 1.40.2.1  27-Dec-1999  wrstuden Pull up to last week's -current.
 1.41.2.1  20-Nov-2000  bouyer Update thorpej_scsipi to -current as of a month ago
 1.46.2.1  22-Jun-2000  minoura Sync w/ netbsd-1-5-base.
 1.47.2.3  11-Jun-2001  he Pull up revision 1.55 (requested by tron):
Lower interrupt priority properly in the error path of settime(),
e.g. when it's denied due to securelevel being above 1. Fixes
PR#13158.
 1.47.2.2  16-Aug-2000  itojun pullup (approved by releng-1-5)

add ppsratecheck(9).

distrib/sets/lists/comp/mi 1.251 -> 1.252
share/man/man9/Makefile 1.67 -> 1.68 (equivalent to)
share/man/man9/ppsratecheck.9 (new)
sys/kern/kern_time.c 1.49 -> 1.51, 1.52 -> 1.53
sys/sys/time.h 1.29 -> 1.30
 1.47.2.1  13-Jul-2000  thorpej Pull up rev. 1.52:
Add a comment about the hzto() return value.
 1.54.2.24  11-Dec-2002  thorpej Sync with HEAD.
 1.54.2.23  03-Dec-2002  nathanw Get the kernel lock in the userret timer upcall callback.
 1.54.2.22  11-Nov-2002  nathanw Catch up to -current
 1.54.2.21  09-Nov-2002  nathanw Separately track timer overruns presently occurring, and timer
overruns associated with the most recent successful delivery.
 1.54.2.20  27-Oct-2002  nathanw Clean up some trailing whitespace.
 1.54.2.19  03-Oct-2002  nathanw Implement CLOCK_VIRTUAL and CLOCK_PROF support for POSIX timers. Factor
out some common code between POSIX timers and BSD timers along the way.
 1.54.2.18  26-Sep-2002  nathanw Pass SA_UPCALL_DEFER to sa_upcall() in realtimerupcall().
Instead of incrementing pt_overruns when sa_upcall() fails, leave
p_userret set so that it gets to try again. Avoids dropping timer upcalls
in some situations.
 1.54.2.17  30-Aug-2002  nathanw In realtimerupcall, check the return value of sa_upcall() for errors,
and increment the overrun count if there is an error.
 1.54.2.16  17-Jul-2002  nathanw Use sa_idle information to simplify timer upcalls, and always use the
upcallret hook to avoid calling sa_upcall, alloc, cpu_setfunc, and all
that garbage from inside an asynchronous event.
 1.54.2.15  12-Jul-2002  nathanw Make timers_free() take another argument indicating whether it should
free all the timers (as on process teardown) or just the POSIX timers
(as on exec).
 1.54.2.14  12-Jul-2002  nathanw No longer need to pull in lwp.h; proc.h pulls it in for us.
 1.54.2.13  12-Apr-2002  nathanw Actually track the LWP that should be running on the process's "virtual
processor" (sa->sa_vp, which will become an actual data structure for MP
purposes).

Use this to determine what was interrupted in the re-BLOCKING case and the
unblocked case.
 1.54.2.12  02-Apr-2002  nathanw - Centralize p_nrlwps handling in those functions which actually
set the LWP state to LSRUN. Simplifies matters considerably.

- Trying to keep track of the preempted LWP was a bad idea; go back
to searching for now.

- Send a PREEMPTED upcall from proc_unstop(), so that stopped processes
know that something happened, and so that all runnable LWPs of an unstopped
process have an upcall to deliver (Ideally, the LWP that was runnable
when the process was stopped should return first, and any LWPs that were
woken up while the process was stopped would interrupt it, but that's
difficult to arrange).
 1.54.2.11  28-Feb-2002  nathanw Catch up to -current.
 1.54.2.10  19-Feb-2002  nathanw When making a timer-expiration upcall, copy out a siginfo_t, not a struct
sigevent.
 1.54.2.9  04-Feb-2002  nathanw Get the scheduler lock around SA cache manipulation and setrunnable()(!).
 1.54.2.8  02-Feb-2002  nathanw Create a new activation to handle a timer expiration if nothing is running.
 1.54.2.7  28-Jan-2002  nathanw Don't refuse a SIGEV_SA sigevent registration for timer_create() for a
process that doesn't have P_SA set; it may be setting up the timer
before enabling SA's. Instead, check for P_SA before invoking the
upcall in realtimerexpire().
 1.54.2.6  08-Jan-2002  nathanw Catch up to -current.
 1.54.2.5  17-Nov-2001  nathanw Implement POSIX realtime timers, and reimplement getitimer() and setitimer()
in terms of them.
 1.54.2.4  14-Nov-2001  nathanw Catch up to -current.
 1.54.2.3  21-Sep-2001  nathanw Catch up to -current.
 1.54.2.2  21-Jun-2001  nathanw Catch up to -current.
 1.54.2.1  05-Mar-2001  nathanw Initial commit of scheduler activations and lightweight process support.
 1.55.4.1  01-Oct-2001  fvdl Catch up with -current.
 1.55.2.2  11-Feb-2002  jdolecek Sync w/ -current.
 1.55.2.1  10-Jan-2002  thorpej Sync kqueue branch with -current.
 1.56.2.1  12-Nov-2001  thorpej Sync the thorpej-mips-cache branch with -current.
 1.61.10.2  07-Dec-2005  tron Fix build problem caused by ticket #5952.
 1.61.10.1  06-Dec-2005  tron Apply patch (requested by christos in ticket #5966):
Avoid time wrap when setting the system time.
 1.70.2.8  11-Dec-2005  christos Sync with head.
 1.70.2.7  10-Nov-2005  skrll Sync with HEAD. Here we go again...
 1.70.2.6  04-Mar-2005  skrll Sync with HEAD.

Hi Perry!
 1.70.2.5  17-Jan-2005  skrll Sync with HEAD.
 1.70.2.4  29-Nov-2004  skrll Sync with HEAD.
 1.70.2.3  21-Sep-2004  skrll Fix the sync with head I botched.
 1.70.2.2  18-Sep-2004  skrll Sync with HEAD.
 1.70.2.1  03-Aug-2004  skrll Sync with HEAD
 1.82.6.2  07-Dec-2005  tron Fix build problem caused by ticket #10184.
 1.82.6.1  06-Dec-2005  tron Apply patch (requested by christos in ticket #10184):
Avoid time wrap when setting the system time.
 1.82.4.2  07-Dec-2005  tron Fix build problem caused by ticket #10184.
 1.82.4.1  06-Dec-2005  tron Fix build problem caused by ticket #10184.
 1.82.2.2  07-Dec-2005  tron Fix build problem caused by ticket #10184.
 1.82.2.1  06-Dec-2005  tron Apply patch (requested by christos in ticket #10184):
Avoid time wrap when setting the system time.
 1.86.4.1  19-Mar-2005  yamt sync with head. xen and whitespace. xen part is not finished.
 1.86.2.1  29-Apr-2005  kent sync with -current
 1.88.2.3  07-Dec-2005  tron Fix build problem caused by ticket #1031.
 1.88.2.2  06-Dec-2005  tron Apply patch (requested by christos in ticket #1031):
Avoid time wrap when setting the system time.
 1.88.2.1  21-Oct-2005  riz Pull up following revision(s) (requested by chs in ticket #901):
sys/kern/kern_time.c: revision 1.94
sys/sys/signalvar.h: revision 1.59
sys/sys/savar.h: revision 1.16
sys/kern/kern_sig.c: revision 1.209
sys/kern/kern_sa.c: revision 1.66
sys/kern/kern_synch.c: revision 1.150
avoid calling into the pool code while holding sched_lock
since both pool_get() and pool_put() can call wakeup().
instead, allocate the struct sadata_upcall before taking
sched_lock in mi_switch() and free it after releasing sched_lock.
clean up some modularity warts by adding a callback to
struct sadata_upcall for freeing sa_arg.
 1.90.2.8  27-Feb-2008  yamt sync with head.
 1.90.2.7  21-Jan-2008  yamt sync with head
 1.90.2.6  07-Dec-2007  yamt sync with head
 1.90.2.5  27-Oct-2007  yamt sync with head.
 1.90.2.4  03-Sep-2007  yamt sync with head.
 1.90.2.3  26-Feb-2007  yamt sync with head.
 1.90.2.2  30-Dec-2006  yamt sync with head.
 1.90.2.1  21-Jun-2006  yamt sync with head.
 1.94.2.1  26-Oct-2005  yamt sync with head
 1.96.2.1  29-Nov-2005  yamt sync with head.
 1.98.12.1  24-May-2006  tron Merge 2006-05-24 NetBSD-current into the "peter-altq" branch.
 1.98.10.3  06-May-2006  christos - Move kauth_cred_t declaration to <sys/types.h>
- Cleanup struct ucred; forward declarations that are unused.
- Don't include <sys/kauth.h> in any header, but include it in the c files
that need it.

Approved by core.
 1.98.10.2  10-Mar-2006  elad generic_authorize() -> kauth_authorize_generic().
 1.98.10.1  08-Mar-2006  elad Adapt to kernel authorization KPI.
 1.98.8.4  14-Sep-2006  yamt sync with head.
 1.98.8.3  11-Aug-2006  yamt sync with head
 1.98.8.2  26-Jun-2006  yamt sync with head.
 1.98.8.1  24-May-2006  yamt sync with head.
 1.98.6.3  01-Jun-2006  kardel Sync with head.
 1.98.6.2  28-Feb-2006  kardel adjtime support for both __HAVE_TIMECOUNTER variants
 1.98.6.1  04-Feb-2006  simonb Adapt for timecounters, borrowing some FreeBSD code in the process.
XXX: ifdef hell while supporting both timecounter and non-timecounter
cases.
 1.98.4.1  09-Sep-2006  rpaulo sync with head
 1.100.2.1  19-Jun-2006  chap Sync with head.
 1.101.2.1  13-Jul-2006  gdamore Merge from HEAD.
 1.105.4.9  30-Jan-2007  ad Remove support for SA. Ok core@.
 1.105.4.8  28-Jan-2007  ad - Remove the last use of mtsleep()
- sched_pause() -> kpause()
 1.105.4.7  12-Jan-2007  ad Sync with head.
 1.105.4.6  11-Jan-2007  ad Checkpoint work in progress.
 1.105.4.5  29-Dec-2006  ad Checkpoint work in progress.
 1.105.4.4  18-Nov-2006  ad Sync with head.
 1.105.4.3  17-Nov-2006  ad Checkpoint work in progress.
 1.105.4.2  21-Oct-2006  ad Checkpoint work in progress on locking and per-LWP signals. Very much a
work in progress and there is still a lot to do.
 1.105.4.1  11-Sep-2006  ad - Convert some lockmgr() locks to mutexes and RW locks.
- Acquire proclist_lock and p_crmutex in some obvious places.
 1.106.2.2  10-Dec-2006  yamt sync with head.
 1.106.2.1  22-Oct-2006  yamt sync with head
 1.114.2.4  17-May-2007  yamt sync with head.
 1.114.2.3  24-Mar-2007  yamt sync with head.
 1.114.2.2  12-Mar-2007  rmind Sync with HEAD.
 1.114.2.1  27-Feb-2007  yamt - sync with head.
- move sched_changepri back to kern_synch.c as it doesn't know PPQ anymore.
 1.117.2.7  05-Nov-2007  ad nanoslp -> sleep
 1.117.2.6  23-Oct-2007  ad Sync with head.
 1.117.2.5  20-Aug-2007  ad Sync with HEAD.
 1.117.2.4  14-Jul-2007  ad Make it possible to track time spent by soft interrupts as is done for
normal LWPs, and provide a sysctl to switch it on/off. Not enabled by
default because microtime() is not free. XXX Not happy with this but
I want to get it out of my local tree for the time being.
 1.117.2.3  01-Jul-2007  ad Adapt to callout API change.
 1.117.2.2  08-Jun-2007  ad Sync with head.
 1.117.2.1  13-Mar-2007  ad Sync with head.
 1.118.4.1  09-Dec-2007  reinoud Pullup to HEAD
 1.118.2.1  11-Jul-2007  mjf Sync with head.
 1.125.6.6  09-Dec-2007  jmcneill Sync with HEAD.
 1.125.6.5  27-Nov-2007  joerg Sync with HEAD. amd64 Xen support needs testing.
 1.125.6.4  21-Nov-2007  joerg Sync with HEAD.
 1.125.6.3  26-Oct-2007  joerg Sync with HEAD.

Follow the merge of pmap.c on i386 and amd64 and move
pmap_init_tmp_pgtbl into arch/x86/x86/pmap.c. Modify the ACPI wakeup
code to restore CR4 before jumping back into kernel space as the large
page option might cover that.
 1.125.6.2  16-Aug-2007  jmcneill Sync with HEAD.
 1.125.6.1  09-Aug-2007  jmcneill Sync with HEAD.
 1.125.2.1  15-Aug-2007  skrll Sync with HEAD.
 1.128.4.1  14-Oct-2007  yamt sync with head.
 1.128.2.3  23-Mar-2008  matt sync with HEAD
 1.128.2.2  09-Jan-2008  matt sync with HEAD
 1.128.2.1  06-Nov-2007  matt sync with HEAD
 1.129.2.2  18-Nov-2007  bouyer Sync with HEAD
 1.129.2.1  25-Oct-2007  bouyer Sync with HEAD.
 1.130.2.4  18-Feb-2008  mjf Sync with HEAD.
 1.130.2.3  27-Dec-2007  mjf Sync with HEAD.
 1.130.2.2  08-Dec-2007  mjf Sync with HEAD.
 1.130.2.1  19-Nov-2007  mjf Sync with HEAD.
 1.133.2.2  26-Dec-2007  ad Sync with head.
 1.133.2.1  08-Dec-2007  ad Sync with head.
 1.134.4.2  23-Jan-2008  bouyer Sync with HEAD.
 1.134.4.1  02-Jan-2008  bouyer Sync with HEAD
 1.140.6.4  17-Jan-2009  mjf Sync with HEAD.
 1.140.6.3  28-Sep-2008  mjf Sync with HEAD.
 1.140.6.2  02-Jun-2008  mjf Sync with HEAD.
 1.140.6.1  03-Apr-2008  mjf Sync with HEAD.
 1.140.2.1  24-Mar-2008  keiichi sync with head.
 1.141.4.2  04-Jun-2008  yamt sync with head
 1.141.4.1  18-May-2008  yamt sync with head.
 1.141.2.3  27-Nov-2008  christos delete unused variable (and don't use it!)
 1.141.2.2  01-Nov-2008  christos Sync with head.
 1.141.2.1  29-Mar-2008  christos Welcome to the time_t=long long dev_t=uint64_t branch.
 1.145.2.5  11-Aug-2010  yamt sync with head.
 1.145.2.4  11-Mar-2010  yamt sync with head
 1.145.2.3  16-Sep-2009  yamt sync with head
 1.145.2.2  04-May-2009  yamt sync with head.
 1.145.2.1  16-May-2008  yamt sync with head.
 1.146.2.11  10-Oct-2008  skrll Sync with HEAD.
 1.146.2.10  25-Sep-2008  wrstuden Hack a merge to the "new" wrstuden-revivesa-base-3 tag for this file.
It turns out the problems with regression tests were really with
VIRTUAL timers. The timers that the round-robin scheduling in libpthread
uses to switch stuff around. Fixing that makes the "kill1" regression
test complete in a timely manner. So pull in the rev
between 1.151 and 1.152, which now lines us up with our new base.
 1.146.2.9  18-Sep-2008  wrstuden Sync with wrstuden-revivesa-base-2.
 1.146.2.8  21-Jul-2008  wrstuden Add support for compiling SA as an option. Implied by COMPAT_40.

i386 kernels both with COMPAT_40 and with no compat options (and thus
no SA) compile.

No functional changes intended.
 1.146.2.7  30-Jun-2008  wrstuden lwp_need_userret() needs the lwp locked.
 1.146.2.6  30-Jun-2008  wrstuden Change how we make SA threads not generate upcalls. Instead of clearing
LW_SA, use a private flag, LP_SA_NOBLOCK, that we set when we want
to not generate upcalls. This means we do NOT need to lock (l)
(ourselves) to set it.

Adjust tests that look at LW_SA. Now, we are an upcall-generating
lwp if ((l->l_flag & LW_SA) && (~l->l_pflag & LP_SA_NOBLOCK)).

Introduce code pattern to set & remember this:

f = ~l->l_pflag & LP_SA_NOBLOCK;
l->l_pflag |= LP_SA_NOBLOCK;

...

/* f is now LP_SA_NOBLOCK if it wasn't set in l_pflag before */

l->l_pflag ^= f;

I updated a lot of the trap handlers to do trap handling iff LP_SA_NOBLOCK
is not set. I tried to figure out if the trap handler could be triggered
for user-based faults as opposed to kernel faults to user addresses, and
only look at LP_SA_NOBLOCK for the latter.

Above is a result of discussions with rmind to reduce lock twiddling.

Also, per same discussions, add locking to sys_sa_preempt(). p_lock is
the lock we want.

Also, per same discussions, remove use of LSSUSPENDED as a thread state.
We needed to use it when we were emulating the 4.X and previous behavior
of hiding cached threads. For the moment, we now have them instead
remain visible to all and have them sleeping on the "lwpcache" wait
channel.

sa_newcachelwp(): sa_putcachelwp() wants savp_mutex held, not p_lock.

Tweak some comments.
 1.146.2.5  29-Jun-2008  wrstuden Don't grab sa_mutex in timerupcall(). we don't need it, and sa_upcall()
needs it to not be locked.

Don't call wakeup(). Just don't. lwp_unsleep() is the thing to call.
 1.146.2.4  23-Jun-2008  wrstuden Sync w/ -current. 34 merge conflicts to follow.
 1.146.2.3  27-May-2008  wrstuden First cut at porting SA timer code to -current. Slightly modified
so that we now use the timer_intr soft interrupt to do what we used
to do in itimerfire(). Also, use sa_mutex and p_lock instead of
the kernel lock.

Open question about what locks we can lock while holding
the timer lock.
 1.146.2.2  14-May-2008  wrstuden Per discussion with ad, remove most of the #include <sys/sa.h> lines
as they were including sa.h just for the type(s) needed for syscallargs.h.

Instead, create a new file, sys/satypes.h, which contains just the
types needed for syscallargs.h. Yes, there's only one now, but that
may change and it's probably more likely to change if it'd be difficult
to handle. :-)

Per discussion with matt at n dot o, add an include of satypes.h to
sigtypes.h. Upcall handlers are kinda signal handlers, and signalling
is the header file that's already included for syscallargs.h that
closest matches SA.

This shaves about 3000 lines off of the diff of the branch relative
to the base. That also represents about 18% of the total before this
checkin.

I think this reduction is a very good thing.
 1.146.2.1  10-May-2008  wrstuden Initial checkin of re-adding SA. Everything except kern_sa.c
compiles in GENERIC for i386. This is still a work-in-progress, but
this checkin covers most of the mechanical work (changing signalling
to be able to accommodate SA's process-wide signalling and re-adding
includes of sys/sa.h and savar.h). Subsequent changes will be much
more interesting.

Also, kern_sa.c has received partial cleanup. There's still more
to do, though.
 1.148.4.1  19-Oct-2008  haad Sync with HEAD.
 1.148.2.1  18-Jul-2008  simonb Sync with head.
 1.155.4.3  10-Dec-2009  snj Pull up following revision(s) (requested by drochner in ticket #1189):
sys/kern/kern_time.c: revision 1.163
If a struct sigevent with SIGEV_SIGNAL is passed to timer_create(2),
check the signal number to be in the allowed range. An invalid
signal number could crash the kernel by overflowing the sigset_t
array.
More checks would be good, and SIGEV_THREAD shouldn't be dropped
silently, but this fixes at least the local DOS vulnerability.
 1.155.4.2  08-Feb-2009  snj branches: 1.155.4.2.2; 1.155.4.2.4;
Pull up following revision(s) (requested by rmind in ticket #429):
sys/kern/kern_time.c: revision 1.159 via patch
settime1: fix a bug i introduced when i made l_stime use monotonic time.
from Matthias Drochner on tech-kern@. PR/40511 from Martin Husemann.
 1.155.4.1  02-Feb-2009  snj Pull up following revision(s) (requested by ad in ticket #401):
sys/kern/kern_time.c: revision 1.158
timer_intr: hold proc_lock across the loop, otherwise the process we are
about to signal could disappear.
 1.155.4.2.4.1  21-Apr-2010  matt sync to netbsd-5
 1.155.4.2.2.1  10-Dec-2009  snj Pull up following revision(s) (requested by drochner in ticket #1189):
sys/kern/kern_time.c: revision 1.163
If a struct sigevent with SIGEV_SIGNAL is passed to timer_create(2),
check the signal number to be in the allowed range. An invalid
signal number could crash the kernel by overflowing the sigset_t
array.
More checks would be good, and SIGEV_THREAD shouldn't be dropped
silently, but this fixes at least the local DOS vulnerability.
 1.155.2.3  28-Apr-2009  skrll Sync with HEAD.
 1.155.2.2  03-Mar-2009  skrll Sync with HEAD.
 1.155.2.1  19-Jan-2009  skrll Sync with HEAD.
 1.159.2.1  13-May-2009  jym Sync with HEAD.

Commit is split, to avoid a "too many arguments" protocol error.
 1.163.4.3  21-Apr-2011  rmind sync with head
 1.163.4.2  05-Mar-2011  rmind sync with head
 1.163.4.1  30-May-2010  rmind sync with head
 1.163.2.1  30-Apr-2010  uebayasi Sync with HEAD.
 1.166.2.1  06-Jun-2011  jruoho Sync with HEAD.
 1.170.6.3  05-Apr-2012  mrg sync to latest -current.
 1.170.6.2  24-Feb-2012  mrg sync to -current.
 1.170.6.1  18-Feb-2012  mrg merge to -current.
 1.170.2.3  22-May-2014  yamt sync with head.

for a reference, the tree before this commit was tagged
as yamt-pagecache-tag8.

this commit was split into small chunks to avoid
a limitation of cvs. ("Protocol error: too many arguments")
 1.170.2.2  30-Oct-2012  yamt sync with head
 1.170.2.1  17-Apr-2012  yamt sync with head
 1.174.2.3  03-Dec-2017  jdolecek update from HEAD
 1.174.2.2  23-Jun-2013  tls resync from head
 1.174.2.1  20-Nov-2012  tls Resync to 2012-11-19 00:00:00 UTC
 1.179.12.6  24-Feb-2019  martin Pull up following revision(s) (requested by mlelstv in ticket #1196):

sys/kern/kern_time.c: revision 1.196

The callout is used by any nonvirtual timer including CLOCK_MONOTONIC
and needs to be initialized.

Detected by [syzkaller].
 1.179.12.5  01-Feb-2019  martin Pull up following revision(s) (requested by maxv in ticket #1678):

sys/kern/kern_time.c: revision 1.190
sys/kern/kern_time.c: revision 1.194

Fix stack info leak. There are 4 bytes of padding in struct timeval. Looks
like there are other leaks related to timeval in this file.

[ 133.414352] kleak: Possible leak in copyout: [len=16, leaked=4]
[ 133.414352] #0 0xffffffff80224d0a in kleak_note <netbsd>
[ 133.424360] #1 0xffffffff80224d8a in kleak_copyout <netbsd>
[ 133.434361] #2 0xffffffff80b5fd79 in sys___gettimeofday50 <netbsd>
[ 133.434361] #3 0xffffffff8025a89c in sy_call <netbsd>
[ 133.444351] #4 0xffffffff8025aace in sy_invoke <netbsd>
[ 133.454365] #5 0xffffffff8025ab54 in syscall <netbsd>

-

Fix kernel info leaks.
 1.179.12.4  27-Dec-2018  martin Pull up following revision(s) (requested by maxv in ticket #1667):

sys/kern/kern_time.c: revision 1.191

Fix kernel info leak. There are 2x4 bytes of padding in struct itimerval.

[ 738.451860] kleak: Possible leak in copyout: [len=32, leaked=8]
[ 738.481840] #0 0xffffffff80b7c42a in kleak_note <netbsd>
[ 738.491821] #1 0xffffffff80b7c4aa in kleak_copyout <netbsd>
[ 738.501806] #2 0xffffffff80b6154e in sys___getitimer50 <netbsd>
[ 738.511778] #3 0xffffffff80b61e39 in sys___setitimer50 <netbsd>
[ 738.521781] #4 0xffffffff8025ab3c in sy_call <netbsd>
[ 738.521781] #5 0xffffffff8025ad6e in sy_invoke <netbsd>
[ 738.531808] #6 0xffffffff8025adf4 in syscall <netbsd>
 1.179.12.3  14-Dec-2018  martin Additionally pull up following revision(s) (requested by maxv in ticket #1660):

sys/compat/linux/common/linux_misc_notalpha.c: revision 1.110
sys/kern/kern_time.c: revision 1.193

Improve my kern_time.c::rev1.192, systematically clear the buffers we get
from 'ptimer_pool' to prevent more leaks.
 1.179.12.2  29-Nov-2018  martin Pull up following revision(s) (requested by maxv in ticket #1660):

sys/kern/kern_time.c: revision 1.192

Fix kernel info leak.

+ Possible info leak: [len=32, leaked=16]
| #0 0xffffffff80baf3a7 in kleak_copyout
| #1 0xffffffff80b940f8 in sys___timer_settime50
| #2 0xffffffff80259c42 in syscall
 1.179.12.1  03-Mar-2016  martin Pull up following revision(s) (requested by uwe in ticket #1128):
sys/kern/kern_time.c: revision 1.184
Don't leak garbage from the kernel stack on sleep(0) and equivalents.
Hat tip to perl's ext/POSIX/t/wrappers.t
 1.179.10.6  05-Dec-2016  skrll Sync with HEAD
 1.179.10.5  09-Jul-2016  skrll Sync with HEAD
 1.179.10.4  29-May-2016  skrll Sync with HEAD
 1.179.10.3  19-Mar-2016  skrll Sync with HEAD
 1.179.10.2  27-Dec-2015  skrll Sync with HEAD (as of 26th Dec)
 1.179.10.1  22-Sep-2015  skrll Sync with HEAD
 1.179.8.6  24-Feb-2019  martin Pull up following revision(s) (requested by mlelstv in ticket #1196):

sys/kern/kern_time.c: revision 1.196

The callout is used by any nonvirtual timer including CLOCK_MONOTONIC
and needs to be initialized.

Detected by [syzkaller].
 1.179.8.5  01-Feb-2019  martin Pull up following revision(s) (requested by maxv in ticket #1678):

sys/kern/kern_time.c: revision 1.190
sys/kern/kern_time.c: revision 1.194

Fix stack info leak. There are 4 bytes of padding in struct timeval. Looks
like there are other leaks related to timeval in this file.

[ 133.414352] kleak: Possible leak in copyout: [len=16, leaked=4]
[ 133.414352] #0 0xffffffff80224d0a in kleak_note <netbsd>
[ 133.424360] #1 0xffffffff80224d8a in kleak_copyout <netbsd>
[ 133.434361] #2 0xffffffff80b5fd79 in sys___gettimeofday50 <netbsd>
[ 133.434361] #3 0xffffffff8025a89c in sy_call <netbsd>
[ 133.444351] #4 0xffffffff8025aace in sy_invoke <netbsd>
[ 133.454365] #5 0xffffffff8025ab54 in syscall <netbsd>

-

Fix kernel info leaks.
 1.179.8.4  27-Dec-2018  martin Pull up following revision(s) (requested by maxv in ticket #1667):

sys/kern/kern_time.c: revision 1.191

Fix kernel info leak. There are 2x4 bytes of padding in struct itimerval.

[ 738.451860] kleak: Possible leak in copyout: [len=32, leaked=8]
[ 738.481840] #0 0xffffffff80b7c42a in kleak_note <netbsd>
[ 738.491821] #1 0xffffffff80b7c4aa in kleak_copyout <netbsd>
[ 738.501806] #2 0xffffffff80b6154e in sys___getitimer50 <netbsd>
[ 738.511778] #3 0xffffffff80b61e39 in sys___setitimer50 <netbsd>
[ 738.521781] #4 0xffffffff8025ab3c in sy_call <netbsd>
[ 738.521781] #5 0xffffffff8025ad6e in sy_invoke <netbsd>
[ 738.531808] #6 0xffffffff8025adf4 in syscall <netbsd>
 1.179.8.3  14-Dec-2018  martin Additionally pull up following revision(s) (requested by maxv in ticket #1660):

sys/compat/linux/common/linux_misc_notalpha.c: revision 1.110
sys/kern/kern_time.c: revision 1.193

Improve my kern_time.c::rev1.192, systematically clear the buffers we get
from 'ptimer_pool' to prevent more leaks.
 1.179.8.2  29-Nov-2018  martin Pull up following revision(s) (requested by maxv in ticket #1660):

sys/kern/kern_time.c: revision 1.192

Fix kernel info leak.

+ Possible info leak: [len=32, leaked=16]
| #0 0xffffffff80baf3a7 in kleak_copyout
| #1 0xffffffff80b940f8 in sys___timer_settime50
| #2 0xffffffff80259c42 in syscall
 1.179.8.1  03-Mar-2016  martin branches: 1.179.8.1.4;
Pull up following revision(s) (requested by uwe in ticket #1128):
sys/kern/kern_time.c: revision 1.184
Don't leak garbage from the kernel stack on sleep(0) and equivalents.
Hat tip to perl's ext/POSIX/t/wrappers.t
 1.179.8.1.4.5  24-Feb-2019  martin Pull up following revision(s) (requested by mlelstv in ticket #1196):

sys/kern/kern_time.c: revision 1.196

The callout is used by any nonvirtual timer including CLOCK_MONOTONIC
and needs to be initialized.

Detected by [syzkaller].
 1.179.8.1.4.4  01-Feb-2019  martin Pull up following revision(s) (requested by maxv in ticket #1678):

sys/kern/kern_time.c: revision 1.190
sys/kern/kern_time.c: revision 1.194

Fix stack info leak. There are 4 bytes of padding in struct timeval. Looks
like there are other leaks related to timeval in this file.

[ 133.414352] kleak: Possible leak in copyout: [len=16, leaked=4]
[ 133.414352] #0 0xffffffff80224d0a in kleak_note <netbsd>
[ 133.424360] #1 0xffffffff80224d8a in kleak_copyout <netbsd>
[ 133.434361] #2 0xffffffff80b5fd79 in sys___gettimeofday50 <netbsd>
[ 133.434361] #3 0xffffffff8025a89c in sy_call <netbsd>
[ 133.444351] #4 0xffffffff8025aace in sy_invoke <netbsd>
[ 133.454365] #5 0xffffffff8025ab54 in syscall <netbsd>

-

Fix kernel info leaks.
 1.179.8.1.4.3  27-Dec-2018  martin Pull up following revision(s) (requested by maxv in ticket #1667):

sys/kern/kern_time.c: revision 1.191

Fix kernel info leak. There are 2x4 bytes of padding in struct itimerval.

[ 738.451860] kleak: Possible leak in copyout: [len=32, leaked=8]
[ 738.481840] #0 0xffffffff80b7c42a in kleak_note <netbsd>
[ 738.491821] #1 0xffffffff80b7c4aa in kleak_copyout <netbsd>
[ 738.501806] #2 0xffffffff80b6154e in sys___getitimer50 <netbsd>
[ 738.511778] #3 0xffffffff80b61e39 in sys___setitimer50 <netbsd>
[ 738.521781] #4 0xffffffff8025ab3c in sy_call <netbsd>
[ 738.521781] #5 0xffffffff8025ad6e in sy_invoke <netbsd>
[ 738.531808] #6 0xffffffff8025adf4 in syscall <netbsd>
 1.179.8.1.4.2  14-Dec-2018  martin Additionally pull up following revision(s) (requested by maxv in ticket #1660):

sys/compat/linux/common/linux_misc_notalpha.c: revision 1.110
sys/kern/kern_time.c: revision 1.193

Improve my kern_time.c::rev1.192, systematically clear the buffers we get
from 'ptimer_pool' to prevent more leaks.
 1.179.8.1.4.1  29-Nov-2018  martin Pull up following revision(s) (requested by maxv in ticket #1660):

sys/kern/kern_time.c: revision 1.192

Fix kernel info leak.

+ Possible info leak: [len=32, leaked=16]
| #0 0xffffffff80baf3a7 in kleak_copyout
| #1 0xffffffff80b940f8 in sys___timer_settime50
| #2 0xffffffff80259c42 in syscall
 1.188.2.1  07-Jan-2017  pgoyette Sync with HEAD. (Note that most of these changes are simply $NetBSD$
tag issues.)
 1.189.16.3  13-Apr-2020  martin Mostly merge changes from HEAD up to 20200411
 1.189.16.2  08-Apr-2020  martin Merge changes from current as of 20200406
 1.189.16.1  10-Jun-2019  christos Sync with HEAD
 1.189.14.2  26-Dec-2018  pgoyette Sync with HEAD, resolve a few conflicts
 1.189.14.1  26-Nov-2018  pgoyette Sync with HEAD, resolve a couple of conflicts
 1.189.8.6  25-May-2020  martin Pull up following revision(s) (requested by christos in ticket #1549):

sys/netinet/igmp.c: revision 1.70
sys/kern/kern_time.c: revision 1.204

igmp_sendpkt() expects ip_output() to set 'imo.imo_multicast_ttl' into
'ip->ip_ttl'; but ip_output() won't if the target is not a multicast
address, meaning that the uninitialized 'ip->ip_ttl' byte gets sent to
the network. This leaks one byte of kernel heap.

Fix this by filling 'ip->ip_ttl' with a TTL of one.
Found by KMSAN.

-

Fix uninitialized memory access. Found by KMSAN.
 1.189.8.5  24-Feb-2019  martin Pull up following revision(s) (requested by mlelstv in ticket #1196):

sys/kern/kern_time.c: revision 1.196

The callout is used by any nonvirtual timer including CLOCK_MONOTONIC
and needs to be initialized.

Detected by [syzkaller].
 1.189.8.4  01-Feb-2019  martin Pull up following revision(s) (requested by maxv in ticket #1180):

sys/kern/kern_time.c: revision 1.190
sys/kern/kern_time.c: revision 1.194

Fix stack info leak. There are 4 bytes of padding in struct timeval. Looks
like there are other leaks related to timeval in this file.

[ 133.414352] kleak: Possible leak in copyout: [len=16, leaked=4]
[ 133.414352] #0 0xffffffff80224d0a in kleak_note <netbsd>
[ 133.424360] #1 0xffffffff80224d8a in kleak_copyout <netbsd>
[ 133.434361] #2 0xffffffff80b5fd79 in sys___gettimeofday50 <netbsd>
[ 133.434361] #3 0xffffffff8025a89c in sy_call <netbsd>
[ 133.444351] #4 0xffffffff8025aace in sy_invoke <netbsd>
[ 133.454365] #5 0xffffffff8025ab54 in syscall <netbsd>

-

Fix kernel info leaks.
 1.189.8.3  27-Dec-2018  martin Pull up following revision(s) (requested by maxv in ticket #1147):

sys/kern/kern_time.c: revision 1.191

Fix kernel info leak. There are 2x4 bytes of padding in struct itimerval.

[ 738.451860] kleak: Possible leak in copyout: [len=32, leaked=8]
[ 738.481840] #0 0xffffffff80b7c42a in kleak_note <netbsd>
[ 738.491821] #1 0xffffffff80b7c4aa in kleak_copyout <netbsd>
[ 738.501806] #2 0xffffffff80b6154e in sys___getitimer50 <netbsd>
[ 738.511778] #3 0xffffffff80b61e39 in sys___setitimer50 <netbsd>
[ 738.521781] #4 0xffffffff8025ab3c in sy_call <netbsd>
[ 738.521781] #5 0xffffffff8025ad6e in sy_invoke <netbsd>
[ 738.531808] #6 0xffffffff8025adf4 in syscall <netbsd>
 1.189.8.2  30-Nov-2018  martin Additionally pull up following revision(s) (requested by maxv in ticket #1110):

sys/compat/linux/common/linux_misc_notalpha.c: revision 1.110
sys/kern/kern_time.c: revision 1.193

Improve my kern_time.c::rev1.192, systematically clear the buffers we get
from 'ptimer_pool' to prevent more leaks.
 1.189.8.1  29-Nov-2018  martin Pull up following revision(s) (requested by maxv in ticket #1110):

sys/kern/kern_time.c: revision 1.192

Fix kernel info leak.

+ Possible info leak: [len=32, leaked=16]
| #0 0xffffffff80baf3a7 in kleak_copyout
| #1 0xffffffff80b940f8 in sys___timer_settime50
| #2 0xffffffff80259c42 in syscall
 1.197.4.4  01-Nov-2020  martin Pull up following revision(s) (requested by nia in ticket #1124):

sys/kern/kern_time.c: revision 1.206

kern_time: prevent the system clock from being set too low or high
currently doing this will drive KUBSAN haywire and possibly cause
system lock-ups, so more testing should probably be performed before
we let the clock be set too many thousands of years into the future.

ditto for negative values, which were being passed by chrony for
some reason while my internet connection was being unreliable.
this also triggered some interesting KUBSAN reports.
 1.197.4.3  18-May-2020  martin Pull up following revision(s) (requested by maxv in ticket #916):

sys/kern/kern_time.c: revision 1.204

Fix uninitialized memory access. Found by KMSAN.
 1.197.4.2  11-Sep-2019  martin Additionally pull up the following revision for ticket #192, to fix the build:

src/sys/kern/kern_time.c 1.199

mark a variable __diagused to fix this problem affecting many builds:

kern/kern_time.c:1413:6: error: variable 'error' set but not used
[-Werror=unused-but-set-variable]
 1.197.4.1  10-Sep-2019  martin Pull up following revision(s) (requested by maxv in ticket #192):

sys/sys/timevar.h: revision 1.39
sys/kern/kern_time.c: revision 1.198

Fix race in timer destruction.

Anything we confirmed about the world before callout_halt may cease
to be true afterward, so make sure to start over in that case.

Add some comments explaining what's going on.
 1.206.2.2  03-Apr-2021  thorpej Sync with HEAD.
 1.206.2.1  14-Dec-2020  thorpej Sync w/ HEAD.
 1.210.2.1  03-Apr-2021  thorpej Sync with HEAD.
 1.218.2.1  22-Feb-2023  martin Pull up following revision(s) (requested by thorpej in ticket #101):

sys/kern/kern_time.c: revision 1.219

In itimer_arm_real(), KASSERT that it->it_dying is false. This was
already implicitly assumed, but make it explicit in hopes of tracking
down kern/57226.
