History log of /src/sys/kern/kern_event.c
Revision  Date  Author  Comments
 1.150  21-Sep-2023  msaitoh s/ for for / for / in comment.
 1.149  28-Jul-2023  christos Add epoll(2) from Theodore Preduta as part of GSoC 2023
 1.148  22-Apr-2023  riastradh file(9): New fo_fpathconf operation.

XXX kernel revbump -- struct fileops API and ABI change
 1.147  09-Apr-2023  riastradh kern: KASSERT(A && B) -> KASSERT(A); KASSERT(B)
 1.146  24-Jul-2022  riastradh kern_event.c: Mark KASSERT-only static function as __diagused.

Otherwise clang objects with -Wunneeded-internal-declaration.
 1.145  19-Jul-2022  thorpej Fix a problem whereby detaching a device that has open kevent
registrations can result in a UAF: When a device detaches, it
calls seldestroy(), which calls knote_fini(), and when that
returns, the softc that contained the selinfo and klist are freed.
However, any knotes that were registered still linger on with the
kq descriptor they were associated with, and when the file
descriptors close, those knotes will be f_detach'd, which will
call into the driver instance that no longer exists.

Address this problem by adding a "foplock" mutex to the knote.
This foplock must be held when calling into filter_attach(),
filter_detach(), and filter_event() (XXX not filter_touch();
see code for details). Now, in klist_fini(), for each knote
that is on the klist that's about to be blown away, acquire
the foplock, replace the knote's filterops with a do-nothing
stub, and release the foplock.

The end result is that:
==> The foplock ensures that calls into filter_*() will get EITHER
the real backing object's filterops OR the nop stubs.
==> Holding the foplock across the filter_*() calls ensures that
klist_fini() will not complete until there are no callers inside
the filterops that are about to be blown away.
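The stub-swap scheme described in 1.145 can be modeled in userland roughly as follows. This is only a sketch under stated assumptions: every name ending in _model (and the dev_* driver) is hypothetical, a pthread mutex stands in for the kernel mutex, and a single knote is handled where the real klist_fini() walks a whole list.

```c
#include <assert.h>
#include <pthread.h>
#include <stddef.h>

/* Hypothetical, simplified stand-ins for the kernel structures. */
struct knote_model;

struct filterops_model {
	int (*f_event)(struct knote_model *kn, long hint);
};

struct knote_model {
	pthread_mutex_t foplock;		/* held across calls into fop */
	const struct filterops_model *fop;	/* real ops or the nop stub */
	void *backing;				/* driver state; may go away */
};

/* Example "driver" filter: reports the int stored in its backing object. */
static int
dev_event(struct knote_model *kn, long hint)
{
	(void)hint;
	return *(int *)kn->backing;
}

static const struct filterops_model dev_filtops = { .f_event = dev_event };

/* Do-nothing stub installed once the backing object is being destroyed. */
static int
nop_event(struct knote_model *kn, long hint)
{
	(void)kn;
	(void)hint;
	return 0;
}

static const struct filterops_model nop_filtops = { .f_event = nop_event };

static void
knote_init_model(struct knote_model *kn, const struct filterops_model *fop,
    void *backing)
{
	pthread_mutex_init(&kn->foplock, NULL);
	kn->fop = fop;
	kn->backing = backing;
}

/* Every call into the filter is made with the foplock held. */
static int
filter_event_model(struct knote_model *kn, long hint)
{
	int rv;

	pthread_mutex_lock(&kn->foplock);
	assert(kn->fop != NULL);
	rv = kn->fop->f_event(kn, hint);
	pthread_mutex_unlock(&kn->foplock);
	return rv;
}

/*
 * Teardown: taking the foplock waits out any caller still inside the
 * real filterops; after the swap, callers only ever reach the stub.
 */
static void
klist_fini_model(struct knote_model *kn)
{
	pthread_mutex_lock(&kn->foplock);
	kn->fop = &nop_filtops;
	kn->backing = NULL;
	pthread_mutex_unlock(&kn->foplock);
}
```

Because the swap and the filter call use the same lock, a caller sees either the real filterops with a live backing object or the stub, never a stale pointer.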
 1.144  19-Jul-2022  thorpej Make some knote implementation details private to kern_event.c. NFC, and
no ABI change for kevent providers.
 1.143  13-Jul-2022  thorpej Move klist_{init,fini,insert,remove}() into kern_event.c. NFC.
 1.142  13-Jul-2022  thorpej Funnel knote alloc/free into a single pair of functions. NFCI.
 1.141  24-May-2022  andvar fix various typos in comment, documentation and log messages.
 1.140  12-Feb-2022  thorpej Add inline functions to manipulate the klists that link up knotes
via kn_selnext:

- klist_init()
- klist_fini()
- klist_insert()
- klist_remove()

These provide some API insulation from the implementation details of these
lists (but not completely; see vn_knote_attach() and vn_knote_detach()).
Currently just a wrapper around SLIST(9).

This will make it significantly easier to switch kn_selnext linkage
to a different kind of list.
 1.139  01-Jan-2022  msaitoh s/aquire/acquire/ in comment.
 1.138  23-Oct-2021  thorpej Fix a regression introduced in kern_event.c,v 1.129 that would cause
"udata" to get clobbered on ONESHOT events, and add a unit test for it.
Reported by martin@ (manifested in his case as a KASSERT() firing when
running unit tests in COMPAT_NETBSD32).
 1.137  23-Oct-2021  thorpej Add support for the EVFILT_EMPTY filter, which is activated when the
write buffer associated with the file descriptor is empty. This is
currently implemented only for sockets, and is intended primarily to
provide visibility to applications that all previously written data
has been acknowledged by the TCP layer on the receiver. Compatible
with the same filter in FreeBSD.
 1.136  22-Oct-2021  thorpej Support modifying an existing timer without having to delete it first.
Semantics match FreeBSD.
 1.135  21-Oct-2021  thorpej Allow the f_touch() filter op to return an error, specifically in
the EVENT_REGISTER case.
 1.134  21-Oct-2021  thorpej Re-factor the code that computes the EVFILT_TIMER value into its own
function.

NFC.
 1.133  21-Oct-2021  thorpej - Don't use a separate kqueue_timer_lock; just protect those knotes
with the kq->kq_lock.
- Re-factor the guts of knote_activate() into knote_activate_locked(),
and use it in a few places to avoid a few unlock-then-immediately-lock
cycles.
- Define a FILT_TIMER_NOSCHED macro, rather than hard-coding (uintptr_t)-1
in a bunch of different places.

NFC.
 1.132  13-Oct-2021  thorpej Add support for the NOTE_SECONDS, NOTE_MSECONDS, NOTE_USECONDS,
NOTE_NSECONDS, and NOTE_ABSTIME filter flags to EVFILT_TIMER,
API-compatible with the same in FreeBSD.
 1.131  11-Oct-2021  thorpej Setting EV_EOF requires modifying kn->kn_flags. However, that relies on
holding the kq_lock of that note's kq. Rather than exposing this directly,
add new knote_set_eof() and knote_clear_eof() functions that handle the
necessary locking and don't leak as many implementation details to modules.

NetBSD 9.99.91
 1.130  10-Oct-2021  thorpej Check _KERNEL_OPT before including opt_ddb.h.
 1.129  10-Oct-2021  thorpej Changes to make EVFILT_PROC MP-safe:

Because the locking protocol around processes is somewhat complex
compared to other events that can be posted on kqueues, introduce
new functions for posting NOTE_EXEC, NOTE_EXIT, and NOTE_FORK,
rather than just using the generic knote() function. These functions
KASSERT() their locking expectations, and deal with other complexities
for each situation.

knote_proc_fork(), in particular, needs to handle NOTE_TRACK, which
requires allocation of a new knote to attach to the child process. We
don't want to be allocating memory while holding the parent's p_lock.
Furthermore, we also have to attach the tracking note to the child
process, which means we have to acquire the child's p_lock.

So, to handle all this, we introduce some additional synchronization
infrastructure around the 'knote' structure:

- Add the ability to mark a knote as being in a state of flux. Knotes
in this state are guaranteed not to be detached/deleted, thus allowing
a code path to drop other locks after putting a knote in this state.

- Code paths that wish to detach/delete a knote must first check if the
knote is in-flux. If so, they must wait for it to quiesce. Because
multiple threads of execution may attempt this concurrently, a mechanism
exists for a single LWP to claim the detach responsibility; all other
threads simply wait for the knote to disappear before they can make
further progress.

- When kqueue_scan() encounters an in-flux knote, it simply treats the
situation just like encountering another thread's queue marker -- wait
for the flux to settle and continue on.

(The "in-flux knote" idea was inspired by FreeBSD, but this works differently
from their implementation, as the two kqueue implementations have diverged
quite a bit.)

knote_proc_fork() uses this infrastructure to implement NOTE_TRACK like so:

- Attempt to put the original tracking knote into a state of flux; if this
fails (because the note has a detach pending), we skip all processing
(the original process has lost interest, and we simply won the race).

- Once the note is in-flux, drop the kq and forking process's locks, and
allocate 2 knotes: one to post the NOTE_CHILD event, and one to attach
a new NOTE_TRACK to the child process. Notably, we do NOT go through
kqueue_register() to do this, but rather do all of the work directly
and KASSERT() our assumptions; this allows us to directly control our
interaction with locks. All memory allocations here are performed with
KM_NOSLEEP, in order to prevent holding the original knote in-flux
indefinitely.

- Because the NOTE_TRACK use case adds knotes to kqueues through a
sort of back-door mechanism, we must serialize with the closing of
the destination kqueue's file descriptor, so steal another bit from
the kq_count field to notify other threads that a kqueue is on its
way out to prevent new knotes from being enqueued while the close
path detaches them.

In addition to fixing EVFILT_PROC's reliance on KERNEL_LOCK, this also
fixes a long-standing bug whereby a NOTE_CHILD event could be dropped
if the child process exited before the interested process received the
NOTE_CHILD event (the same knote would be used to deliver the NOTE_EXIT
event, and would clobber the NOTE_CHILD's 'data' field).

Add a bunch of comments to explain what's going on in various critical
sections, and sprinkle additional KASSERT()s to validate assumptions
in several more locations.
 1.128  30-Sep-2021  thorpej Make the info returned by kqueue_stat() a little less barren.
 1.127  30-Sep-2021  thorpej In knote(), don't call kn->kn_fop->f_event() directly; use filter_event()
to get the correct KERNEL_LOCK handling for the filter attached to that
specific note.
 1.126  26-Sep-2021  thorpej In kqueue_kqfilter(), return EINVAL instead of 1 if something other than
EVFILT_READ was requested.
 1.125  26-Sep-2021  thorpej - Rename kqueue_misc_lock -> kqueue_timer_lock, since EVFILT_TIMER is
now its only user. Also initialize it as IPL_SOFTCLOCK; there is no
practical difference in how it operates (it is still an adaptive lock),
but this serves as a visual reminder that we are interlocking against
a callout.
- Add some comments that describe why we don't need to hold kqueue_timer_lock
when detaching an EVFILT_TIMER due to guarantees made by callout_halt().
- Mark timer_filtops as MPSAFE.
 1.124  26-Sep-2021  thorpej Fix the locking around EVFILT_FS. Previously, the code would walk the
fs_klist and take the kqueue_misc_lock inside the event callback.
However, that list can be modified by the attach and detach callbacks,
which could result in the walker stepping right off a cliff.

Instead, we give the fs_klist its own lock, and hold it while we
call knote(), using the NOTE_SUBMIT protocol. Also, move fs_filtops
into vfs_syscalls.c so all of the locking logic is contained in one
file (there is precedent with sig_filtops). fs_filtops is now marked
MPSAFE.
 1.123  26-Sep-2021  thorpej Mark kqread_filtops, user_filtops, and seltrue_filtops as MPSAFE.
 1.122  26-Sep-2021  thorpej - Define a new filterops flag FILTEROP_MPSAFE, which states that the
kqueue filter does not require the KERNEL_LOCK to be held.
- Add wrappers around the calls into the filterops that take care of
the locking requirements.

No functional change, since no filterops yet define FILTEROP_MPSAFE.
 1.121  26-Sep-2021  thorpej Change the kqueue filterops::f_isfd field to filterops::f_flags, and
define a flag FILTEROP_ISFD that has the meaning of the prior f_isfd.
Field and flag name aligned with OpenBSD.

This does not constitute a functional or ABI change, as the field location
and size, and the value placed in that field, are the same as the previous
code, but we're bumping __NetBSD_Version__ so 3rd-party module source code
can adapt, as needed.

NetBSD 9.99.89
 1.120  21-Sep-2021  christos undo previous, wrong file.
 1.119  21-Sep-2021  christos don't opencode kauth_cred_get()
 1.118  02-May-2021  jdolecek implement fo_restart hook for kqueue descriptors, so that close(2)
on the descriptor won't block indefinitely if other thread is currently
blocked on the same kqueue in kevent(2)

done similarly to pipes and sockets, i.e. using flag on the potentially
shared kqueue structure hooked off file_t - this is somewhat suboptimal
if the application dup(2)ped the descriptor, but this should be rare
enough to not really matter

usually this causes the kevent(2) to end up returning EBADF since
on the syscall restart the descriptor is not there anymore; if
dup(2)ped the kevent(2) call can continue successfully if the closed
kqueue descriptor was other than the one used for the kevent(2)
call

PR kern/46248 by Julian Fagir
 1.117  27-Jan-2021  skrll branches: 1.117.4;
Fix non-DIAGNOSTIC build
 1.116  26-Jan-2021  jdolecek call f_touch with kq_lock held, and without KERNEL_LOCK() - for this
adjust EVFILT_USER, which is the only filter actually using that hook

kqueue_scan() now doesn't need to exit/enter the kq_lock when calling
f_touch, which removes another possible race

part of PR kern/50094
 1.115  25-Jan-2021  jdolecek put back clearing of KN_QUEUED and check for re-queue - as rev. 1.53 notes,
it's necessary for correct function

fixes PR kern/55946, thanks to Paul Goyette for testing

part of PR kern/50094 fix
 1.114  24-Jan-2021  jdolecek don't check signals while waiting for other kqueue scans to finish

reportedly somewhat improves behaviour for PR kern/55946

part of PR kern/50094 fix
 1.113  21-Jan-2021  jdolecek remove stray debug #define DEBUG
 1.112  21-Jan-2021  jdolecek adjust kq_check() (enabled with DEBUG) to new reality - it's now perfectly
normal to have kq_count bigger than number of the linked entries
on the kqueue

PR kern/50094, problem pointed out by Chuck Silvers
 1.111  20-Jan-2021  jdolecek fix a race in kqueue_scan() - when multiple threads check the same
kqueue, it could happen that another thread saw an empty kqueue while a
kevent was being checked for re-firing and re-queued

make sure to keep retrying if there are outstanding kevents even
if no kevent is found on first pass through the queue, and only
drop the KN_QUEUED flag and kq_count when actually completely done
with the kevent

change is inspired by the FreeBSD in-flux handling, but without
introducing the reference counting

PR kern/50094 by Christof Meerwald
 1.110  27-Dec-2020  jdolecek reduce indentation for the main processing loop in kqueue_scan(), this also
makes the code more similar to FreeBSD; NFCI

part of PR kern/50094
 1.109  11-Dec-2020  thorpej Use sel{record,remove}_knote().
 1.108  31-Oct-2020  christos branches: 1.108.2;
PR/55663: Ruslan Nikolaev: Add support for EVFILT_USER in kqueue(2)
 1.107  23-May-2020  ad Move proc_lock into the data segment. It was dynamically allocated because
at the time we had mutex_obj_alloc() but not __cacheline_aligned.
 1.106  01-Feb-2020  riastradh Load struct filedesc::fd_dt with atomic_load_consume.

Exceptions: when fd_refcnt <= 1, or when holding fd_lock.

While here:

- Restore KASSERT(mutex_owned(&fdp->fd_lock)) in fd_unused.
=> This is used only in fd_close and fd_abort, where it holds.
- Move bounds check assertion in fd_putfile to where it matters.
- Store fd_dt with atomic_store_release.
- Move load of fd_dt under lock in knote_fdclose.
- Omit membar_consumer in fdesc_readdir.
=> atomic_load_consume serves the same purpose now.
=> Was needed only on alpha anyway.
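The release/consume pairing that 1.106 applies to fd_dt follows the usual pointer-publication pattern. A minimal sketch using C11 <stdatomic.h> rather than the kernel's atomic_load_consume()/atomic_store_release(); the fdtab_model structure and function names are hypothetical and not taken from the kernel:

```c
#include <assert.h>
#include <stdatomic.h>
#include <stddef.h>

/* Hypothetical miniature of a descriptor table. */
struct fdtab_model {
	int nfiles;
	int files[4];
};

/* Shared pointer to the current table; starts out NULL. */
static _Atomic(struct fdtab_model *) fd_dt_model;

/* Writer: initialize the table fully, then publish it with release. */
static void
fdtab_publish(struct fdtab_model *dt)
{
	assert(dt != NULL);
	atomic_store_explicit(&fd_dt_model, dt, memory_order_release);
}

/*
 * Reader: the consume load orders the dependent read of dt->nfiles
 * after the load of the pointer itself, so no separate read barrier
 * is needed.
 */
static int
fdtab_nfiles(void)
{
	struct fdtab_model *dt =
	    atomic_load_explicit(&fd_dt_model, memory_order_consume);

	return dt == NULL ? 0 : dt->nfiles;
}
```

On most architectures the consume load costs nothing beyond a plain load, which fits the log's remark that the explicit barrier was needed only on alpha.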
 1.105  18-Oct-2019  christos branches: 1.105.2;
print which process asked for an unsupported event so we can fix it.
 1.104  13-Nov-2018  maxv branches: 1.104.4;
Fix kernel info leak. There are 4 bytes of padding in struct kevent.

[ 287.537676] kleak: Possible leak in copyout: [len=40, leaked=4]
[ 287.537676] #0 0xffffffff80b7c41a in kleak_note <netbsd>
[ 287.547673] #1 0xffffffff80b7c49a in kleak_copyout <netbsd>
[ 287.557677] #2 0xffffffff80b1d32d in kqueue_scan.isra.1.constprop.2 <netbsd>
[ 287.557677] #3 0xffffffff80b1dc6a in kevent1 <netbsd>
[ 287.567683] #4 0xffffffff80b1dcb0 in sys___kevent50 <netbsd>
[ 287.567683] #5 0xffffffff8025ab3c in sy_call <netbsd>
[ 287.577688] #6 0xffffffff8025ad6e in sy_invoke <netbsd>
[ 287.587693] #7 0xffffffff8025adf4 in syscall <netbsd>
 1.103  12-Jan-2018  christos branches: 1.103.2; 1.103.4;
Set EV_ONESHOT to prevent rescheduling
XXX: pullup-8
 1.102  09-Jan-2018  christos Merge autofs support from: Tomohiro Kusumi
XXX: Does not work yet
 1.101  30-Nov-2017  christos add fo_name so we can identify the fileops in a simple way.
 1.100  30-Nov-2017  christos Put previous removed diagnostic back as debug. It has caught in the past
(and now) different kqueue behavior between NetBSD and other kqueue
implementations that depend on specific file types. If 3rd party programs
trigger this it is probably because we are doing something different.
 1.99  30-Nov-2017  riastradh Remove spammy kevent failure printf.

Maybe this was once useful for debugging the kernel, but it's just
console spam triggered by buggy or malicious userland programs now.
 1.98  11-Nov-2017  christos Don't add kevents to closing file descriptors (from riastradh)
 1.97  07-Nov-2017  christos Add two utility functions to help use kmem with strings: kmem_strdupsize,
kmem_strfree.
 1.96  25-Oct-2017  maya Use C99 initializer for filterops

Mostly done with spatch with touchups for indentation

@@
expression a;
identifier b,c,d;
identifier p;
@@
const struct filterops p =
- { a, b, c, d
+ {
+ .f_isfd = a,
+ .f_attach = b,
+ .f_detach = c,
+ .f_event = d,
};
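Applied to one definition, the semantic patch above turns a positional initializer into a designated one. The sketch below assumes a local four-field structure mirroring the field names in the patch; the ex_* names and the exact member types are illustrative assumptions, not the kernel's declarations.

```c
#include <assert.h>
#include <stddef.h>

struct knote;	/* opaque for this example */

/* Hypothetical local copy of the four fields named in the patch. */
struct filterops_example {
	int	f_isfd;
	int	(*f_attach)(struct knote *);
	void	(*f_detach)(struct knote *);
	int	(*f_event)(struct knote *, long);
};

static int
ex_attach(struct knote *kn)
{
	(void)kn;
	return 0;
}

static void
ex_detach(struct knote *kn)
{
	(void)kn;
}

static int
ex_event(struct knote *kn, long hint)
{
	(void)kn;
	return hint != 0;
}

/* Before: positional; the meaning of each value depends on field order. */
static const struct filterops_example ex_filtops_old =
	{ 1, ex_attach, ex_detach, ex_event };

/* After: each value is bound to its field by name. */
static const struct filterops_example ex_filtops_new = {
	.f_isfd = 1,
	.f_attach = ex_attach,
	.f_detach = ex_detach,
	.f_event = ex_event,
};
```

Both objects end up with identical contents; the designated form simply keeps working if fields are later added or reordered.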
 1.95  25-Oct-2017  riastradh Document lock order and locking rules.
 1.94  16-Sep-2017  christos more debug info
 1.93  03-Jul-2017  riastradh Nix trailing whitespace. No functional change.
 1.92  01-Jul-2017  christos fix file descriptor locking (from joerg).
fixes kernel crashes by running go
XXX: pullup-7
 1.91  11-May-2017  christos branches: 1.91.2;
protect against NULL, from PaulG
 1.90  09-May-2017  christos fp == NULL in the DIAGNOSTIC, so use the real fp and also print the errno.
 1.89  27-Apr-2017  abhinav Rearrange the if conditions in order to get rid of unnecessary indentation.

No functional change intended. ok christos@
 1.88  14-Jul-2016  christos branches: 1.88.8;
make sure we cleanup properly when fd is too big.
 1.87  14-Jul-2016  christos From tedu at openbsd:

kevent validates that ident is a valid fd by getting the file. one sad
quirk: uint64 to int32 truncation can lead to false positives, and then
later in the array sizing code, very big mallocs panic the kernel.
add a check that the ident isn't larger than INT_MAX in the fd case.
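The truncation problem and the added range check can be illustrated as below. This is a sketch, not the committed code: the helper names are hypothetical, and the ident is modeled as a uint64_t as in the description above.

```c
#include <assert.h>
#include <limits.h>
#include <stdbool.h>
#include <stdint.h>

/*
 * What a careless narrowing does: only the low 32 bits survive, so a
 * huge ident can alias a small, valid-looking descriptor number.
 */
static int32_t
ident_truncate(uint64_t ident)
{
	return (int32_t)(uint32_t)(ident & 0xffffffffu);
}

/* The check: reject any ident that cannot be represented as an fd. */
static bool
ident_fits_fd(uint64_t ident)
{
	return ident <= (uint64_t)INT_MAX;
}

/* Convert only after the range check has passed. */
static int
fd_from_ident(uint64_t ident)
{
	assert(ident_fits_fd(ident));
	return (int)ident;
}
```

For example, the ident 0x100000003 truncates to 3, which could name an open descriptor, yet it fails the range check and is rejected before any table sizing happens.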
reported by Tim Newsham
 1.86  04-Apr-2016  christos Split p_xstat (composite wait(2) status code, or signal number depending
on context) into:
1. p_xexit: exit code
2. p_xsig: signal number
3. p_sflag & WCOREFLAG bit to indicate that the process core-dumped.

Fix the documentation of the flag bits in <sys/proc.h>
 1.85  31-Jan-2016  christos PR/50730: Benny Siegert: Go kqueue test panics kernel.
- use a marker knote from the stack instead of allocating and freeing on
each scan.
- add more KASSERTS
- introduce a KN_BUSY bit that indicates that the knote is currently being
scanned, so that knote_detach does not end up deleting it when the file
descriptor gets closed and we don't end up using/trashing free memory from
the scan.
 1.84  08-Dec-2015  christos PR/50506: Tobias Nygren: kqueue(2) lacks EV_DISPATCH/EV_RECEIPT support
 1.83  02-Mar-2015  christos put the exit code of the process in data, like FreeBSD does.
 1.82  05-Sep-2014  matt branches: 1.82.2;
Try not to use f_data, use f_{vnode,socket,pipe,mqueue,kqueue,ksem} to get
a correctly typed pointer.
 1.81  05-Sep-2014  matt Don't nest structure and enum definitions.
Don't use C++ keywords new, try, class, private, etc.
 1.80  24-Jun-2014  maxv branches: 1.80.2;
Do not hardcode the value. Use KQ_NEVENTS.
 1.79  24-Nov-2012  christos branches: 1.79.10;
- initialize kn_id
- in close, invalidate f_data and f_type early to prevent accidental re-use
- add a DIAGNOSTIC for when we use unsupported fd's and a KASSERT for f_event
being NULL.
 1.78  18-Nov-2012  pooka remove unused variable
 1.77  17-Nov-2012  joerg Unbreak the NOTE_TRACK event of EVFILT_PROC. When attaching to the child
process, proc_find can't be used as the child is still in state SIDL.
 1.76  02-Jun-2012  martin branches: 1.76.2;
Remove an unused variable
 1.75  25-Jan-2012  christos branches: 1.75.2; 1.75.6;
As discussed in tech-kern, provide the means to prevent delivery of SIGPIPE
on EPIPE for all file descriptor types:

- provide O_NOSIGPIPE for open,kqueue1,pipe2,dup3,fcntl(F_{G,S}ETFL) [NetBSD]
- provide SOCK_NOSIGPIPE for socket,socketpair [NetBSD]
- provide SO_NOSIGPIPE for {g,s}etsockopt [NetBSD/FreeBSD/MacOSX]
- provide F_{G,S}ETNOSIGPIPE for fcntl [MacOSX]
 1.74  17-Nov-2011  rmind branches: 1.74.4;
kqueue_register: avoid calling fd_getfile() with filedesc_t::fd_lock held.

Fixes PR/45479 by KOGULE Ryo.
 1.73  17-Nov-2011  christos PR/45618: Motoyuki OHMORI: kqueue EVFILT_TIMER with smaller timeout value
makes DIAGNOSTIC kernel panic:
KASSERT((c->c_flags & CALLOUT_PENDING) != 0);
If the computed ticks are <= 0 set it to 1
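The clamp described in 1.73 amounts to never scheduling a callout zero or fewer ticks ahead. A minimal sketch, assuming a millisecond timeout and a hypothetical helper name; the kernel's own conversion routine is not reproduced here, and very large inputs that would overflow the multiplication are out of scope.

```c
#include <assert.h>
#include <limits.h>
#include <stdint.h>

/*
 * Hypothetical helper: convert a timeout in milliseconds to clock
 * ticks at `hz` ticks per second, never returning less than 1.
 */
static int
ms_to_ticks_model(int64_t ms, int hz)
{
	int64_t ticks;

	assert(hz > 0);
	ticks = ms * hz / 1000;		/* integer division truncates */
	if (ticks <= 0)
		ticks = 1;		/* the fix: always at least one tick */
	if (ticks > INT_MAX)
		ticks = INT_MAX;
	return (int)ticks;
}
```

With hz = 100, a 1 ms timeout computes to 0 ticks before the clamp, which is exactly the small-timeout case the PR reported.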
 1.72  26-Jun-2011  christos branches: 1.72.2;
* Arrange for interfaces that create new file descriptors to be able to
set close-on-exec on creation (http://udrepper.livejournal.com/20407.html).

- Add F_DUPFD_CLOEXEC to fcntl(2).
- Add MSG_CMSG_CLOEXEC to recvmsg(2) for unix file descriptor passing.
- Add dup3(2) syscall with a flags argument for O_CLOEXEC, O_NONBLOCK.
- Add pipe2(2) syscall with a flags argument for O_CLOEXEC, O_NONBLOCK.
- Add flags SOCK_CLOEXEC, SOCK_NONBLOCK to the socket type parameter
for socket(2) and socketpair(2).
- Add new paccept(2) syscall that takes an additional sigset_t to alter
the sigmask temporarily and a flags argument to set SOCK_CLOEXEC,
SOCK_NONBLOCK.
- Add new mode character 'e' to fopen(3) and popen(3) to open pipes
and file descriptors for close on exec.
- Add new kqueue1(2) syscall with a new flags argument to open the
kqueue file descriptor with O_CLOEXEC, O_NONBLOCK.

* Fix the system calls that take socklen_t arguments to actually do so.

* Don't include userland header files (signal.h) from system header files
(rump_syscallargs.h).

* Bump libc version for the new syscalls.
 1.71  10-Sep-2010  drochner make list traversing in knote() safe against removal of the entry
while the loop body is executed -- at least in the EVFILT_PROC / exit
case a race condition exists which can cause this
fixes a panic triggered eg by tests/kernel/kqueue/proc1
 1.70  01-Jul-2010  rmind Remove pfind() and pgfind(), fix locking in various broken uses of these.
Rename real routines to proc_find() and pgrp_find(), remove PFIND_* flags
and have consistent behaviour. Provide proc_find_raw() for special cases.
Fix memory leak in sysctl_proc_corename().

COMPAT_LINUX: rework ptrace() locking, minimise differences between
different versions per-arch.

Note: while this change adds some formal cosmetics for COMPAT_DARWIN and
COMPAT_IRIX - locking there is utterly broken (for ages).

Fixes PR/43176.
 1.69  22-Dec-2009  dsl branches: 1.69.2; 1.69.4;
Use sizeof correct type, not pointer to wrong type.
Fixes PR/42498.
This has been wrong since the initial import!
 1.68  20-Dec-2009  dsl If a multithreaded app closes an fd while another thread is blocked in
read/write/accept, then the expectation is that the blocked thread will
exit and the close complete.
Since only one fd is affected, but many fd can refer to the same file,
the close code can only request the fs code unblock with ERESTART.
Fixed for pipes and sockets, ERESTART will only be generated after such
a close - so there should be no change for other programs.
Also rename fo_abort() to fo_restart() (this used to be fo_drain()).
Fixes PR/26567
 1.67  09-Dec-2009  dsl Rename fo_drain() to fo_abort(), 'drain' is used to mean 'wait for output
to drain' in many places, whereas fo_drain() was called in order to force
blocking read()/write() etc calls to return to userspace so that a close()
call from a different thread can complete.
In the sockets code comment out the broken code in the inner function,
it was being called from compat code.
 1.66  03-Oct-2009  elad Move kevent policy back to the subsystem.
 1.65  24-May-2009  ad More changes to improve kern_descrip.c.

- Avoid atomics in more places.
- Remove the per-descriptor mutex, and just use filedesc_t::fd_lock.
It was only being used to synchronize close, and in any case we needed
to take fd_lock to free the descriptor slot.
- Optimize certain paths for the <NDFDFILE case.
- Sprinkle more comments and assertions.
- Cache more stuff in filedesc_t.
- Fix numerous minor bugs spotted along the way.
- Restructure how the open files array is maintained, for clarity and so
that we can eliminate the membar_consumer() call in fd_getfile(). This is
mostly syntactic sugar; the main functional change is that fd_nfiles now
lives alongside the open file array.

Some measurements with libmicro:

- simple file syscalls like close() are between 1 to 10% faster.
- some nice improvements, e.g. poll(1000) which is ~50% faster.
 1.64  04-Apr-2009  ad Add fileops::fo_drain(), to be called from fd_close() when there is more
than one active reference to a file descriptor. It should dislodge threads
sleeping while holding a reference to the descriptor. Implemented only for
sockets but should be extended to pipes, fifos, etc.

Fixes the case of a multithreaded process doing something like the
following, which would have hung until the process got a signal.

thr0 accept(fd, ...)
thr1 close(fd)
 1.63  30-Mar-2009  christos fix erroneously deleted assignment.
 1.62  29-Mar-2009  christos Move the internal poll/select related API's to use timespec instead
of timeval (rides the uvm bump).
 1.61  11-Jan-2009  christos branches: 1.61.2;
merge christos-time_t
 1.60  24-Jun-2008  gmcgarry branches: 1.60.4; 1.60.6;
Replace gcc-style designated initialisers with c99-style.
 1.59  05-May-2008  ad branches: 1.59.2; 1.59.4;
- Convert hashinit() to use kmem_alloc(). The hash tables can be large
and it's better to not have them in kmem_map.
- Convert a couple of minor items along the way to kmem_alloc().
- Fix some memory leaks.
 1.58  28-Apr-2008  martin Remove clause 3 and 4 from TNF licenses
 1.57  24-Apr-2008  ad branches: 1.57.2;
Merge proc::p_mutex and proc::p_smutex into a single adaptive mutex, since
we no longer need to guard against access from hardware interrupt handlers.

Additionally, if cloning a process with CLONE_SIGHAND, arrange to have the
child process share the parent's lock so that signal state may be kept in
sync. Partially addresses PR kern/37437.
 1.56  24-Apr-2008  ad Network protocol interrupts can now block on locks, so merge the globals
proclist_mutex and proclist_lock into a single adaptive mutex (proc_lock).
Implications:

- Inspecting process state requires thread context, so signals can no longer
be sent from a hardware interrupt handler. Signal activity must be
deferred to a soft interrupt or kthread.

- As the proc state locking is simplified, it's now safe to take exit()
and wait() out from under kernel_lock.

- The system spends less time at IPL_SCHED, and there is less lock activity.
 1.55  22-Apr-2008  ad Give callout_halt() an additional 'kmutex_t *interlock' argument. If there
is a need to block and wait for the callout to complete, and there is an
interlock, it will be dropped while waiting and reacquired before return.
 1.54  22-Apr-2008  ad Mark the callout MPSAFE and use callout_halt().
 1.53  26-Mar-2008  ad branches: 1.53.2; 1.53.4;
- kqueue_scan: work around problem noted by yamt@: if an event fires while
we have unlocked the kqueue to check its state, leave it queued and
re-check later.
- knote_dequeue: fold into knote_detach since nothing else uses it.
- Note a couple more problems.
 1.52  24-Mar-2008  yamt add some DEBUG checks.
 1.51  23-Mar-2008  yamt kqueue_scan: skip markers correctly.
 1.50  22-Mar-2008  yamt wrap a long line.
 1.49  21-Mar-2008  ad File descriptor changes, discussed on tech-kern:

- Redo reference counting to be sane. LWPs accessing files take a short
term reference on the local file descriptor. This is the most common
case. While a file is in a process descriptor table, a reference is
held to the file. The file reference count only changes during control
operations like open() or close(). Code that comes at files from an
unusual direction (i.e. foreign to the process) like procfs or sysctl
takes a reference on the file (f_count), and not on a descriptor.

- Remove knowledge of reference counting and locking from most code that
deals with files.

- Make the usual case of file descriptor lookup lockless.

- Make kqueue MP and MT safe. PR kern/38098, PR kern/38137.

- Fix numerous file handling bugs, and bugs in the descriptor code that
affected multithreaded processes.

- Split descriptor system calls out into sys_descrip.c.

- A few stylistic changes: KNF, remove unused casts now that caddr_t is
gone. Replace dumb gotos with loop control in a few places.

- Don't do redundant pointer passing (struct proc, lwp, filedesc *) unless
the routine is likely to be inlined. Most of the time it's about the
current process.
 1.48  01-Mar-2008  rmind Welcome to 4.99.55:

- Add a lot of missing selinit() and seldestroy() calls.

- Merge selwakeup() and selnotify() calls into a single selnotify().

- Add an additional 'events' argument to selnotify() call. It will
indicate which event (POLL_IN, POLL_OUT, etc) happened. If unknown,
zero may be used.

Note: please pass appropriate value of 'events' where possible.
Proposed on: <tech-kern>
 1.47  18-Feb-2008  ad branches: 1.47.2; 1.47.6;
knote_fdclose: acquire kernel_lock because many objects that can be
polled do not have locking of their own.
 1.46  23-Jan-2008  elad Tons of process scope changes.

- Add a KAUTH_PROCESS_SCHEDULER action, to handle scheduler related
requests, and add specific requests for set/get scheduler policy and
set/get scheduler parameters.

- Add a KAUTH_PROCESS_KEVENT_FILTER action, to handle kevent(2) related
requests.

- Add a KAUTH_DEVICE_TTY_STI action to handle requests to TIOCSTI.

- Add requests for the KAUTH_PROCESS_CANSEE action, indicating what
process information is being looked at (entry itself, args, env,
open files).

- Add requests for the KAUTH_PROCESS_RLIMIT action indicating set/get.

- Add requests for the KAUTH_PROCESS_CORENAME action indicating set/get.

- Make bsd44 secmodel code handle the newly added requests appropriately.

All of the above make it possible to issue finer-grained kauth(9) calls in
many places, removing some KAUTH_GENERIC_ISSUSER requests.

- Remove the "CAN" from KAUTH_PROCESS_CAN{KTRACE,PROCFS,PTRACE,SIGNAL}.

Discussed with christos@ and yamt@.
 1.45  05-Jan-2008  dsl Use FILE_LOCK() and FILE_UNLOCK()
 1.44  20-Dec-2007  dsl Convert all the system call entry points from:
int foo(struct lwp *l, void *v, register_t *retval)
to:
int foo(struct lwp *l, const struct foo_args *uap, register_t *retval)
Fixup compat code to not write into 'uap' and (in some cases) to actually
pass a correctly formatted 'uap' structure with the right name to the
next routine.
A few 'compat' routines that just call standard ones have been deleted.
All the 'compat' code compiles (along with the kernels required to test
build it).
98% done by automated scripts.
 1.43  05-Dec-2007  pooka branches: 1.43.4;
Do not "return 1" from kqfilter for errors. That value is passed
directly to the userland caller and results in a mysterious EPERM.
Instead, return EINVAL or something else sensible depending on the
case.
 1.42  03-Dec-2007  pooka branches: 1.42.2;
Some boys take a beautiful seltrue_filtops and hide her away from
the rest of the world - but let's not.
 1.41  08-Oct-2007  ad branches: 1.41.4;
Merge file descriptor locking, cwdi locking and cross-call changes
from the vmlocking branch.
 1.40  21-Jul-2007  ad branches: 1.40.4; 1.40.6; 1.40.8; 1.40.10;
+#include <sys/conf.h>
 1.39  09-Jul-2007  ad branches: 1.39.2;
Merge some of the less invasive changes from the vmlocking branch:

- kthread, callout, devsw API changes
- select()/poll() improvements
- miscellaneous MT safety improvements
 1.38  12-Mar-2007  ad branches: 1.38.2;
Pass an ipl argument to pool_init/POOL_INIT to be used when initializing
the pool's lock.
 1.37  04-Mar-2007  christos branches: 1.37.2;
Kill caddr_t; there will be some MI fallout, but it will be fixed shortly.
 1.36  17-Feb-2007  pavel Change the process/lwp flags seen by userland via sysctl back to the
P_*/L_* naming convention, and rename the in-kernel flags to avoid
conflict. (P_ -> PK_, L_ -> LW_ ). Add back the (now unused) LSDEAD
constant.

Restores source compatibility with pre-newlock2 tools like ps or top.

Reviewed by Andrew Doran.
 1.35  09-Feb-2007  ad branches: 1.35.2;
Merge newlock2 to head.
 1.34  04-Jan-2007  elad Consistent usage of KAUTH_GENERIC_ISSUSER.
 1.33  01-Nov-2006  yamt branches: 1.33.2; 1.33.8;
remove some __unused from function parameters.
 1.32  12-Oct-2006  christos - sprinkle __unused on function decls.
- fix a couple of unused bugs
- no more -Wno-unused for i386
 1.31  30-Sep-2006  seanb - Avoid array overrun in kfilter_byname_user() when all user
kfilter slots are used: no guarantee previously that last
slot had a NULL name.
- Reuse previously deregistered user kfilter slots in
kfilter_register().
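The two points in the 1.31 entry above can be illustrated with a minimal userland sketch. It assumes a fixed-size table where a NULL name marks a free slot; `demo_filter`, `demo_lookup` and `demo_register` are hypothetical names for illustration, not the kernel's actual code. The lookup is bounded by the slot count instead of relying on a trailing NULL sentinel, and registration reuses the first free slot.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical table entry; a NULL name marks a free (deregistered) slot. */
struct demo_filter {
	const char *name;
	int id;
};

/*
 * Look a filter up by name. The loop is bounded by nslots, so it stays
 * in range even when every slot is in use and no NULL sentinel exists.
 */
static const struct demo_filter *
demo_lookup(const struct demo_filter *tab, size_t nslots, const char *name)
{
	for (size_t i = 0; i < nslots; i++) {
		if (tab[i].name == NULL)
			continue;	/* skip free slots */
		if (strcmp(tab[i].name, name) == 0)
			return &tab[i];
	}
	return NULL;
}

/*
 * Register a filter in the first free slot, reusing slots that were
 * previously deregistered. Returns the slot index, or -1 if the table is full.
 */
static int
demo_register(struct demo_filter *tab, size_t nslots, const char *name, int id)
{
	for (size_t i = 0; i < nslots; i++) {
		if (tab[i].name == NULL) {
			tab[i].name = name;
			tab[i].id = id;
			return (int)i;
		}
	}
	return -1;
}
```

This sketch does not check for duplicate names or do any locking; it only shows the bounded scan and the slot reuse.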
 1.30  23-Jul-2006  ad branches: 1.30.4; 1.30.6;
Use the LWP cached credentials where sane.
 1.29  14-Jul-2006  kardel reduce sleep time by slept time for retries
 1.28  07-Jun-2006  kardel merge FreeBSD timecounters from branch simonb-timecounters
- struct timeval time is gone
time.tv_sec -> time_second
- struct timeval mono_time is gone
mono_time.tv_sec -> time_uptime
- access to time via
{get,}{micro,nano,bin}time()
get* versions are fast but less precise
- support NTP nanokernel implementation (NTP API 4)
- further reading:
Timecounter Paper: http://phk.freebsd.dk/pubs/timecounter.pdf
NTP Nanokernel: http://www.eecis.udel.edu/~mills/ntp/html/kern.html
 1.27  14-May-2006  elad branches: 1.27.2;
integrate kauth.
 1.26  21-Apr-2006  yamt sprinkle some const and static.
 1.25  11-Dec-2005  christos branches: 1.25.4; 1.25.6; 1.25.8; 1.25.10; 1.25.12;
merge ktrace-lwp.
 1.24  23-Oct-2005  cube - Split sys_kevent into kevent1 so that it can be used by COMPAT_NETBSD32
code.

- To achieve COMPAT_NETBSD32 compatibility, introduce a parameter to
kevent1 that points to functions that do the actual copyin/copyout
operations. This is similar to what was done in FreeBSD by Paul Saab.

- Add the COMPAT_NETBSD32 definitions and hooks.
 1.23  29-May-2005  christos branches: 1.23.2; 1.23.4;
- add const.
- remove unnecessary casts.
- add __UNCONST casts and mark them with XXXUNCONST as necessary.
 1.22  26-Feb-2005  perry nuke trailing whitespace
 1.21  30-Nov-2004  christos branches: 1.21.4; 1.21.6;
Cloning cleanup:
1. make fileops const
2. add 2 new negative errno's to `officially' support the cloning hack:
- EDUPFD (used to overload ENODEV)
- EMOVEFD (used to overload ENXIO)
3. Created an fdclone() function to encapsulate the operations needed for
EMOVEFD, and made all cloners use it.
4. Centralize the local noop/badop fileops functions to:
fnullop_fcntl, fnullop_poll, fnullop_kqfilter, fbadop_stat
 1.20  25-Apr-2004  simonb Initialise (most) pools from a link set instead of explicit calls
to pool_init. Untouched pools are ones that are either in arch-specific
code or aren't initialised during initial system startup.

Convert struct session, ucred and lockf to pools.
 1.19  14-Feb-2004  jdolecek allocate wired memory for the marker kevent in kqueue_scan() instead
of using on-stack memory, so that this wouldn't eventually cause kernel
panic if the process get swapped out and another process runs kqueue_scan()
problem pointed out in kern/24220 by Stephan Uphoff
 1.18  11-Jan-2004  jdolecek fix assertion - non-alive processes are in SZOMB state now
fixes PR kern/24033 by Martin Husemann
 1.17  18-Jul-2003  fvdl Unlock kq_lock in the case of a timeout.
 1.16  29-Jun-2003  fvdl branches: 1.16.2;
Back out the lwp/ktrace changes. They contained a lot of collateral damage,
and need to be examined and discussed more.
 1.15  28-Jun-2003  darrenr Pass lwp pointers throughout the kernel, as required, so that the lwpid can
be inserted into ktrace records. The general change has been to replace
"struct proc *" with "struct lwp *" in various function prototypes, pass
the lwp through and use l_proc to get the process pointer when needed.

Bump the kernel rev up to 1.6V
 1.14  23-Jun-2003  jdolecek add __KERNEL_RCSID()
 1.13  21-Mar-2003  dsl Change 'data' argument to fo_ioctl and fo_fcntl from 'caddr_t' to 'void *'.
Avoids a lot of casting and removes the need for some line breaks.
Removed a load of (caddr_t) casts from calls to copyin/copyout as well.
(approved by christos - he has a plan to remove caddr_t...)
 1.12  23-Feb-2003  pk Protect the event queue with a simple mutex; this only partially addresses
MP-safety issues in the event handling system.
 1.11  23-Feb-2003  pk Use splsched() instead of splhigh() to protect the triggered event queues.
 1.10  23-Feb-2003  pk Make updating a file's reference and use count MP-safe.
 1.9  21-Feb-2003  jdolecek simplify timeout handling code in kqueue_scan()
 1.8  04-Feb-2003  jdolecek Introduce EVFILT_TIMER, which allows a process to establish an
arbitrary number of timers, both oneshot and periodic.

from FreeBSD, only adapted to NetBSD kernel API - mstohz() instead
of tvtohz(), and takes advantage of callout_schedule() in filt_timerexpire()
 1.7  01-Feb-2003  thorpej Add extensible malloc types, adapted from FreeBSD. This turns
malloc types into a structure, a pointer to which is passed around,
instead of an int constant. Allow the limit to be adjusted when the
malloc type is defined, or with a function call, as suggested by
Jonathan Stone.
 1.6  18-Jan-2003  thorpej Merge the nathanw_sa branch.
 1.5  26-Nov-2002  christos branches: 1.5.2;
si_ -> sel_ to avoid conflicts with siginfo.
 1.4  08-Nov-2002  jdolecek branches: 1.4.2;
kevent(2): if the specified timeout is >=1ns and <1us, perform a poll
rather than waiting forever due to TIMESPEC_TO_TIMEVAL() conversion
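The truncation behind the 1.4 entry above can be sketched in userland as follows. The sketch assumes the conversion divides nanoseconds by 1000 and discards the remainder, so any value from 1ns to 999ns becomes 0us; `demo_ts_to_tv` and `demo_timeout_collapsed` are hypothetical helpers, not the kernel's code. A caller can use such a check to treat a collapsed timeout as a poll instead of as "no timeout".

```c
#include <stdbool.h>
#include <time.h>
#include <sys/time.h>

/* Hypothetical conversion that truncates nanoseconds to whole microseconds. */
static struct timeval
demo_ts_to_tv(const struct timespec *ts)
{
	struct timeval tv;

	tv.tv_sec = ts->tv_sec;
	tv.tv_usec = ts->tv_nsec / 1000;	/* 1..999 ns become 0 us */
	return tv;
}

/*
 * Return true when the caller asked for a nonzero timeout but the
 * truncating conversion produced an all-zero timeval.
 */
static bool
demo_timeout_collapsed(const struct timespec *ts)
{
	struct timeval tv = demo_ts_to_tv(ts);
	bool requested_nonzero = ts->tv_sec != 0 || ts->tv_nsec != 0;
	bool converted_zero = tv.tv_sec == 0 && tv.tv_usec == 0;

	return requested_nonzero && converted_zero;
}
```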
 1.3  23-Oct-2002  jdolecek merge kqueue branch into -current

kqueue provides a stateful and efficient event notification framework
currently supported events include socket, file, directory, fifo,
pipe, tty and device changes, and monitoring of processes and signals

kqueue is supported by all writable filesystems in NetBSD tree
(with exception of Coda) and all device drivers supporting poll(2)

based on work done by Jonathan Lemon for FreeBSD
initial NetBSD port done by Luke Mewburn and Jason Thorpe
 1.2  10-Jul-2001  lukem move to kqueue branch for now
 1.1  06-Jul-2001  lukem branches: 1.1.1;
Initial revision
 1.1.1.1  06-Jul-2001  lukem branches: 1.1.1.1.2;
freebsd kqueue implementation
 1.1.1.1.2.18  10-Oct-2002  jdolecek sync kqueue with -current; this includes merge of gehenna-devsw branch,
merge of i386 MP branch, and part of autoconf rototil work
 1.1.1.1.2.17  02-Oct-2002  jdolecek Fix the error path of NOTE_FORK|NOTE_TRACK handling code to not cause kernel
panic.
Problem reported and fix provided by Peter Werner <Peter.Werner at wgsn dot com>
 1.1.1.1.2.16  01-Oct-2002  jdolecek filt_procattach() when attaching knote to a process, check that the current
process has either same uid, or is run by superuser; this fixes botch
in the import of the code to NetBSD, when the permissions check was removed
from original FreeBSD code

Brought to my attention by a report of a different issue by
Peter Werner <Peter.Werner at wgsn dot com>.

Actual code for the check taken from OpenBSD; FreeBSD version of the check
is different enough to not be directly usable.
 1.1.1.1.2.15  18-Sep-2002  jdolecek user_kqfilters[] could be NULL if no extra filter were registered; check
this in kfilter_byname_user()
 1.1.1.1.2.14  07-Jun-2002  jdolecek kevent(2): change type of 'nchanges' and 'nevents' from int to size_t
change discussed on bsd-api list
 1.1.1.1.2.13  09-Apr-2002  jdolecek Make sure knote is detached from watched process when it exits, to avoid
using already-freed memory under some circumstances.

(this bug was not NetBSD specific)
 1.1.1.1.2.12  09-Apr-2002  jdolecek g/c KFILTER_UNREGISTER, KFILTER_REGISTER
 1.1.1.1.2.11  17-Mar-2002  jdolecek some more comment indentation fixes
kfilter_register(): use free() instead of FREE() - the memory is allocated
using malloc(), not MALLOC()
filt_procdetach(): fix the KASSERT() to not fire spuriously
 1.1.1.1.2.10  16-Mar-2002  jdolecek Catch up with -current.
 1.1.1.1.2.9  15-Mar-2002  jdolecek no need to 'lock' the process in filt_procattach()/filt_procdetach() for now
(XXXSMP comment added)
kqueue_poll(): no need to protect with splnet, KNOTE() is never called
from interrupt context so far
knote_attach():
* don't memcpy() old fdp->fd_knlist if it's NULL
* use free() instead of FREE() for old fdp->fd_knlist - the memory
is allocated using malloc(9)
make the indentation of comments less insane (brrrr)
g/c bogus ARGSUSED
add couple KASSERT()s
 1.1.1.1.2.8  21-Feb-2002  jdolecek you ought to use FREE(), not free() when you use MALLOC()
sync comment for KFILTER_REGISTER/KFILTER_UNREGISTER with <sys/event.h>
 1.1.1.1.2.7  08-Sep-2001  thorpej Add a filter which simulates seltrue() by setting kn->kn_data to 0,
and then saying "event is active".
 1.1.1.1.2.6  08-Sep-2001  thorpej Add a selnotify(), which does a selwakeup() + KNOTE(), rather than
requiring all callers to do both.

This may be a transitional step only, or it may stick. I haven't
decided yet.
 1.1.1.1.2.5  07-Sep-2001  thorpej More const.
 1.1.1.1.2.4  07-Sep-2001  thorpej Sprinkle some const, and make a very tiny optimization to the lookup
of system-provided filters (the common case).
 1.1.1.1.2.3  07-Sep-2001  thorpej Remove a needless extra function call when allocating/freeing knotes.
 1.1.1.1.2.2  07-Sep-2001  thorpej Use a pool for kqueue structures.
 1.1.1.1.2.1  10-Jul-2001  lukem * update for differences between netbsd & freebsd WRT:
- header files
- struct locking
- pool (& other memory) allocation
- timeouts
* add kqueue_ioctl(), to support ioctl(2) operations on a kqueue fd.
* change the way that system filters are referenced to support name lookups
and unimplemented (yet known about) filter types (such as EVFILT_AIO)
* add kfilter_register(9): register filter with given name to map to given
filterops. filter must not exist
* add kfilter_unregister(9): unregister user filter
* add kfilter_byname(): lookup filter by name, which can be a system filter
or a user-added filter
* add kfilter_byfilter(): lookup filter by filter number
* in kqueue_register(), use kfilter_byfilter() to determine filterops rather
than using obscure ~ operation (which was now incorrect due to renumbering)
* cleanup whitespace, improve comments
* check return code of all copyouts()s
 1.4.2.5  11-Dec-2002  thorpej Sync with HEAD.
 1.4.2.4  12-Nov-2002  nathanw Include <sys/sa.h> for types for <sys/syscallargs.h>.
 1.4.2.3  12-Nov-2002  skrll LWPify.
 1.4.2.2  11-Nov-2002  nathanw Catch up to -current
 1.4.2.1  08-Nov-2002  nathanw file kern_event.c was added on branch nathanw_sa on 2002-11-11 22:13:38 +0000
 1.5.2.1  18-Dec-2002  gmcgarry Merge pcred and ucred, and poolify. TBD: check backward compatibility
and factor-out some higher-level functionality.
 1.16.2.9  10-Nov-2005  skrll Sync with HEAD. Here we go again...
 1.16.2.8  04-Mar-2005  skrll Sync with HEAD.

Hi Perry!
 1.16.2.7  24-Feb-2005  skrll Reduce diff to HEAD
 1.16.2.6  18-Dec-2004  skrll Sync with HEAD.
 1.16.2.5  21-Sep-2004  skrll Fix the sync with head I botched.
 1.16.2.4  18-Sep-2004  skrll Sync with HEAD.
 1.16.2.3  10-Aug-2004  skrll Reduce diff to HEAD
 1.16.2.2  03-Aug-2004  skrll Sync with HEAD
 1.16.2.1  02-Jul-2003  darrenr Apply the aborted ktrace-lwp changes to a specific branch. This is just for
others to review, I'm concerned that patch fuzziness may have resulted in some
errant code being generated but I'll look at that later by comparing the diff
from the base to the branch with the file I attempt to apply to it. This will,
at the very least, put the changes in a better context for others to review
them and attempt to tinker with removing passing of 'struct lwp' through
the kernel.
 1.21.6.1  19-Mar-2005  yamt sync with head. xen and whitespace. xen part is not finished.
 1.21.4.1  29-Apr-2005  kent sync with -current
 1.23.4.1  26-Oct-2005  yamt sync with head
 1.23.2.11  24-Mar-2008  yamt sync with head.
 1.23.2.10  17-Mar-2008  yamt sync with head.
 1.23.2.9  27-Feb-2008  yamt sync with head.
 1.23.2.8  04-Feb-2008  yamt sync with head.
 1.23.2.7  21-Jan-2008  yamt sync with head
 1.23.2.6  07-Dec-2007  yamt sync with head
 1.23.2.5  27-Oct-2007  yamt sync with head.
 1.23.2.4  03-Sep-2007  yamt sync with head.
 1.23.2.3  26-Feb-2007  yamt sync with head.
 1.23.2.2  30-Dec-2006  yamt sync with head.
 1.23.2.1  21-Jun-2006  yamt sync with head.
 1.25.12.1  24-May-2006  tron Merge 2006-05-24 NetBSD-current into the "peter-altq" branch.
 1.25.10.4  11-May-2006  elad sync with head
 1.25.10.3  06-May-2006  christos - Move kauth_cred_t declaration to <sys/types.h>
- Cleanup struct ucred; forward declarations that are unused.
- Don't include <sys/kauth.h> in any header, but include it in the c files
that need it.

Approved by core.
 1.25.10.2  10-Mar-2006  elad generic_authorize() -> kauth_authorize_generic().
 1.25.10.1  08-Mar-2006  elad Adapt to kernel authorization KPI.
 1.25.8.3  11-Aug-2006  yamt sync with head
 1.25.8.2  26-Jun-2006  yamt sync with head.
 1.25.8.1  24-May-2006  yamt sync with head.
 1.25.6.3  01-Jun-2006  kardel Sync with head.
 1.25.6.2  22-Apr-2006  simonb Sync with head.
 1.25.6.1  04-Feb-2006  simonb Adapt for timecounters: mostly use get*time() and use "time_second"
instead of "time.tv_sec".
 1.25.4.1  09-Sep-2006  rpaulo sync with head
 1.27.2.1  19-Jun-2006  chap Sync with head.
 1.30.6.2  10-Dec-2006  yamt sync with head.
 1.30.6.1  22-Oct-2006  yamt sync with head
 1.30.4.4  30-Jan-2007  ad Remove support for SA. Ok core@.
 1.30.4.3  12-Jan-2007  ad Sync with head.
 1.30.4.2  29-Dec-2006  ad Checkpoint work in progress.
 1.30.4.1  18-Nov-2006  ad Sync with head.
 1.33.8.1  19-Nov-2011  bouyer Pull up following revision(s) (requested by christos in ticket #1438):
sys/kern/kern_event.c: revision 1.73
PR/45618: Motoyuki OHMORI: kqueue EVFILT_TIMER with smaller timeout value
makes DIAGNOSTIC kernel panic:
KASSERT((c->c_flags & CALLOUT_PENDING) != 0);
If the computed ticks are <= 0 set it to 1
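The clamp described in the pull-up above ("If the computed ticks are <= 0 set it to 1") can be sketched as a small conversion helper. This is a minimal illustration under stated assumptions: `demo_ms_to_ticks` is a hypothetical name, the timeout is given in milliseconds, and `hz` is the clock tick rate. It is not the kernel's `mstohz()`.

```c
#include <limits.h>

/*
 * Hypothetical helper: convert a timeout in milliseconds to clock ticks
 * at the given tick rate. A timeout shorter than one tick would truncate
 * to 0, so the result is clamped to at least 1 tick.
 */
static int
demo_ms_to_ticks(long long ms, int hz)
{
	long long ticks = (ms * hz) / 1000;

	if (ticks <= 0)
		ticks = 1;		/* never schedule a zero-tick timer */
	if (ticks > INT_MAX)
		ticks = INT_MAX;	/* keep the result representable */
	return (int)ticks;
}
```

With `hz` at 100, a 1 ms timeout computes to 0 ticks before the clamp and is raised to 1.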
 1.33.2.1  19-Nov-2011  bouyer Pull up following revision(s) (requested by christos in ticket #1438):
sys/kern/kern_event.c: revision 1.73
PR/45618: Motoyuki OHMORI: kqueue EVFILT_TIMER with smaller timeout value
makes DIAGNOSTIC kernel panic:
KASSERT((c->c_flags & CALLOUT_PENDING) != 0);
If the computed ticks are <= 0 set it to 1
 1.35.2.3  24-Mar-2007  yamt sync with head.
 1.35.2.2  12-Mar-2007  rmind Sync with HEAD.
 1.35.2.1  27-Feb-2007  yamt - sync with head.
- move sched_changepri back to kern_synch.c as it doesn't know PPQ anymore.
 1.37.2.4  20-Aug-2007  ad Sync with HEAD.
 1.37.2.3  01-Jul-2007  ad Adapt to callout API change.
 1.37.2.2  21-Mar-2007  ad - Replace more simple_locks, and fix up in a few places.
- Use condition variables.
- LOCK_ASSERT -> KASSERT.
 1.37.2.1  13-Mar-2007  ad Sync with head.
 1.38.2.1  11-Jul-2007  mjf Sync with head.
 1.39.2.1  15-Aug-2007  skrll Sync with HEAD.
 1.40.10.2  21-Jul-2007  ad +#include <sys/conf.h>
 1.40.10.1  21-Jul-2007  ad file kern_event.c was added on branch matt-mips64 on 2007-07-21 19:23:04 +0000
 1.40.8.1  14-Oct-2007  yamt sync with head.
 1.40.6.3  23-Mar-2008  matt sync with HEAD
 1.40.6.2  09-Jan-2008  matt sync with HEAD
 1.40.6.1  06-Nov-2007  matt sync with HEAD
 1.40.4.2  09-Dec-2007  jmcneill Sync with HEAD.
 1.40.4.1  26-Oct-2007  joerg Sync with HEAD.

Follow the merge of pmap.c on i386 and amd64 and move
pmap_init_tmp_pgtbl into arch/x86/x86/pmap.c. Modify the ACPI wakeup
code to restore CR4 before jumping back into kernel space as the large
page option might cover that.
 1.41.4.3  18-Feb-2008  mjf Sync with HEAD.
 1.41.4.2  27-Dec-2007  mjf Sync with HEAD.
 1.41.4.1  08-Dec-2007  mjf Sync with HEAD.
 1.42.2.2  26-Dec-2007  ad Sync with head.
 1.42.2.1  08-Dec-2007  ad Sync with head.
 1.43.4.3  23-Jan-2008  bouyer Sync with HEAD.
 1.43.4.2  08-Jan-2008  bouyer Sync with HEAD
 1.43.4.1  02-Jan-2008  bouyer Sync with HEAD
 1.47.6.4  17-Jan-2009  mjf Sync with HEAD.
 1.47.6.3  29-Jun-2008  mjf Sync with HEAD.
 1.47.6.2  02-Jun-2008  mjf Sync with HEAD.
 1.47.6.1  03-Apr-2008  mjf Sync with HEAD.
 1.47.2.1  24-Mar-2008  keiichi sync with head.
 1.53.4.1  18-May-2008  yamt sync with head.
 1.53.2.2  01-Nov-2008  christos Sync with head.
 1.53.2.1  29-Mar-2008  christos Welcome to the time_t=long long dev_t=uint64_t branch.
 1.57.2.6  09-Oct-2010  yamt sync with head
 1.57.2.5  11-Aug-2010  yamt sync with head.
 1.57.2.4  11-Mar-2010  yamt sync with head
 1.57.2.3  20-Jun-2009  yamt sync with head
 1.57.2.2  04-May-2009  yamt sync with head.
 1.57.2.1  16-May-2008  yamt sync with head.
 1.59.4.1  27-Jun-2008  simonb Sync with head.
 1.59.2.3  18-Sep-2008  wrstuden Sync with wrstuden-revivesa-base-2.
 1.59.2.2  14-May-2008  wrstuden Per discussion with ad, remove most of the #include <sys/sa.h> lines
as they were including sa.h just for the type(s) needed for syscallargs.h.

Instead, create a new file, sys/satypes.h, which contains just the
types needed for syscallargs.h. Yes, there's only one now, but that
may change and it's probably more likely to change if it'd be difficult
to handle. :-)

Per discussion with matt at n dot o, add an include of satypes.h to
sigtypes.h. Upcall handlers are kinda signal handlers, and signalling
is the header file that's already included for syscallargs.h that
closest matches SA.

This shaves about 3000 lines off of the diff of the branch relative
to the base. That also represents about 18% of the total before this
checkin.

I think this reduction is a very good thing.
 1.59.2.1  10-May-2008  wrstuden Initial checkin of re-adding SA. Everything except kern_sa.c
compiles in GENERIC for i386. This is still a work-in-progress, but
this checkin covers most of the mechanical work (changing signalling
to be able to accommodate SA's process-wide signalling and re-adding
includes of sys/sa.h and savar.h). Subsequent changes will be much
more interesting.

Also, kern_sa.c has received partial cleanup. There's still more
to do, though.
 1.60.6.4  19-Nov-2011  sborrill Pull up the following revision(s) (requested by rmind in ticket #1695):
sys/kern/kern_event.c: revision 1.74

kqueue_register: avoid calling fd_getfile() with filedesc_t::fd_lock held.
Fixes PR/45479 by KOGULE Ryo.
 1.60.6.3  18-Nov-2011  sborrill Pull up the following revision(s) (requested by christos in ticket #1693):
sys/kern/kern_event.c: revision 1.73

PR/45618: Motoyuki OHMORI: kqueue EVFILT_TIMER with smaller timeout value
makes DIAGNOSTIC kernel panic. If the computed ticks are <= 0 set it to 1.
 1.60.6.2  09-Jan-2010  snj branches: 1.60.6.2.2;
Pull up following revision(s) (requested by dsl in ticket #1208):
sys/kern/kern_event.c: revision 1.69
Use sizeof correct type, not pointer to wrong type.
Fixes PR/42498.
This has been wrong since the initial import!
 1.60.6.1  04-Apr-2009  snj branches: 1.60.6.1.2; 1.60.6.1.4;
Pull up following revision(s) (requested by ad in ticket #661):
sys/arch/xen/xen/xenevt.c: revision 1.32
sys/compat/svr4/svr4_net.c: revision 1.56
sys/compat/svr4_32/svr4_32_net.c: revision 1.19
sys/dev/dmover/dmover_io.c: revision 1.32
sys/dev/putter/putter.c: revision 1.21
sys/kern/kern_descrip.c: revision 1.190
sys/kern/kern_drvctl.c: revision 1.23
sys/kern/kern_event.c: revision 1.64
sys/kern/sys_mqueue.c: revision 1.14
sys/kern/sys_pipe.c: revision 1.109
sys/kern/sys_socket.c: revision 1.59
sys/kern/uipc_syscalls.c: revision 1.136
sys/kern/vfs_vnops.c: revision 1.164
sys/kern/uipc_socket.c: revision 1.188
sys/net/bpf.c: revision 1.144
sys/net/if_tap.c: revision 1.55
sys/opencrypto/cryptodev.c: revision 1.47
sys/sys/file.h: revision 1.67
sys/sys/param.h: patch
sys/sys/socketvar.h: revision 1.119
Add fileops::fo_drain(), to be called from fd_close() when there is more
than one active reference to a file descriptor. It should dislodge threads
sleeping while holding a reference to the descriptor. Implemented only for
sockets but should be extended to pipes, fifos, etc.
Fixes the case of a multithreaded process doing something like the
following, which would have hung until the process got a signal.
thr0 accept(fd, ...)
thr1 close(fd)
 1.60.6.2.2.1  19-Nov-2011  sborrill Pull up the following revision(s) (requested by rmind in ticket #1695):
sys/kern/kern_event.c: revision 1.74

kqueue_register: avoid calling fd_getfile() with filedesc_t::fd_lock held.
Fixes PR/45479 by KOGULE Ryo.
 1.60.6.1.4.1  21-Apr-2010  matt sync to netbsd-5
 1.60.6.1.2.2  19-Nov-2011  sborrill Pull up the following revision(s) (requested by rmind in ticket #1695):
sys/kern/kern_event.c: revision 1.74

kqueue_register: avoid calling fd_getfile() with filedesc_t::fd_lock held.
Fixes PR/45479 by KOGULE Ryo.
 1.60.6.1.2.1  09-Jan-2010  snj Pull up following revision(s) (requested by dsl in ticket #1208):
sys/kern/kern_event.c: revision 1.69
Use sizeof correct type, not pointer to wrong type.
Fixes PR/42498.
This has been wrong since the initial import!
 1.60.4.2  28-Apr-2009  skrll Sync with HEAD.
 1.60.4.1  19-Jan-2009  skrll Sync with HEAD.
 1.61.2.2  23-Jul-2009  jym Sync with HEAD.
 1.61.2.1  13-May-2009  jym Sync with HEAD.

Commit is split, to avoid a "too many arguments" protocol error.
 1.69.4.2  05-Mar-2011  rmind sync with head
 1.69.4.1  03-Jul-2010  rmind sync with head
 1.69.2.2  22-Oct-2010  uebayasi Sync with HEAD (-D20101022).
 1.69.2.1  17-Aug-2010  uebayasi Sync with HEAD.
 1.72.2.3  16-Jan-2013  yamt sync with (a bit old) head
 1.72.2.2  30-Oct-2012  yamt sync with head
 1.72.2.1  17-Apr-2012  yamt sync with head
 1.74.4.1  18-Feb-2012  mrg merge to -current.
 1.75.6.1  24-Nov-2012  jdc Pull up revisions:
src/sys/kern/kern_event.c revision 1.79
src/sys/kern/kern_descrip.c revision 1.219
src/lib/libc/sys/kqueue.2 revision 1.33
src/tests/lib/libc/sys/t_kevent.c revision 1.2-1.5
(requested by christos in ticket #716).

- initialize kn_id
- in close, invalidate f_data and f_type early to prevent accidental re-use
- add a DIAGNOSTIC for when we use unsupported fd's and a KASSERT for f_event
being NULL.

Return EOPNOTSUPP for fnullop_kqfilter to prevent registration of unsupported
fds. XXX: We should really fix the fd's to be supported in the future.
Unsupported fd's have a NULL f_event, so registering crashes the kernel with
a NULL function dereference of f_event.

mention that kevent now returns EOPNOTSUPP.

Move the references to PRs from code comments to the test description. Once
ATF has the ability to output the metadata in the HTML reports, it should be
easy to traverse between releng and gnats -reports via links.

Add a (skipped for now) test case for PR 46463

adapt to new reality

Add a test for adding an event to an unsupported fd.
 1.75.2.1  24-Nov-2012  jdc Pull up revisions:
src/sys/kern/kern_event.c revision 1.79
src/sys/kern/kern_descrip.c revision 1.219
src/lib/libc/sys/kqueue.2 revision 1.33
src/tests/lib/libc/sys/t_kevent.c revision 1.2-1.5
(requested by christos in ticket #716).

- initialize kn_id
- in close, invalidate f_data and f_type early to prevent accidental re-use
- add a DIAGNOSTIC for when we use unsupported fd's and a KASSERT for f_event
being NULL.

Return EOPNOTSUPP for fnullop_kqfilter to prevent registration of unsupported
fds. XXX: We should really fix the fd's to be supported in the future.
Unsupported fd's have a NULL f_event, so registering crashes the kernel with
a NULL function dereference of f_event.

mention that kevent now returns EOPNOTSUPP.

Move the references to PRs from code comments to the test description. Once
ATF has the ability to output the metadata in the HTML reports, it should be
easy to traverse between releng and gnats -reports via links.

Add a (skipped for now) test case for PR 46463

adapt to new reality

Add a test for adding an event to an unsupported fd.
 1.76.2.4  03-Dec-2017  jdolecek update from HEAD
 1.76.2.3  20-Aug-2014  tls Rebase to HEAD as of a few days ago.
 1.76.2.2  25-Feb-2013  tls resync with head
 1.76.2.1  20-Nov-2012  tls Resync to 2012-11-19 00:00:00 UTC
 1.79.10.1  10-Aug-2014  tls Rebase.
 1.80.2.3  21-Nov-2018  martin Pull up following revision(s) (requested by maxv in ticket #1653):

sys/kern/kern_event.c: revision 1.104

Fix kernel info leak. There are 4 bytes of padding in struct kevent.
[ 287.537676] kleak: Possible leak in copyout: [len=40, leaked=4]
[ 287.537676] #0 0xffffffff80b7c41a in kleak_note <netbsd>
[ 287.547673] #1 0xffffffff80b7c49a in kleak_copyout <netbsd>
[ 287.557677] #2 0xffffffff80b1d32d in kqueue_scan.isra.1.constprop.2 <netbsd>
[ 287.557677] #3 0xffffffff80b1dc6a in kevent1 <netbsd>
[ 287.567683] #4 0xffffffff80b1dcb0 in sys___kevent50 <netbsd>
[ 287.567683] #5 0xffffffff8025ab3c in sy_call <netbsd>
[ 287.577688] #6 0xffffffff8025ad6e in sy_invoke <netbsd>
[ 287.587693] #7 0xffffffff8025adf4 in syscall <netbsd>
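The class of leak reported above comes from copying a structure with uninitialized padding bytes out to userland. One common remedy, sketched here in userland, is to zero the whole structure before filling its fields. `demo_event` and `demo_event_fill` are hypothetical stand-ins, not the real `struct kevent` or the actual change in revision 1.104, and the layout assumes a platform where a 64-bit member forces 4 bytes of padding after a 32-bit one.

```c
#include <stdint.h>
#include <string.h>

/*
 * Hypothetical structure: on typical LP64 platforms there are 4 bytes of
 * padding between 'filter' and 'data' to align the 64-bit member.
 */
struct demo_event {
	uint32_t filter;
	uint64_t data;
};

/*
 * Fill an event for copying out. Zeroing the whole object first means
 * the padding bytes hold zeros and not stale memory contents.
 */
static void
demo_event_fill(struct demo_event *ev, uint32_t filter, uint64_t data)
{
	memset(ev, 0, sizeof(*ev));
	ev->filter = filter;
	ev->data = data;
}
```

The C standard leaves padding values unspecified after a member store, so this relies on the usual compiler behavior of leaving padding untouched by scalar stores.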
 1.80.2.2  08-Jul-2017  snj Pull up following revision(s) (requested by christos in ticket #1442):
sys/kern/kern_event.c: revision 1.92 via patch
sys/miscfs/genfs/genfs_vnops.c: revision 1.198 via patch
sys/sys/event.h: revision 1.30 via patch
Provide EVFILT_WRITE; this is what FreeBSD does and go wants it.
Makes go unit tests pass.
--
fix file descriptor locking (from joerg).
fixes kernel crashes by running go
 1.80.2.1  14-Apr-2015  snj branches: 1.80.2.1.2; 1.80.2.1.6;
Pull up following revision(s) (requested by christos in ticket #677):
lib/libc/sys/kqueue.2: revision 1.34
sys/kern/kern_event.c: revision 1.83
put the exit code of the process in data, like FreeBSD does.
--
say that we put the exit code in data.
 1.80.2.1.6.2  21-Nov-2018  martin Pull up following revision(s) (requested by maxv in ticket #1653):

sys/kern/kern_event.c: revision 1.104

Fix kernel info leak. There are 4 bytes of padding in struct kevent.
[ 287.537676] kleak: Possible leak in copyout: [len=40, leaked=4]
[ 287.537676] #0 0xffffffff80b7c41a in kleak_note <netbsd>
[ 287.547673] #1 0xffffffff80b7c49a in kleak_copyout <netbsd>
[ 287.557677] #2 0xffffffff80b1d32d in kqueue_scan.isra.1.constprop.2 <netbsd>
[ 287.557677] #3 0xffffffff80b1dc6a in kevent1 <netbsd>
[ 287.567683] #4 0xffffffff80b1dcb0 in sys___kevent50 <netbsd>
[ 287.567683] #5 0xffffffff8025ab3c in sy_call <netbsd>
[ 287.577688] #6 0xffffffff8025ad6e in sy_invoke <netbsd>
[ 287.587693] #7 0xffffffff8025adf4 in syscall <netbsd>
 1.80.2.1.6.1  08-Jul-2017  snj Pull up following revision(s) (requested by christos in ticket #1442):
sys/kern/kern_event.c: revision 1.92 via patch
sys/miscfs/genfs/genfs_vnops.c: revision 1.198 via patch
sys/sys/event.h: revision 1.30 via patch
Provide EVFILT_WRITE; this is what FreeBSD does and go wants it.
Makes go unit tests pass.
--
fix file descriptor locking (from joerg).
fixes kernel crashes by running go
 1.80.2.1.2.2  21-Nov-2018  martin Pull up following revision(s) (requested by maxv in ticket #1653):

sys/kern/kern_event.c: revision 1.104

Fix kernel info leak. There are 4 bytes of padding in struct kevent.
[ 287.537676] kleak: Possible leak in copyout: [len=40, leaked=4]
[ 287.537676] #0 0xffffffff80b7c41a in kleak_note <netbsd>
[ 287.547673] #1 0xffffffff80b7c49a in kleak_copyout <netbsd>
[ 287.557677] #2 0xffffffff80b1d32d in kqueue_scan.isra.1.constprop.2 <netbsd>
[ 287.557677] #3 0xffffffff80b1dc6a in kevent1 <netbsd>
[ 287.567683] #4 0xffffffff80b1dcb0 in sys___kevent50 <netbsd>
[ 287.567683] #5 0xffffffff8025ab3c in sy_call <netbsd>
[ 287.577688] #6 0xffffffff8025ad6e in sy_invoke <netbsd>
[ 287.587693] #7 0xffffffff8025adf4 in syscall <netbsd>
 1.80.2.1.2.1  08-Jul-2017  snj Pull up following revision(s) (requested by christos in ticket #1442):
sys/kern/kern_event.c: revision 1.92 via patch
sys/miscfs/genfs/genfs_vnops.c: revision 1.198 via patch
sys/sys/event.h: revision 1.30 via patch
Provide EVFILT_WRITE; this is what FreeBSD does and go wants it.
Makes go unit tests pass.
--
fix file descriptor locking (from joerg).
fixes kernel crashes by running go
 1.82.2.6  28-Aug-2017  skrll Sync with HEAD
 1.82.2.5  05-Oct-2016  skrll Sync with HEAD
 1.82.2.4  22-Apr-2016  skrll Sync with HEAD
 1.82.2.3  19-Mar-2016  skrll Sync with HEAD
 1.82.2.2  27-Dec-2015  skrll Sync with HEAD (as of 26th Dec)
 1.82.2.1  06-Apr-2015  skrll Sync with HEAD
 1.88.8.4  19-May-2017  pgoyette Resolve conflicts from previous merge (all resulting from $NetBSD
keyword expansion)
 1.88.8.3  17-May-2017  pgoyette Import fix from HEAD for NULL deref
 1.88.8.2  11-May-2017  pgoyette Sync with HEAD
 1.88.8.1  02-May-2017  pgoyette Sync with HEAD - tag prg-localcount2-base1
 1.91.2.3  21-Nov-2018  martin Pull up following revision(s) (requested by maxv in ticket #1102):

sys/kern/kern_event.c: revision 1.104

Fix kernel info leak. There are 4 bytes of padding in struct kevent.
[ 287.537676] kleak: Possible leak in copyout: [len=40, leaked=4]
[ 287.537676] #0 0xffffffff80b7c41a in kleak_note <netbsd>
[ 287.547673] #1 0xffffffff80b7c49a in kleak_copyout <netbsd>
[ 287.557677] #2 0xffffffff80b1d32d in kqueue_scan.isra.1.constprop.2 <netbsd>
[ 287.557677] #3 0xffffffff80b1dc6a in kevent1 <netbsd>
[ 287.567683] #4 0xffffffff80b1dcb0 in sys___kevent50 <netbsd>
[ 287.567683] #5 0xffffffff8025ab3c in sy_call <netbsd>
[ 287.577688] #6 0xffffffff8025ad6e in sy_invoke <netbsd>
[ 287.587693] #7 0xffffffff8025adf4 in syscall <netbsd>
 1.91.2.2  16-Jan-2018  martin Pull up following revision(s) (requested by christos in ticket #501):
sys/kern/kern_event.c: revision 1.103
Set EV_ONESHOT to prevent rescheduling
XXX: pullup-8
 1.91.2.1  05-Jul-2017  snj Pull up following revision(s) (requested by christos in ticket #91):
sys/kern/kern_event.c: revision 1.92
sys/miscfs/genfs/genfs_vnops.c: revision 1.198
sys/sys/event.h: revision 1.30
Provide EVFILT_WRITE; this is what FreeBSD does and go wants it.
Makes go unit tests pass.
--
fix file descriptor locking (from joerg).
fixes kernel crashes by running go
 1.103.4.3  13-Apr-2020  martin Mostly merge changes from HEAD up to 20200411
 1.103.4.2  08-Apr-2020  martin Merge changes from current as of 20200406
 1.103.4.1  10-Jun-2019  christos Sync with HEAD
 1.103.2.1  26-Nov-2018  pgoyette Sync with HEAD, resolve a couple of conflicts
 1.104.4.2  07-Feb-2021  martin Apply additional patch, requested by jdolecek in ticket #1191:

sys/kern/kern_event.c 1.110-1.115 (via patch)

Fix merge botch for the EV_ONESHOT branch.
 1.104.4.1  04-Feb-2021  martin Pullup the following (requested by jdolecek in ticket #1191):

sys/kern/kern_event.c r1.110-1.115 (via patch)

fix a race in kqueue_scan() - when multiple threads check the same
kqueue, another thread could see an empty kqueue while a kevent
was being checked for re-firing and re-queued

make sure to keep retrying if there are outstanding kevents even
if no kevent is found on first pass through the queue, and only
kq_count when actually completely done with the kevent

PR kern/50094 by Christof Meerwal

Also fixes timer latency in Go, as reported in
https://github.com/golang/go/issues/42515 by Michael Pratt
 1.105.2.1  29-Feb-2020  ad Sync with head.
 1.108.2.3  03-Apr-2021  thorpej Sync with HEAD.
 1.108.2.2  03-Jan-2021  thorpej Sync w/ HEAD.
 1.108.2.1  14-Dec-2020  thorpej Sync w/ HEAD.
 1.117.4.1  13-May-2021  thorpej Sync with HEAD.
