History log of /src/sys/kern/sys_select.c
Revision  Date  Author  Comments
 1.68  26-Nov-2024  khorben Typo in a comment
 1.67  18-Oct-2024  kre PR kern/57504 : Check all fds passed in to select

If an application passes in a huge fd_set (select(BIG, ...))
then check every bit in the fd_sets provided, to make sure
they are valid.

If BIG is too big (cannot possibly represent an open fd for
this process, under any circumstances: ie: not just because
that many are not currently open) return EINVAL.

Otherwise, check every set bit to make sure it is valid. Any
fd bits set above the application's current highest open fd
automatically generate EBADF and a quick(ish) exit.

fd's that are within the plausible range are then checked as
they always were (it is possible for there to be a few there
above the max open fd, as everything in select is done in
multiples of __FDBITS (fd_mask) but the max open fd is not so
constrained). Those were always checked; continue using the
same mechanism.

This should have zero impact on any sane application which
uses the highest fd for which it set a bit, +1, as the first
arg to select. However, if there are any broken applications
that were relying upon the previous behaviour of simply ignoring
any fd_masks that started beyond the max number of open files,
then they might (if they happen to have any bits set) now fail.

XXX pullup -10 -- but not for a long time. Someone remind me
sometime next year. Leave a long settling time in HEAD just to
be sure no issues arise, as in practice, almost nothing should
cause any of the new code to be executed.

pullup -9 -- probably not, what this fixes isn't significant
enough to bother going that far back for (IMO).
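
For illustration, the check order described above could be sketched like
this in userland C; max_possible_fd and highest_open_fd are hypothetical
stand-ins for the per-process bounds the kernel tracks, not actual kernel
identifiers:

	#include <sys/select.h>
	#include <errno.h>

	/*
	 * Sketch only: reject nd values that can never name an open fd,
	 * fail fast on set bits above the highest open fd, and leave
	 * bits in the plausible range to the pre-existing checks.
	 * Assumes nd does not exceed the storage actually passed in.
	 */
	static int
	check_select_fds(int nd, fd_set *set, int max_possible_fd,
	    int highest_open_fd)
	{
		if (nd > max_possible_fd + 1)
			return EINVAL;

		for (int fd = 0; fd < nd; fd++) {
			if (!FD_ISSET(fd, set))
				continue;
			if (fd > highest_open_fd)
				return EBADF;	/* quick(ish) exit */
			/* plausible fds are checked as they always were */
		}
		return 0;
	}
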
 1.66  15-Oct-2023  riastradh branches: 1.66.6;
sys_select.c: Sort includes. No functional change intended.
 1.65  15-Oct-2023  riastradh sys/lwp.h: Nix sys/syncobj.h dependency.

Remove it in ddb/db_syncobj.h too.

New sys/wchan.h defines wchan_t so that users need not pull in
sys/syncobj.h to get it.

Sprinkle #include <sys/syncobj.h> in .c files where it is now needed.
 1.64  08-Oct-2023  ad Ensure that an LWP that has taken a legitimate wakeup never produces an
error code from sleepq_block(). Then, it's possible to make cv_signal()
work as expected and only ever wake a singular LWP.
 1.63  04-Oct-2023  ad Eliminate l->l_biglocks. Originally I think it had a use but these days a
local variable will do.
 1.62  23-Sep-2023  ad - Simplify how priority boost for blocking in kernel is handled. Rather
than setting it up at each site where we block, make it a property of
syncobj_t. Then, do not hang onto the priority boost until userret(),
drop it as soon as the LWP is out of the run queue and onto a CPU.
Holding onto it longer is of questionable benefit.

- This allows two members of lwp_t to be deleted, and mi_userret() to be
simplified a lot (next step: trim it down to a single conditional).

- While here, constify syncobj_t and de-inline a bunch of small functions
like lwp_lock() which turn out not to be small after all (I don't know
why, but atomic_*_relaxed() seem to provoke a compiler shitfit above and
beyond what volatile does).
 1.61  17-Jul-2023  riastradh kern: New struct syncobj::sobj_name member for diagnostics.

XXX potential kernel ABI change -- not sure any modules actually use
struct syncobj but it's hard to rule that out because sys/syncobj.h
leaks into sys/lwp.h
 1.60  29-Jun-2022  riastradh branches: 1.60.4;
sleepq(9): Pass syncobj through to sleepq_block.

Previously the usage pattern was:

	sleepq_enter(sq, l, lock);            // locks l
	...
	sleepq_enqueue(sq, ..., sobj, ...);   // assumes l locked, sets l_syncobj
	... (*)
	sleepq_block(...);                    // unlocks l

As long as l remains locked from sleepq_enter to sleepq_block,
l_syncobj is stable, and sleepq_block uses it via ktrcsw to determine
whether the sleep is on a mutex in order to avoid creating ktrace
context-switch records (which involves allocation which is forbidden
in softint context, while taking and even sleeping for a mutex is
allowed).

However, in turnstile_block, the logic at (*) also involves
turnstile_lendpri, which sometimes unlocks and relocks l. At that
point, another thread can swoop in and sleepq_remove l, which sets
l_syncobj to sched_syncobj. If that happens, ktrcsw does what is
forbidden -- tries to allocate a ktrace record for the context
switch.

As an optimization, sleepq_block or turnstile_block could stop early
if it detects that l_syncobj doesn't match -- we've already been
requested to wake up at this point so there's no need to mi_switch.
(And then it would be unnecessary to pass the syncobj through
sleepq_block, because l_syncobj would remain stable.) But I'll leave
that to another change.

Reported-by: syzbot+8b9d7b066c32dbcdc63b@syzkaller.appspotmail.com
 1.59  09-Apr-2022  riastradh select(9): Use membar_acquire/release and atomic_store_release.

No store-before-load ordering here -- this was obviously always
intended to be load-before-load/store all along.
 1.58  12-Feb-2022  thorpej Add inline functions to manipulate the klists that link up knotes
via kn_selnext:

- klist_init()
- klist_fini()
- klist_insert()
- klist_remove()

These provide some API insulation from the implementation details of these
lists (but not completely; see vn_knote_attach() and vn_knote_detach()).
Currently just a wrapper around SLIST(9).

This will make it significantly easier to switch kn_selnext linkage
to a different kind of list.
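
A hedged sketch of what SLIST(9)-backed wrappers of this shape can look
like; the stub struct and function names below are illustrative, not the
kernel's klist/knote definitions:

	#include <sys/queue.h>

	struct knote_stub {
		SLIST_ENTRY(knote_stub) kn_selnext;	/* linkage field */
	};
	SLIST_HEAD(klist_stub, knote_stub);

	static inline void
	klist_stub_init(struct klist_stub *list)
	{
		SLIST_INIT(list);
	}

	static inline void
	klist_stub_insert(struct klist_stub *list, struct knote_stub *kn)
	{
		SLIST_INSERT_HEAD(list, kn, kn_selnext);
	}

	static inline void
	klist_stub_remove(struct klist_stub *list, struct knote_stub *kn)
	{
		SLIST_REMOVE(list, kn, knote_stub, kn_selnext);
	}

Because the wrappers hide the list head type and linkage macros, switching
kn_selnext to a different kind of list later only touches these helpers.
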
 1.57  10-Dec-2021  andvar s/occured/occurred/ in comments, log messages and man pages.
 1.56  29-Sep-2021  thorpej - Change selremove_knote() from returning void to bool, and return
true if the last knote was removed and there are no more knotes
on the selinfo.
- Use this new return value in filt_sordetach(), filt_sowdetach(),
filt_fifordetach(), and filt_fifowdetach() to know when to clear
SB_KNOTE without having to know select/kqueue implementation details.
 1.55  11-Dec-2020  thorpej Add sel{record,remove}_knote(), so hide some of the details surrounding
knote / kevent registration in the selinfo structure.
 1.54  19-Apr-2020  ad branches: 1.54.2;
Set LW_SINTR earlier so it doesn't pose a problem for doing interruptible
waits with turnstiles (not currently done).
 1.53  26-Mar-2020  ad branches: 1.53.2;
Change sleepq_t from a TAILQ to a LIST and remove SOBJ_SLEEPQ_FIFO. Only
select/poll used the FIFO method and that was for collisions, which rarely
occur. Shrinks sleepq_t and condvar_t.
 1.52  15-Feb-2020  ad - List all of the syncobjs in syncobj.h.
- Update a comment.
 1.51  01-Feb-2020  riastradh Load struct filedesc::fd_dt with atomic_load_consume.

Exceptions: when fd_refcnt <= 1, or when holding fd_lock.

While here:

- Restore KASSERT(mutex_owned(&fdp->fd_lock)) in fd_unused.
=> This is used only in fd_close and fd_abort, where it holds.
- Move bounds check assertion in fd_putfile to where it matters.
- Store fd_dt with atomic_store_release.
- Move load of fd_dt under lock in knote_fdclose.
- Omit membar_consumer in fdesc_readdir.
=> atomic_load_consume serves the same purpose now.
=> Was needed only on alpha anyway.
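
A small sketch of the publish/consume pairing described above, with stub
types standing in for filedesc/fdtab; only the atomic_load_consume()/
atomic_store_release() pairing follows the log, the rest is hypothetical:

	#include <sys/atomic.h>

	struct dtab_stub {
		int	dt_nfiles;
	};

	struct fdp_stub {
		struct dtab_stub *fd_dt;	/* published table pointer */
	};

	static struct dtab_stub *
	read_table(struct fdp_stub *fdp)
	{
		/* pairs with the release store in publish_table() */
		return atomic_load_consume(&fdp->fd_dt);
	}

	static void
	publish_table(struct fdp_stub *fdp, struct dtab_stub *newdt)
	{
		/* *newdt must be fully initialized before this store */
		atomic_store_release(&fdp->fd_dt, newdt);
	}
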
 1.50  22-Nov-2019  ad branches: 1.50.2;
Minor correction to previous.
 1.49  21-Nov-2019  ad Minor improvements to select/poll:

- Increase the maximum number of clusters from 32 to 64 for large systems.
kcpuset_t could potentially be used here but that's an excursion I don't
want to go on right now. uint32_t -> uint64_t is very simple.

- In the case of a non-blocking select/poll, or where we won't block
because there are events ready to report, stop registering interest in
the back-end objects early.

- Change the wmesg for poll back to "poll".
 1.48  20-Sep-2019  kamil Validate usec ranges in sys___select50()

Later in the code, selcommon() checks for a proper timespec; here, check only
that the usec of the timeval is in range before the type conversions.
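
Illustratively, the usec validation amounts to the range check below (the
function name is hypothetical); the full timespec checks still happen later
in selcommon():

	#include <sys/time.h>
	#include <errno.h>

	/* Reject a timeval whose usec field is out of range. */
	static int
	check_timeval_usec(const struct timeval *tv)
	{
		if (tv->tv_usec < 0 || tv->tv_usec >= 1000000)
			return EINVAL;
		return 0;
	}
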
 1.47  20-Aug-2019  msaitoh Use unsigned to avoid undefined behavior. Found by kUBSan.
 1.46  26-Jul-2019  msaitoh branches: 1.46.2;
Set sc_mask correctly in selsysinit() to avoid undefined behavior.
Found by KUBSan.
 1.45  08-May-2019  christos Add slop of 1000 and explain why.
 1.44  07-May-2019  christos Use the max limit (aka maxfiles or the moral equivalent of OPEN_MAX) which
makes poll(2) align with the Posix documentation (which allows EINVAL if
nfds > OPEN_MAX). From: Anthony Mallet
 1.43  05-May-2019  christos Remove the slop code. Suggested by mrg@
 1.42  04-May-2019  christos PR/54158: Anthony Mallet: poll(2) does not allow polling all possible fds
(hardcoded limit to 1000 + #<open-fds>). Changed to limit by the max of
the resource limit of open descriptors and the above.
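
Taken together with revs 1.43-1.45 above, the resulting bound can be
sketched roughly as below; maxfiles stands for the system-wide limit and
the function name is hypothetical:

	#include <errno.h>

	/*
	 * Sketch: the pollfd array need not be dense, so allow some slop
	 * beyond the open-file limit, but still reject absurd nfds values
	 * with EINVAL rather than silently truncating the list.
	 */
	static int
	check_poll_nfds(unsigned int nfds, unsigned int maxfiles)
	{
		if (nfds > maxfiles + 1000)
			return EINVAL;
		return 0;
	}
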
 1.41  30-Jan-2018  ozaki-r branches: 1.41.4;
Apply C99-style struct initialization to syncobj_t
 1.40  01-Jun-2017  chs branches: 1.40.2;
remove checks for failure after memory allocation calls that cannot fail:

kmem_alloc() with KM_SLEEP
kmem_zalloc() with KM_SLEEP
percpu_alloc()
pserialize_create()
psref_class_create()

all of these paths include an assertion that the allocation has not failed,
so callers should not assert that again.
 1.39  25-Apr-2014  pooka branches: 1.39.4;
Remove pollsock(). Since it took only a single socket, it was essentially
a complicated way to call soreceive() with a sb_timeo. The only user
(netsmb) already did that anyway, so all that was needed was to delete the
call to pollsock().
 1.38  25-Feb-2014  pooka branches: 1.38.2;
Ensure that the top level sysctl nodes (kern, vfs, net, ...) exist before
the sysctl link sets are processed, and remove redundancy.

Shaves >13kB off of an amd64 GENERIC, not to mention >1k duplicate
lines of code.
 1.37  26-Jan-2013  riastradh branches: 1.37.2;
Assert equality, not assignment, in selrecord.

Code inspection suggests that this fix is not likely to reveal any
latent problems.
 1.36  29-Aug-2011  rmind branches: 1.36.2; 1.36.12;
Add kern.direct_select sysctl. Default to 0 for now.
 1.35  09-Aug-2011  hannken No need to lock the selcluster in selscan() if either
NO_DIRECT_SELECT is defined or all polls return an event.
 1.34  06-Aug-2011  hannken Fix the races of direct select()/poll():

- When sel_do_scan() restarts do a full initialization with selclear() so
we start from an empty set without registered events. Defer the
evaluation of l_selret after selclear() and add the count of direct events
to the count of events.

- For selscan()/pollscan() zero the output descriptors before we poll and
for selscan() take the sc_lock before we change them.

- Change sel_setevents() to not count events already set.

Reviewed by: YAMAMOTO Takashi <yamt@netbsd.org>

Should fix PR #44763 (select/poll direct-set optimization seems racy)
and PR #45187 (select(2) sometimes doesn't wakeup)
 1.33  28-May-2011  christos If a signal did not fire, restore the original signal mask for pselect/pollts
using a signal mask. Tested by tron.
 1.32  18-May-2011  christos No need to mask twice. The setup function does it.
 1.31  18-May-2011  christos PR/43625: Mark Davies: Fix pselect(2) to honor the temporary mask. pselect(2)
(and pollts(2)) are similar to sigsuspend(2) in that they temporarily change
the process signal mask and wait for signal delivery. Factor out and share the
code that does this.
 1.30  06-Mar-2011  rmind In a case of direct select, set only masked events, do not wakeup LWP
if no polled/selected events were set; also, count the correct return
value for the select.
 1.29  18-Dec-2010  rmind branches: 1.29.2;
- Fix a few possible locking issues in execve1() and exit1(). Add a note
that scheduler locks are special in this regard - adaptive locks cannot
be in the path due to turnstiles. Randomly spotted/reported by uebayasi@.
- Remove unused lwp_relock() and replace lwp_lock_retry() by simplifying
lwp_lock() and sleepq_enter() a little.
- Give alllwp its own cache-line and mark lwp_cache pointer as read-mostly.

OK ad@
 1.28  15-Oct-2010  rmind Re-enable direct select.
 1.27  12-Jul-2010  rmind sel_setevents: fix error - match event-set, as intended.
Spotted by Enami Tsugutomo.
 1.26  11-Jul-2010  rmind Disable direct select for now, since it still brings problems.
 1.25  10-Jul-2010  rmind sel_setevents: fix direct injecting of fd bit for select() case.
 1.24  08-Jul-2010  rmind sel_do_scan: do not bother to assert for SEL_SCANNING state before blocking,
as it might also be SEL_BLOCKING due to spurious wake-ups. That has no harm.
 1.23  08-Jul-2010  rmind Implement direct select/poll support, currently effective for socket and
pipe subsystems. Avoids overhead of second selscan() on wake-up, and thus
improves performance on certain workloads (especially when polling on many
file-descriptors). Also, clean-up sys/fd_set.h header and improve macros.

Welcome to 5.99.36!
 1.22  25-Apr-2010  ad Make select/poll work with more than 32 CPUs.
No ABI change.
 1.21  20-Dec-2009  rmind branches: 1.21.2; 1.21.4;
Add comment about locking.
 1.20  12-Dec-2009  dsl Bounding the 'nfds' arg to poll() at the current process limit for actual
open files is rather gross - the poll map isn't required to be dense.
Instead limit to a much larger value (1000 + dt_nfiles) so that user
programs cannot allocate arbitrarily large blocks of kvm.
If the limit is exceeded, then return EINVAL instead of silently truncating
the list.
(The silent truncation in select isn't quite as bad - although even there
any high bits that are set ought to generate an EBADF response.)
Move the code that converts ERESTART and EWOULDBLOCK into common code.
Effectively fixes PR/17507 since the new limit is unlikely to be detected.
 1.19  11-Nov-2009  rmind - selcommon/pollcommon: drop redundant l argument.
- Use cached curlwp->l_fd, instead of p->p_fd.
- Inline selscan/pollscan.
 1.18  01-Nov-2009  rmind - Move inittimeleft() and gettimeleft() to subr_time.c, where they belong.
- Move abstimeout2timo() there too and export. Use it in lwp_park().
 1.17  01-Nov-2009  rmind Move common logic in selcommon() and pollcommon() into sel_do_scan().
Avoids code duplication. XXX: pollsock() should be converted too, except
it's a bit ugly.
 1.16  21-Oct-2009  rmind Remove uarea swap-out functionality:

- Addresses the issue described in PR/38828.
- Some simplification in threading and sleepq subsystems.
- Eliminates pmap_collect() and, as a side note, allows pmap optimisations.
- Eliminates XS_CTL_DATA_ONSTACK in scsipi code.
- Avoids a few scans of the LWP list and thus potentially long holds of proc_lock.
- Cuts ~1.5k lines of code. Reduces amd64 kernel size by ~4k.
- Removes __SWAP_BROKEN cases.

Tested on x86, mips, acorn32 (thanks <mpumford>) and partly tested on
acorn26 (thanks to <bjh21>).

Discussed on <tech-kern>, reviewed by <ad>.
 1.15  24-May-2009  ad More changes to improve kern_descrip.c.

- Avoid atomics in more places.
- Remove the per-descriptor mutex, and just use filedesc_t::fd_lock.
It was only being used to synchronize close, and in any case we needed
to take fd_lock to free the descriptor slot.
- Optimize certain paths for the <NDFDFILE case.
- Sprinkle more comments and assertions.
- Cache more stuff in filedesc_t.
- Fix numerous minor bugs spotted along the way.
- Restructure how the open files array is maintained, for clarity and so
that we can eliminate the membar_consumer() call in fd_getfile(). This is
mostly syntactic sugar; the main functional change is that fd_nfiles now
lives alongside the open file array.

Some measurements with libmicro:

- simple file syscalls like close() are between 1 and 10% faster.
- some nice improvements, e.g. poll(1000), which is ~50% faster.
 1.14  29-Mar-2009  christos Move the internal poll/select related API's to use timespec instead
of timeval (rides the uvm bump).
 1.13  21-Mar-2009  ad Allocate sleep queue locks with mutex_obj_alloc. Reduces memory usage
on !MP kernels, and reduces false sharing on MP ones.
 1.12  11-Jan-2009  christos branches: 1.12.2;
merge christos-time_t
 1.11  20-Nov-2008  yamt pollcommon: use a more appropriate type than char[].
 1.10  15-Oct-2008  ad branches: 1.10.2; 1.10.4; 1.10.10; 1.10.14;
- Rename cpu_lookup_byindex() to cpu_lookup(). The hardware ID isn't of
interest to MI code. No functional change.
- Change /dev/cpu to operate on cpu index, not hardware ID. Now cpuctl
shouldn't print confused output.
 1.9  04-Jun-2008  rmind branches: 1.9.4;
Check the result of allocation in the cases where size is passed by user.
 1.8  26-May-2008  ad Take the mutex pointer and waiters count out of sleepq_t: the values can
be or are maintained elsewhere. Now a sleepq_t is just a TAILQ_HEAD.
 1.7  30-Apr-2008  ad branches: 1.7.2;
PR kern/38547 select/poll do not set l_kpriority

Among other things this could have made X11 seem sluggish.
 1.6  28-Apr-2008  martin Remove clause 3 and 4 from TNF licenses
 1.5  24-Apr-2008  ad branches: 1.5.2;
Merge proc::p_mutex and proc::p_smutex into a single adaptive mutex, since
we no longer need to guard against access from hardware interrupt handlers.

Additionally, if cloning a process with CLONE_SIGHAND, arrange to have the
child process share the parent's lock so that signal state may be kept in
sync. Partially addresses PR kern/37437.
 1.4  17-Apr-2008  yamt branches: 1.4.2;
s/selwakeup/selnotify/ in a comment.
 1.3  29-Mar-2008  ad branches: 1.3.2; 1.3.4;
selwakeup: convert a while() loop into a do/while() since the first test
isn't needed.
 1.2  27-Mar-2008  ad Replace use of CACHE_LINE_SIZE in some obvious places.
 1.1  23-Mar-2008  ad branches: 1.1.2;
Split select/poll into their own file.
 1.1.2.2  24-Mar-2008  yamt sync with head.
 1.1.2.1  23-Mar-2008  yamt file sys_select.c was added on branch yamt-lazymbuf on 2008-03-24 09:39:02 +0000
 1.3.4.5  17-Jan-2009  mjf Sync with HEAD.
 1.3.4.4  05-Jun-2008  mjf Sync with HEAD.

Also fix build.
 1.3.4.3  02-Jun-2008  mjf Sync with HEAD.
 1.3.4.2  03-Apr-2008  mjf Sync with HEAD.
 1.3.4.1  29-Mar-2008  mjf file sys_select.c was added on branch mjf-devfs2 on 2008-04-03 12:43:04 +0000
 1.3.2.4  27-Dec-2008  christos merge with head.
 1.3.2.3  01-Nov-2008  christos Sync with head.
 1.3.2.2  29-Mar-2008  christos Welcome to the time_t=long long dev_t=uint64_t branch.
 1.3.2.1  29-Mar-2008  christos file sys_select.c was added on branch christos-time_t on 2008-03-29 20:47:01 +0000
 1.4.2.3  17-Jun-2008  yamt sync with head.
 1.4.2.2  04-Jun-2008  yamt sync with head
 1.4.2.1  18-May-2008  yamt sync with head.
 1.5.2.5  11-Aug-2010  yamt sync with head.
 1.5.2.4  11-Mar-2010  yamt sync with head
 1.5.2.3  20-Jun-2009  yamt sync with head
 1.5.2.2  04-May-2009  yamt sync with head.
 1.5.2.1  16-May-2008  yamt sync with head.
 1.7.2.3  23-Jun-2008  wrstuden Sync w/ -current. 34 merge conflicts to follow.
 1.7.2.2  14-May-2008  wrstuden Per discussion with ad at n dot o, revert signal mask handling
changes.

The l_sigstk changes are most likely totally unneeded as SA will
never use a signal stack - we send an upcall (or will as other
diffs are brought in).

The l_sigmask changes were too controversial. In all honesty, I
think it's probably best to revert them. The main reason they were
there is the fact that in an SA process, we don't mask signals per
kernel thread, we mask them per user thread. In the kernel, we want
them all to get turned into upcalls. Thus the normal state of
l_sigmask in an SA process is for it to always be empty.

While we are in the process of delivering a signal, we want to
temporarily mask a signal (so we don't recursively exhaust our
upcall stacks). However signal delivery is rare (important, but
rare), and delivering back-to-back signals is even rarer. So rather
than cause every user of a signal mask to be prepared for this very
rare case, we will just add a second check later in the signal
delivery code. Said change is not in this diff.

This also un-compensates all of our compatibility code for dealing
with SA. SA is a NetBSD-specific thing, so there's no need for
Irix, Linux, Solaris, SVR4 and so on to cope with it.

As previously, everything other than kern_sa.c compiles in i386
GENERIC as of this checkin. I will switch to ALL soon for compile
testing.
 1.7.2.1  10-May-2008  wrstuden Initial checkin of re-adding SA. Everything except kern_sa.c
compiles in GENERIC for i386. This is still a work-in-progress, but
this checkin covers most of the mechanical work (changing signalling
to be able to accommodate SA's process-wide signalling and re-adding
includes of sys/sa.h and savar.h). Subsequent changes will be much
more interesting.

Also, kern_sa.c has received partial cleanup. There's still more
to do, though.
 1.9.4.2  13-Dec-2008  haad Update haad-dm branch to haad-dm-base2.
 1.9.4.1  19-Oct-2008  haad Sync with HEAD.
 1.10.14.1  24-Apr-2015  msaitoh Pull up following revision(s) (requested by prlw1 in ticket #1957):

sys/kern/sys_select.c patch

Limit nfds arg to poll() to a large enough value that user programs
cannot allocate arbitrarily large blocks of kvm. If the limit is
exceeded, then return EINVAL instead of silently truncating the list.
Addresses PR/17507.
[prlw1, ticket #1957]
 1.10.10.1  24-Apr-2015  msaitoh Pull up following revision(s) (requested by prlw1 in ticket #1957):

sys/kern/sys_select.c patch

Limit nfds arg to poll() to a large enough value that user programs
cannot allocate indefinite sized blocks of kvm. If the limit is
exceeded, then return EINVAL instead of silently truncating the list.
Addresses PR/17507.
[prlw1, ticket #1957]
 1.10.4.1  24-Apr-2015  msaitoh Pull up following revision(s) (requested by prlw1 in ticket #1957):

sys/kern/sys_select.c patch

Limit nfds arg to poll() to a large enough value that user programs
cannot allocate indefinite sized blocks of kvm. If the limit is
exceeded, then return EINVAL instead of silently truncating the list.
Addresses PR/17507.
 1.10.2.2  28-Apr-2009  skrll Sync with HEAD.
 1.10.2.1  19-Jan-2009  skrll Sync with HEAD.
 1.12.2.2  23-Jul-2009  jym Sync with HEAD.
 1.12.2.1  13-May-2009  jym Sync with HEAD.

Commit is split, to avoid a "too many arguments" protocol error.
 1.21.4.4  31-May-2011  rmind sync with head
 1.21.4.3  21-Apr-2011  rmind sync with head
 1.21.4.2  05-Mar-2011  rmind sync with head
 1.21.4.1  30-May-2010  rmind sync with head
 1.21.2.3  22-Oct-2010  uebayasi Sync with HEAD (-D20101022).
 1.21.2.2  17-Aug-2010  uebayasi Sync with HEAD.
 1.21.2.1  30-Apr-2010  uebayasi Sync with HEAD.
 1.29.2.1  06-Jun-2011  jruoho Sync with HEAD.
 1.36.12.3  03-Dec-2017  jdolecek update from HEAD
 1.36.12.2  20-Aug-2014  tls Rebase to HEAD as of a few days ago.
 1.36.12.1  25-Feb-2013  tls resync with head
 1.36.2.1  22-May-2014  yamt sync with head.

for a reference, the tree before this commit was tagged
as yamt-pagecache-tag8.

this commit was split into small chunks to avoid
a limitation of cvs. ("Protocol error: too many arguments")
 1.37.2.1  18-May-2014  rmind sync with head
 1.38.2.1  10-Aug-2014  tls Rebase.
 1.39.4.1  28-Aug-2017  skrll Sync with HEAD
 1.40.2.1  08-Mar-2020  martin Pull up following revision(s) (requested by mlelstv in ticket #1515):

sys/kern/sys_select.c: revision 1.42-1.45

PR/54158: Anthony Mallet: poll(2) does not allow polling all possible fds
(hardcoded limit to 1000 + #<open-fds>). Changed to limit by the max of
the resource limit of open descriptors and the above.

Remove the slop code. Suggested by mrg@

Use the max limit (aka maxfiles or the moral equivalent of OPEN_MAX) which
makes poll(2) align with the Posix documentation (which allows EINVAL if
nfds > OPEN_MAX). From: Anthony Mallet

Add slop of 1000 and explain why.
 1.41.4.3  21-Apr-2020  martin Sync with HEAD
 1.41.4.2  13-Apr-2020  martin Mostly merge changes from HEAD upto 20200411
 1.41.4.1  10-Jun-2019  christos Sync with HEAD
 1.46.2.2  20-Nov-2024  martin Pull up following revision(s) (requested by riastradh in ticket #1926):

sys/kern/sys_select.c: revision 1.67 (patch)
tests/lib/libc/sys/t_select.c: revision 1.5 (patch)

PR kern/57504 : Check all fds passed in to select

If an application passes in a huge fd_set (select(BIG, ...))
then check every bit in the fd_sets provided, to make sure
they are valid.

If BIG is too big (cannot possibly represent an open fd for
this process, under any circumstances: ie: not just because
that many are not currently open) return EINVAL.

Otherwise, check every set bit to make sure it is valid. Any
fd bits set above the application's current highest open fd
automatically generate EBADF and a quick(ish) exit.

fd's that are within the plausible range are then checked as
they always were (it is possible for there to be a few there
above the max open fd, as everything in select is done in
multiples of __FDBITS (fd_mask) but the max open fd is not so
constrained). Those were always checked; continue using the
same mechanism.

This should have zero impact on any sane application which
uses the highest fd for which it set a bit, +1, as the first
arg to select. However, if there are any broken applications
that were relying upon the previous behaviour of simply ignoring
any fd_masks that started beyond the max number of open files,
then they might (if they happen to have any bits set) now fail.


tests/lib/libc/sys/t_select: Test select on bad file descriptors.

This should immediately fail, not hang, even if the bad fd is
high-numbered.

PR kern/57504: select with large enough bogus fd number set hangs
instead of failing with EBADF
 1.46.2.1  20-Nov-2024  martin Pull up following revision(s) (requested by riastradh in ticket #1921):

sys/kern/kern_event.c: revision 1.106
sys/kern/sys_select.c: revision 1.51
sys/kern/subr_exec_fd.c: revision 1.10
sys/kern/sys_aio.c: revision 1.46
sys/kern/kern_descrip.c: revision 1.244
sys/kern/kern_descrip.c: revision 1.245
sys/ddb/db_xxx.c: revision 1.72
sys/ddb/db_xxx.c: revision 1.73
sys/miscfs/fdesc/fdesc_vnops.c: revision 1.132
sys/kern/uipc_usrreq.c: revision 1.195
sys/kern/sys_descrip.c: revision 1.36
sys/kern/uipc_usrreq.c: revision 1.196
sys/kern/uipc_socket2.c: revision 1.135
sys/kern/uipc_socket2.c: revision 1.136
sys/kern/kern_sig.c: revision 1.383
sys/kern/kern_sig.c: revision 1.384
sys/compat/netbsd32/netbsd32_ioctl.c: revision 1.107
sys/miscfs/procfs/procfs_vnops.c: revision 1.208
sys/kern/subr_exec_fd.c: revision 1.9
sys/kern/kern_descrip.c: revision 1.252
(all via patch)

Load struct filedesc::fd_dt with atomic_load_consume.

Exceptions: when fd_refcnt <= 1, or when holding fd_lock.

While here:
- Restore KASSERT(mutex_owned(&fdp->fd_lock)) in fd_unused.
=> This is used only in fd_close and fd_abort, where it holds.
- Move bounds check assertion in fd_putfile to where it matters.
- Store fd_dt with atomic_store_release.
- Move load of fd_dt under lock in knote_fdclose.
- Omit membar_consumer in fdesc_readdir.
=> atomic_load_consume serves the same purpose now.
=> Was needed only on alpha anyway.

Load struct fdfile::ff_file with atomic_load_consume.
Exceptions: when we're only testing whether it's there, not about to
dereference it.

Note: We do not use atomic_store_release to set it because the
preceding mutex_exit should be enough.

(That said, it's not clear the mutex_enter/exit is needed unless
refcnt > 0 already, in which case maybe it would be a win to switch
from the membar implied by mutex_enter to the membar implied by
atomic_store_release -- which I would generally expect to be much
cheaper. And a little clearer without a long comment.)

kern_descrip.c: Fix membars around reference count decrement.

In general, the `last one out hit the lights' style of reference
counting (as opposed to the `whoever's destroying must wait for
pending users to finish' style) requires memory barriers like so:

	... usage of resources associated with object ...
	membar_release();
	if (atomic_dec_uint_nv(&obj->refcnt) != 0)
		return;
	membar_acquire();
	... freeing of resources associated with object ...

This way, all usage happens-before all freeing. This fixes several
errors:
- fd_close failed to ensure whatever its caller did would
happen-before the freeing, in the case where another thread is
concurrently trying to close the fd (ff->ff_file == NULL).
Fix: Add membar_release before atomic_dec_uint(&ff->ff_refcnt) in
that branch.
- fd_close failed to ensure all loads its caller had issued will have
happened-before the freeing, in the case where the fd is still in
use by another thread (fdp->fd_refcnt > 1 and ff->ff_refcnt-- > 0).
Fix: Change membar_producer to membar_release before
atomic_dec_uint(&ff->ff_refcnt).
- fd_close failed to ensure that any usage of fp by other callers
would happen-before any freeing it does.
Fix: Add membar_acquire after atomic_dec_uint_nv(&ff->ff_refcnt).
- fd_free failed to ensure that any usage of fdp by other callers
would happen-before any freeing it does.
Fix: Add membar_acquire after atomic_dec_uint_nv(&fdp->fd_refcnt).

While here, change membar_exit -> membar_release. No semantic
change, just updating away from the legacy API.
 1.50.2.1  29-Feb-2020  ad Sync with head.
 1.53.2.1  20-Apr-2020  bouyer Sync with HEAD
 1.54.2.1  14-Dec-2020  thorpej Sync w/ HEAD.
 1.60.4.1  18-Nov-2024  martin Pull up following revision(s) (requested by riastradh in ticket #1011):

sys/kern/sys_select.c: revision 1.67
tests/lib/libc/sys/t_select.c: revision 1.5

PR kern/57504 : Check all fds passed in to select

If an application passes in a huge fd_set (select(BIG, ...))
then check every bit in the fd_sets provided, to make sure
they are valid.

If BIG is too big (cannot possibly represent an open fd for
this process, under any circumstances: ie: not just because
that many are not currently open) return EINVAL.
Otherwise, check every set bit to make sure it is valid. Any
fd bits set above the application's current highest open fd
automatically generate EBADF and a quick(ish) exit.

fd's that are within the plausible range are then checked as
they always were (it is possible for there to be a few there
above the max open fd, as everything in select is done in
multiples of __FDBITS (fd_mask) but the max open fd is not so
constrained). Those were always checked; continue using the
same mechanism.

This should have zero impact on any sane application which
uses the highest fd for which it set a bit, +1, as the first
arg to select. However, if there are any broken applications
that were relying upon the previous behaviour of simply ignoring
any fd_masks that started beyond the max number of open files,
then they might (if they happen to have any bits set) now fail.

tests/lib/libc/sys/t_select: Test select on bad file descriptors.
This should immediately fail, not hang, even if the bad fd is
high-numbered.

PR kern/57504: select with large enough bogus fd number set hangs
instead of failing with EBADF
 1.66.6.1  02-Aug-2025  perseant Sync with HEAD

RSS XML Feed