History log of /src/sys/kern/sys_select.c
Revision  Date  Author  Comments
 1.68  26-Nov-2024  khorben Typo in a comment
 1.67  18-Oct-2024  kre PR kern/57504 : Check all fds passed in to select

If an application passes in a huge fd_set (select(BIG, ...))
then check every bit in the fd_sets provided, to make sure
they are valid.

If BIG is too big (cannot possibly represent an open fd for
this process, under any circumstances: ie: not just because
that many are not currently open) return EINVAL.

Otherwise, check every set bit to make sure it is valid. Any
fd bits set above the application's current highest open fd
automatically generate EBADF and a quick(ish) exit.

fd's that are within the plausible range are then checked as
they always were (it is possible for there to be a few there
above the max open fd, as everything in select is done in
multiples of __FDBITS (fd_mask) but the max open fd is not so
constrained). Those were always checked; continue using the
same mechanism.

This should have zero impact on any sane application which
uses the highest fd for which it set a bit, +1, as the first
arg to select. However, if there are any broken applications
that were relying upon the previous behaviour of simply ignoring
any fd_masks that started beyond the max number of open files,
then they might (if they happen to have any bits set) now fail.

XXX pullup -10 -- but not for a long time. Someone remind me
sometime next year. Leave a long settling time in HEAD just to
be sure no issues arise, as in practice, almost nothing should
cause any of the new code to be executed.

pullup -9 -- probably not, what this fixes isn't significant
enough to bother going that far back for (IMO).
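
For illustration, the check order described above could be sketched like
this in userland C; max_possible_fd and highest_open_fd are hypothetical
stand-ins for the per-process bounds the kernel tracks, not actual kernel
identifiers:

	#include <sys/select.h>
	#include <errno.h>

	/*
	 * Sketch only: reject nd values that can never name an open fd,
	 * fail fast on set bits above the highest open fd, and leave
	 * bits in the plausible range to the pre-existing checks.
	 * Assumes nd does not exceed the storage actually passed in.
	 */
	static int
	check_select_fds(int nd, fd_set *set, int max_possible_fd,
	    int highest_open_fd)
	{
		if (nd > max_possible_fd + 1)
			return EINVAL;

		for (int fd = 0; fd < nd; fd++) {
			if (!FD_ISSET(fd, set))
				continue;
			if (fd > highest_open_fd)
				return EBADF;	/* quick(ish) exit */
			/* plausible fds are checked as they always were */
		}
		return 0;
	}
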
 1.66  15-Oct-2023  riastradh branches: 1.66.6;
sys_select.c: Sort includes. No functional change intended.
 1.65  15-Oct-2023  riastradh sys/lwp.h: Nix sys/syncobj.h dependency.

Remove it in ddb/db_syncobj.h too.

New sys/wchan.h defines wchan_t so that users need not pull in
sys/syncobj.h to get it.

Sprinkle #include <sys/syncobj.h> in .c files where it is now needed.
 1.64  08-Oct-2023  ad Ensure that an LWP that has taken a legitimate wakeup never produces an
error code from sleepq_block(). Then, it's possible to make cv_signal()
work as expected and only ever wake a singular LWP.
 1.63  04-Oct-2023  ad Eliminate l->l_biglocks. Originally I think it had a use but these days a
local variable will do.
 1.62  23-Sep-2023  ad - Simplify how priority boost for blocking in kernel is handled. Rather
than setting it up at each site where we block, make it a property of
syncobj_t. Then, do not hang onto the priority boost until userret(),
drop it as soon as the LWP is out of the run queue and onto a CPU.
Holding onto it longer is of questionable benefit.

- This allows two members of lwp_t to be deleted, and mi_userret() to be
simplified a lot (next step: trim it down to a single conditional).

- While here, constify syncobj_t and de-inline a bunch of small functions
like lwp_lock() which turn out not to be small after all (I don't know
why, but atomic_*_relaxed() seem to provoke a compiler shitfit above and
beyond what volatile does).
 1.61  17-Jul-2023  riastradh kern: New struct syncobj::sobj_name member for diagnostics.

XXX potential kernel ABI change -- not sure any modules actually use
struct syncobj but it's hard to rule that out because sys/syncobj.h
leaks into sys/lwp.h
 1.60  29-Jun-2022  riastradh branches: 1.60.4;
sleepq(9): Pass syncobj through to sleepq_block.

Previously the usage pattern was:

	sleepq_enter(sq, l, lock);            // locks l
	...
	sleepq_enqueue(sq, ..., sobj, ...);   // assumes l locked, sets l_syncobj
	... (*)
	sleepq_block(...);                    // unlocks l

As long as l remains locked from sleepq_enter to sleepq_block,
l_syncobj is stable, and sleepq_block uses it via ktrcsw to determine
whether the sleep is on a mutex in order to avoid creating ktrace
context-switch records (which involves allocation which is forbidden
in softint context, while taking and even sleeping for a mutex is
allowed).

However, in turnstile_block, the logic at (*) also involves
turnstile_lendpri, which sometimes unlocks and relocks l. At that
point, another thread can swoop in and sleepq_remove l, which sets
l_syncobj to sched_syncobj. If that happens, ktrcsw does what is
forbidden -- tries to allocate a ktrace record for the context
switch.

As an optimization, sleepq_block or turnstile_block could stop early
if it detects that l_syncobj doesn't match -- we've already been
requested to wake up at this point so there's no need to mi_switch.
(And then it would be unnecessary to pass the syncobj through
sleepq_block, because l_syncobj would remain stable.) But I'll leave
that to another change.

Reported-by: syzbot+8b9d7b066c32dbcdc63b@syzkaller.appspotmail.com
 1.59  09-Apr-2022  riastradh select(9): Use membar_acquire/release and atomic_store_release.

No store-before-load ordering here -- this was obviously always
intended to be load-before-load/store all along.
 1.58  12-Feb-2022  thorpej Add inline functions to manipulate the klists that link up knotes
via kn_selnext:

- klist_init()
- klist_fini()
- klist_insert()
- klist_remove()

These provide some API insulation from the implementation details of these
lists (but not completely; see vn_knote_attach() and vn_knote_detach()).
Currently just a wrapper around SLIST(9).

This will make it significantly easier to switch kn_selnext linkage
to a different kind of list.
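
A hedged sketch of what SLIST(9)-backed wrappers of this shape can look
like; the stub struct and function names below are illustrative, not the
kernel's klist/knote definitions:

	#include <sys/queue.h>

	struct knote_stub {
		SLIST_ENTRY(knote_stub) kn_selnext;	/* linkage field */
	};
	SLIST_HEAD(klist_stub, knote_stub);

	static inline void
	klist_stub_init(struct klist_stub *list)
	{
		SLIST_INIT(list);
	}

	static inline void
	klist_stub_insert(struct klist_stub *list, struct knote_stub *kn)
	{
		SLIST_INSERT_HEAD(list, kn, kn_selnext);
	}

	static inline void
	klist_stub_remove(struct klist_stub *list, struct knote_stub *kn)
	{
		SLIST_REMOVE(list, kn, knote_stub, kn_selnext);
	}

Because the wrappers hide the list head type and linkage macros, switching
kn_selnext to a different kind of list later only touches these helpers.
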
 1.57  10-Dec-2021  andvar s/occured/occurred/ in comments, log messages and man pages.
 1.56  29-Sep-2021  thorpej - Change selremove_knote() from returning void to bool, and return
true if the last knote was removed and there are no more knotes
on the selinfo.
- Use this new return value in filt_sordetach(), filt_sowdetach(),
filt_fifordetach(), and filt_fifowdetach() to know when to clear
SB_KNOTE without having to know select/kqueue implementation details.
 1.55  11-Dec-2020  thorpej Add sel{record,remove}_knote(), so hide some of the details surrounding
knote / kevent registration in the selinfo structure.
 1.54  19-Apr-2020  ad branches: 1.54.2;
Set LW_SINTR earlier so it doesn't pose a problem for doing interruptible
waits with turnstiles (not currently done).
 1.53  26-Mar-2020  ad branches: 1.53.2;
Change sleepq_t from a TAILQ to a LIST and remove SOBJ_SLEEPQ_FIFO. Only
select/poll used the FIFO method and that was for collisions, which rarely
occur. Shrinks sleepq_t and condvar_t.
 1.52  15-Feb-2020  ad - List all of the syncobjs in syncobj.h.
- Update a comment.
 1.51  01-Feb-2020  riastradh Load struct filedesc::fd_dt with atomic_load_consume.

Exceptions: when fd_refcnt <= 1, or when holding fd_lock.

While here:

- Restore KASSERT(mutex_owned(&fdp->fd_lock)) in fd_unused.
=> This is used only in fd_close and fd_abort, where it holds.
- Move bounds check assertion in fd_putfile to where it matters.
- Store fd_dt with atomic_store_release.
- Move load of fd_dt under lock in knote_fdclose.
- Omit membar_consumer in fdesc_readdir.
=> atomic_load_consume serves the same purpose now.
=> Was needed only on alpha anyway.
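
A small sketch of the publish/consume pairing described above, with stub
types standing in for filedesc/fdtab; only the atomic_load_consume()/
atomic_store_release() pairing follows the log, the rest is hypothetical:

	#include <sys/atomic.h>

	struct dtab_stub {
		int	dt_nfiles;
	};

	struct fdp_stub {
		struct dtab_stub *fd_dt;	/* published table pointer */
	};

	static struct dtab_stub *
	read_table(struct fdp_stub *fdp)
	{
		/* pairs with the release store in publish_table() */
		return atomic_load_consume(&fdp->fd_dt);
	}

	static void
	publish_table(struct fdp_stub *fdp, struct dtab_stub *newdt)
	{
		/* *newdt must be fully initialized before this store */
		atomic_store_release(&fdp->fd_dt, newdt);
	}
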
 1.50  22-Nov-2019  ad branches: 1.50.2;
Minor correction to previous.
 1.49  21-Nov-2019  ad Minor improvements to select/poll:

- Increase the maximum number of clusters from 32 to 64 for large systems.
kcpuset_t could potentially be used here but that's an excursion I don't
want to go on right now. uint32_t -> uint64_t is very simple.

- In the case of a non-blocking select/poll, or where we won't block
because there are events ready to report, stop registering interest in
the back-end objects early.

- Change the wmesg for poll back to "poll".
 1.48  20-Sep-2019  kamil Validate usec ranges in sys___select50()

Later in the code, selcommon() checks for a proper timespec; here, check only
that the usec of the timeval is in range before the type conversions.
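
Illustratively, the usec validation amounts to the range check below (the
function name is hypothetical); the full timespec checks still happen later
in selcommon():

	#include <sys/time.h>
	#include <errno.h>

	/* Reject a timeval whose usec field is out of range. */
	static int
	check_timeval_usec(const struct timeval *tv)
	{
		if (tv->tv_usec < 0 || tv->tv_usec >= 1000000)
			return EINVAL;
		return 0;
	}
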
 1.47  20-Aug-2019  msaitoh Use unsigned to avoid undefined behavior. Found by kUBSan.
 1.46  26-Jul-2019  msaitoh branches: 1.46.2;
Set sc_mask correctly in selsysinit() to avoid undefined behavior.
Found by KUBSan.
 1.45  08-May-2019  christos Add slop of 1000 and explain why.
 1.44  07-May-2019  christos Use the max limit (aka maxfiles or the moral equivalent of OPEN_MAX) which
makes poll(2) align with the Posix documentation (which allows EINVAL if
nfds > OPEN_MAX). From: Anthony Mallet
 1.43  05-May-2019  christos Remove the slop code. Suggested by mrg@
 1.42  04-May-2019  christos PR/54158: Anthony Mallet: poll(2) does not allow polling all possible fds
(hardcoded limit to 1000 + #<open-fds>). Changed to limit by the max of
the resource limit of open descriptors and the above.
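
Taken together with revs 1.43-1.45 above, the resulting bound can be
sketched roughly as below; maxfiles stands for the system-wide limit and
the function name is hypothetical:

	#include <errno.h>

	/*
	 * Sketch: the pollfd array need not be dense, so allow some slop
	 * beyond the open-file limit, but still reject absurd nfds values
	 * with EINVAL rather than silently truncating the list.
	 */
	static int
	check_poll_nfds(unsigned int nfds, unsigned int maxfiles)
	{
		if (nfds > maxfiles + 1000)
			return EINVAL;
		return 0;
	}
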
 1.41  30-Jan-2018  ozaki-r branches: 1.41.4;
Apply C99-style struct initialization to syncobj_t
 1.40  01-Jun-2017  chs branches: 1.40.2;
remove checks for failure after memory allocation calls that cannot fail:

kmem_alloc() with KM_SLEEP
kmem_zalloc() with KM_SLEEP
percpu_alloc()
pserialize_create()
psref_class_create()

all of these paths include an assertion that the allocation has not failed,
so callers should not assert that again.
 1.39  25-Apr-2014  pooka branches: 1.39.4;
Remove pollsock(). Since it took only a single socket, it was essentially
a complicated way to call soreceive() with a sb_timeo. The only user
(netsmb) already did that anyway, so all that was needed was to delete the
call to pollsock().
 1.38  25-Feb-2014  pooka branches: 1.38.2;
Ensure that the top level sysctl nodes (kern, vfs, net, ...) exist before
the sysctl link sets are processed, and remove redundancy.

Shaves >13kB off of an amd64 GENERIC, not to mention >1k duplicate
lines of code.
 1.37  26-Jan-2013  riastradh branches: 1.37.2;
Assert equality, not assignment, in selrecord.

Code inspection suggests that this fix is not likely to reveal any
latent problems.
 1.36  29-Aug-2011  rmind branches: 1.36.2; 1.36.12;
Add kern.direct_select sysctl. Default to 0 for now.
 1.35  09-Aug-2011  hannken No need to lock the selcluster in selscan() if either
NO_DIRECT_SELECT is defined or all polls return an event.
 1.34  06-Aug-2011  hannken Fix the races of direct select()/poll():

- When sel_do_scan() restarts do a full initialization with selclear() so
we start from an empty set without registered events. Defer the
evaluation of l_selret after selclear() and add the count of direct events
to the count of events.

- For selscan()/pollscan() zero the output descriptors before we poll and
for selscan() take the sc_lock before we change them.

- Change sel_setevents() to not count events already set.

Reviewed by: YAMAMOTO Takashi <yamt@netbsd.org>

Should fix PR #44763 (select/poll direct-set optimization seems racy)
and PR #45187 (select(2) sometimes doesn't wakeup)
 1.33  28-May-2011  christos If a signal did not fire, restore the original signal mask for pselect/pollts
using a signal mask. Tested by tron.
 1.32  18-May-2011  christos No need to mask twice. The setup function does it.
 1.31  18-May-2011  christos PR/43625: Mark Davies: Fix pselect(2) to honor the temporary mask. pselect(2)
(and pollts(2)) are similar to sigsuspend(2) in that they temporarily change
the process signal mask and wait for signal delivery. Factor out and share the
code that does this.
 1.30  06-Mar-2011  rmind In a case of direct select, set only masked events, do not wakeup LWP
if no polled/selected events were set; also, count the correct return
value for the select.
 1.29  18-Dec-2010  rmind branches: 1.29.2;
- Fix a few possible locking issues in execve1() and exit1(). Add a note
that scheduler locks are special in this regard - adaptive locks cannot
be in the path due to turnstiles. Randomly spotted/reported by uebayasi@.
- Remove unused lwp_relock() and replace lwp_lock_retry() by simplifying
lwp_lock() and sleepq_enter() a little.
- Give alllwp its own cache-line and mark lwp_cache pointer as read-mostly.

OK ad@
 1.28  15-Oct-2010  rmind Re-enable direct select.
 1.27  12-Jul-2010  rmind sel_setevents: fix error - match event-set, as intended.
Spotted by Enami Tsugutomo.
 1.26  11-Jul-2010  rmind Disable direct select for now, since it still brings problems.
 1.25  10-Jul-2010  rmind sel_setevents: fix direct injecting of fd bit for select() case.
 1.24  08-Jul-2010  rmind sel_do_scan: do not bother to assert for SEL_SCANNING state before blocking,
as it might also be SEL_BLOCKING due to spurious wake-ups. That has no harm.
 1.23  08-Jul-2010  rmind Implement direct select/poll support, currently effective for socket and
pipe subsystems. Avoids overhead of second selscan() on wake-up, and thus
improves performance on certain workloads (especially when polling on many
file-descriptors). Also, clean-up sys/fd_set.h header and improve macros.

Welcome to 5.99.36!
 1.22  25-Apr-2010  ad Make select/poll work with more than 32 CPUs.
No ABI change.
 1.21  20-Dec-2009  rmind branches: 1.21.2; 1.21.4;
Add comment about locking.
 1.20  12-Dec-2009  dsl Bounding the 'nfds' arg to poll() at the current process limit for actual
open files is rather gross - the poll map isn't required to be dense.
Instead limit to a much larger value (1000 + dt_nfiles) so that user
programs cannot allocate arbitrarily large blocks of kvm.
If the limit is exceeded, then return EINVAL instead of silently truncating
the list.
(The silent truncation in select isn't quite as bad - although even there
any high bits that are set ought to generate an EBADF response.)
Move the code that converts ERESTART and EWOULDBLOCK into common code.
Effectively fixes PR/17507 since the new limit is unlikely to be detected.
 1.19  11-Nov-2009  rmind - selcommon/pollcommon: drop redundant l argument.
- Use cached curlwp->l_fd, instead of p->p_fd.
- Inline selscan/pollscan.
 1.18  01-Nov-2009  rmind - Move inittimeleft() and gettimeleft() to subr_time.c, where they belong.
- Move abstimeout2timo() there too and export. Use it in lwp_park().
 1.17  01-Nov-2009  rmind Move common logic in selcommon() and pollcommon() into sel_do_scan().
Avoids code duplication. XXX: pollsock() should be converted too, except
it's a bit ugly.
 1.16  21-Oct-2009  rmind Remove uarea swap-out functionality:

- Addresses the issue described in PR/38828.
- Some simplification in threading and sleepq subsystems.
- Eliminates pmap_collect() and, as a side note, allows pmap optimisations.
- Eliminates XS_CTL_DATA_ONSTACK in scsipi code.
- Avoids a few scans of the LWP list and thus potentially long holds of proc_lock.
- Cuts ~1.5k lines of code. Reduces amd64 kernel size by ~4k.
- Removes __SWAP_BROKEN cases.

Tested on x86, mips, acorn32 (thanks <mpumford>) and partly tested on
acorn26 (thanks to <bjh21>).

Discussed on <tech-kern>, reviewed by <ad>.
 1.15  24-May-2009  ad More changes to improve kern_descrip.c.

- Avoid atomics in more places.
- Remove the per-descriptor mutex, and just use filedesc_t::fd_lock.
It was only being used to synchronize close, and in any case we needed
to take fd_lock to free the descriptor slot.
- Optimize certain paths for the <NDFDFILE case.
- Sprinkle more comments and assertions.
- Cache more stuff in filedesc_t.
- Fix numerous minor bugs spotted along the way.
- Restructure how the open files array is maintained, for clarity and so
that we can eliminate the membar_consumer() call in fd_getfile(). This is
mostly syntactic sugar; the main functional change is that fd_nfiles now
lives alongside the open file array.

Some measurements with libmicro:

- simple file syscalls like close() are between 1 and 10% faster.
- some nice improvements, e.g. poll(1000), which is ~50% faster.
 1.14  29-Mar-2009  christos Move the internal poll/select related API's to use timespec instead
of timeval (rides the uvm bump).
 1.13  21-Mar-2009  ad Allocate sleep queue locks with mutex_obj_alloc. Reduces memory usage
on !MP kernels, and reduces false sharing on MP ones.
 1.12  11-Jan-2009  christos branches: 1.12.2;
merge christos-time_t
 1.11  20-Nov-2008  yamt pollcommon: use a more appropriate type than char[].
 1.10  15-Oct-2008  ad branches: 1.10.2; 1.10.4; 1.10.10; 1.10.14;
- Rename cpu_lookup_byindex() to cpu_lookup(). The hardware ID isn't of
interest to MI code. No functional change.
- Change /dev/cpu to operate on cpu index, not hardware ID. Now cpuctl
shouldn't print confused output.
 1.9  04-Jun-2008  rmind branches: 1.9.4;
Check the result of allocation in the cases where size is passed by user.
 1.8  26-May-2008  ad Take the mutex pointer and waiters count out of sleepq_t: the values can
be or are maintained elsewhere. Now a sleepq_t is just a TAILQ_HEAD.
 1.7  30-Apr-2008  ad branches: 1.7.2;
PR kern/38547 select/poll do not set l_kpriority

Among other things this could have made X11 seem sluggish.
 1.6  28-Apr-2008  martin Remove clause 3 and 4 from TNF licenses
 1.5  24-Apr-2008  ad branches: 1.5.2;
Merge proc::p_mutex and proc::p_smutex into a single adaptive mutex, since
we no longer need to guard against access from hardware interrupt handlers.

Additionally, if cloning a process with CLONE_SIGHAND, arrange to have the
child process share the parent's lock so that signal state may be kept in
sync. Partially addresses PR kern/37437.
 1.4  17-Apr-2008  yamt branches: 1.4.2;
s/selwakeup/selnotify/ in a comment.
 1.3  29-Mar-2008  ad branches: 1.3.2; 1.3.4;
selwakeup: convert a while() loop into a do/while() since the first test
isn't needed.
 1.2  27-Mar-2008  ad Replace use of CACHE_LINE_SIZE in some obvious places.
 1.1  23-Mar-2008  ad branches: 1.1.2;
Split select/poll into their own file.
 1.1.2.2  24-Mar-2008  yamt sync with head.
 1.1.2.1  23-Mar-2008  yamt file sys_select.c was added on branch yamt-lazymbuf on 2008-03-24 09:39:02 +0000
 1.3.4.5  17-Jan-2009  mjf Sync with HEAD.
 1.3.4.4  05-Jun-2008  mjf Sync with HEAD.

Also fix build.
 1.3.4.3  02-Jun-2008  mjf Sync with HEAD.
 1.3.4.2  03-Apr-2008  mjf Sync with HEAD.
 1.3.4.1  29-Mar-2008  mjf file sys_select.c was added on branch mjf-devfs2 on 2008-04-03 12:43:04 +0000
 1.3.2.4  27-Dec-2008  christos merge with head.
 1.3.2.3  01-Nov-2008  christos Sync with head.
 1.3.2.2  29-Mar-2008  christos Welcome to the time_t=long long dev_t=uint64_t branch.
 1.3.2.1  29-Mar-2008  christos file sys_select.c was added on branch christos-time_t on 2008-03-29 20:47:01 +0000
 1.4.2.3  17-Jun-2008  yamt sync with head.
 1.4.2.2  04-Jun-2008  yamt sync with head
 1.4.2.1  18-May-2008  yamt sync with head.
 1.5.2.5  11-Aug-2010  yamt sync with head.
 1.5.2.4  11-Mar-2010  yamt sync with head
 1.5.2.3  20-Jun-2009  yamt sync with head
 1.5.2.2  04-May-2009  yamt sync with head.
 1.5.2.1  16-May-2008  yamt sync with head.
 1.7.2.3  23-Jun-2008  wrstuden Sync w/ -current. 34 merge conflicts to follow.
 1.7.2.2  14-May-2008  wrstuden Per discussion with ad at n dot o, revert signal mask handling
changes.

The l_sigstk changes are most likely totally unneeded as SA will
never use a signal stack - we send an upcall (or will as other
diffs are brought in).

The l_sigmask changes were too controversial. In all honesty, I
think it's probably best to revert them. The main reason they were
there is the fact that in an SA process, we don't mask signals per
kernel thread, we mask them per user thread. In the kernel, we want
them all to get turned into upcalls. Thus the normal state of
l_sigmask in an SA process is for it to always be empty.

While we are in the process of delivering a signal, we want to
temporarily mask a signal (so we don't recursively exhaust our
upcall stacks). However signal delivery is rare (important, but
rare), and delivering back-to-back signals is even rarer. So rather
than cause every user of a signal mask to be prepared for this very
rare case, we will just add a second check later in the signal
delivery code. Said change is not in this diff.

This also un-compensates all of our compatibility code for dealing
with SA. SA is a NetBSD-specific thing, so there's no need for
Irix, Linux, Solaris, SVR4 and so on to cope with it.

As previously, everything other than kern_sa.c compiles in i386
GENERIC as of this checkin. I will switch to ALL soon for compile
testing.
 1.7.2.1  10-May-2008  wrstuden Initial checkin of re-adding SA. Everything except kern_sa.c
compiles in GENERIC for i386. This is still a work-in-progress, but
this checkin covers most of the mechanical work (changing signalling
to be able to accommodate SA's process-wide signalling and re-adding
includes of sys/sa.h and savar.h). Subsequent changes will be much
more interesting.

Also, kern_sa.c has received partial cleanup. There's still more
to do, though.
 1.9.4.2  13-Dec-2008  haad Update haad-dm branch to haad-dm-base2.
 1.9.4.1  19-Oct-2008  haad Sync with HEAD.
 1.10.14.1  24-Apr-2015  msaitoh Pull up following revision(s) (requested by prlw1 in ticket #1957):

sys/kern/sys_select.c patch

Limit nfds arg to poll() to a large enough value that user programs
cannot allocate arbitrarily large blocks of kvm. If the limit is
exceeded, then return EINVAL instead of silently truncating the list.
Addresses PR/17507.
[prlw1, ticket #1957]
 1.10.10.1  24-Apr-2015  msaitoh Pull up following revision(s) (requested by prlw1 in ticket #1957):

sys/kern/sys_select.c patch

Limit nfds arg to poll() to a large enough value that user programs
cannot allocate indefinite sized blocks of kvm. If the limit is
exceeded, then return EINVAL instead of silently truncating the list.
Addresses PR/17507.
[prlw1, ticket #1957]
 1.10.4.1  24-Apr-2015  msaitoh Pull up following revision(s) (requested by prlw1 in ticket #1957):

sys/kern/sys_select.c patch

Limit nfds arg to poll() to a large enough value that user programs
cannot allocate indefinite sized blocks of kvm. If the limit is
exceeded, then return EINVAL instead of silently truncating the list.
Addresses PR/17507.
 1.10.2.2  28-Apr-2009  skrll Sync with HEAD.
 1.10.2.1  19-Jan-2009  skrll Sync with HEAD.
 1.12.2.2  23-Jul-2009  jym Sync with HEAD.
 1.12.2.1  13-May-2009  jym Sync with HEAD.

Commit is split, to avoid a "too many arguments" protocol error.
 1.21.4.4  31-May-2011  rmind sync with head
 1.21.4.3  21-Apr-2011  rmind sync with head
 1.21.4.2  05-Mar-2011  rmind sync with head
 1.21.4.1  30-May-2010  rmind sync with head
 1.21.2.3  22-Oct-2010  uebayasi Sync with HEAD (-D20101022).
 1.21.2.2  17-Aug-2010  uebayasi Sync with HEAD.
 1.21.2.1  30-Apr-2010  uebayasi Sync with HEAD.
 1.29.2.1  06-Jun-2011  jruoho Sync with HEAD.
 1.36.12.3  03-Dec-2017  jdolecek update from HEAD
 1.36.12.2  20-Aug-2014  tls Rebase to HEAD as of a few days ago.
 1.36.12.1  25-Feb-2013  tls resync with head
 1.36.2.1  22-May-2014  yamt sync with head.

for a reference, the tree before this commit was tagged
as yamt-pagecache-tag8.

this commit was split into small chunks to avoid
a limitation of cvs. ("Protocol error: too many arguments")
 1.37.2.1  18-May-2014  rmind sync with head
 1.38.2.1  10-Aug-2014  tls Rebase.
 1.39.4.1  28-Aug-2017  skrll Sync with HEAD
 1.40.2.1  08-Mar-2020  martin Pull up following revision(s) (requested by mlelstv in ticket #1515):

sys/kern/sys_select.c: revision 1.42-1.45

PR/54158: Anthony Mallet: poll(2) does not allow polling all possible fds
(hardcoded limit to 1000 + #<open-fds>). Changed to limit by the max of
the resource limit of open descriptors and the above.

Remove the slop code. Suggested by mrg@

Use the max limit (aka maxfiles or the moral equivalent of OPEN_MAX) which
makes poll(2) align with the Posix documentation (which allows EINVAL if
nfds > OPEN_MAX). From: Anthony Mallet

Add slop of 1000 and explain why.
 1.41.4.3  21-Apr-2020  martin Sync with HEAD
 1.41.4.2  13-Apr-2020  martin Mostly merge changes from HEAD upto 20200411
 1.41.4.1  10-Jun-2019  christos Sync with HEAD
 1.46.2.2  20-Nov-2024  martin Pull up following revision(s) (requested by riastradh in ticket #1926):

sys/kern/sys_select.c: revision 1.67 (patch)
tests/lib/libc/sys/t_select.c: revision 1.5 (patch)

PR kern/57504 : Check all fds passed in to select

If an application passes in a huge fd_set (select(BIG, ...))
then check every bit in the fd_sets provided, to make sure
they are valid.

If BIG is too big (cannot possibly represent an open fd for
this process, under any circumstances: ie: not just because
that many are not currently open) return EINVAL.

Otherwise, check every set bit to make sure it is valid. Any
fd bits set above the application's current highest open fd
automatically generate EBADF and a quick(ish) exit.

fd's that are within the plausible range are then checked as
they always were (it is possible for there to be a few there
above the max open fd, as everything in select is done in
multiples of __FDBITS (fd_mask) but the max open fd is not so
constrained). Those were always checked; continue using the
same mechanism.

This should have zero impact on any sane application which
uses the highest fd for which it set a bit, +1, as the first
arg to select. However, if there are any broken applications
that were relying upon the previous behaviour of simply ignoring
any fd_masks that started beyond the max number of open files,
then they might (if they happen to have any bits set) now fail.


tests/lib/libc/sys/t_select: Test select on bad file descriptors.

This should immediately fail, not hang, even if the bad fd is
high-numbered.

PR kern/57504: select with large enough bogus fd number set hangs
instead of failing with EBADF
 1.46.2.1  20-Nov-2024  martin Pull up following revision(s) (requested by riastradh in ticket #1921):

sys/kern/kern_event.c: revision 1.106
sys/kern/sys_select.c: revision 1.51
sys/kern/subr_exec_fd.c: revision 1.10
sys/kern/sys_aio.c: revision 1.46
sys/kern/kern_descrip.c: revision 1.244
sys/kern/kern_descrip.c: revision 1.245
sys/ddb/db_xxx.c: revision 1.72
sys/ddb/db_xxx.c: revision 1.73
sys/miscfs/fdesc/fdesc_vnops.c: revision 1.132
sys/kern/uipc_usrreq.c: revision 1.195
sys/kern/sys_descrip.c: revision 1.36
sys/kern/uipc_usrreq.c: revision 1.196
sys/kern/uipc_socket2.c: revision 1.135
sys/kern/uipc_socket2.c: revision 1.136
sys/kern/kern_sig.c: revision 1.383
sys/kern/kern_sig.c: revision 1.384
sys/compat/netbsd32/netbsd32_ioctl.c: revision 1.107
sys/miscfs/procfs/procfs_vnops.c: revision 1.208
sys/kern/subr_exec_fd.c: revision 1.9
sys/kern/kern_descrip.c: revision 1.252
(all via patch)

Load struct filedesc::fd_dt with atomic_load_consume.

Exceptions: when fd_refcnt <= 1, or when holding fd_lock.

While here:
- Restore KASSERT(mutex_owned(&fdp->fd_lock)) in fd_unused.
=> This is used only in fd_close and fd_abort, where it holds.
- Move bounds check assertion in fd_putfile to where it matters.
- Store fd_dt with atomic_store_release.
- Move load of fd_dt under lock in knote_fdclose.
- Omit membar_consumer in fdesc_readdir.
=> atomic_load_consume serves the same purpose now.
=> Was needed only on alpha anyway.

Load struct fdfile::ff_file with atomic_load_consume.
Exceptions: when we're only testing whether it's there, not about to
dereference it.

Note: We do not use atomic_store_release to set it because the
preceding mutex_exit should be enough.

(That said, it's not clear the mutex_enter/exit is needed unless
refcnt > 0 already, in which case maybe it would be a win to switch
from the membar implied by mutex_enter to the membar implied by
atomic_store_release -- which I would generally expect to be much
cheaper. And a little clearer without a long comment.)

kern_descrip.c: Fix membars around reference count decrement.

In general, the `last one out hit the lights' style of reference
counting (as opposed to the `whoever's destroying must wait for
pending users to finish' style) requires memory barriers like so:

	... usage of resources associated with object ...
	membar_release();
	if (atomic_dec_uint_nv(&obj->refcnt) != 0)
		return;
	membar_acquire();
	... freeing of resources associated with object ...

This way, all usage happens-before all freeing. This fixes several
errors:
- fd_close failed to ensure whatever its caller did would
happen-before the freeing, in the case where another thread is
concurrently trying to close the fd (ff->ff_file == NULL).
Fix: Add membar_release before atomic_dec_uint(&ff->ff_refcnt) in
that branch.
- fd_close failed to ensure all loads its caller had issued will have
happened-before the freeing, in the case where the fd is still in
use by another thread (fdp->fd_refcnt > 1 and ff->ff_refcnt-- > 0).
Fix: Change membar_producer to membar_release before
atomic_dec_uint(&ff->ff_refcnt).
- fd_close failed to ensure that any usage of fp by other callers
would happen-before any freeing it does.
Fix: Add membar_acquire after atomic_dec_uint_nv(&ff->ff_refcnt).
- fd_free failed to ensure that any usage of fdp by other callers
would happen-before any freeing it does.
Fix: Add membar_acquire after atomic_dec_uint_nv(&fdp->fd_refcnt).

While here, change membar_exit -> membar_release. No semantic
change, just updating away from the legacy API.
 1.50.2.1  29-Feb-2020  ad Sync with head.
 1.53.2.1  20-Apr-2020  bouyer Sync with HEAD
 1.54.2.1  14-Dec-2020  thorpej Sync w/ HEAD.
 1.60.4.1  18-Nov-2024  martin Pull up following revision(s) (requested by riastradh in ticket #1011):

sys/kern/sys_select.c: revision 1.67
tests/lib/libc/sys/t_select.c: revision 1.5

PR kern/57504 : Check all fds passed in to select

If an application passes in a huge fd_set (select(BIG, ...))
then check every bit in the fd_sets provided, to make sure
they are valid.

If BIG is too big (cannot possibly represent an open fd for
this process, under any circumstances: ie: not just because
that many are not currently open) return EINVAL.
Otherwise, check every set bit to make sure it is valid. Any
fd bits set above the application's current highest open fd
automatically generate EBADF and a quick(ish) exit.

fd's that are within the plausible range are then checked as
they always were (it is possible for there to be a few there
above the max open fd, as everything in select is done in
multiples of __FDBITS (fd_mask) but the max open fd is not so
constrained). Those were always checked; continue using the
same mechanism.

This should have zero impact on any sane application which
uses the highest fd for which it set a bit, +1, as the first
arg to select. However, if there are any broken applications
that were relying upon the previous behaviour of simply ignoring
any fd_masks that started beyond the max number of open files,
then they might (if they happen to have any bits set) now fail.

tests/lib/libc/sys/t_select: Test select on bad file descriptors.
This should immediately fail, not hang, even if the bad fd is
high-numbered.

PR kern/57504: select with large enough bogus fd number set hangs
instead of failing with EBADF
 1.66.6.1  02-Aug-2025  perseant Sync with HEAD

RSS XML Feed