History log of /src/sys/sys/sleepq.h
Revision  Date  Author  Comments
 1.42  15-Oct-2023  riastradh sys/sleepq.h: Sort includes. No functional change intended.
 1.41  15-Oct-2023  riastradh sys/lwp.h: Nix sys/syncobj.h dependency.

Remove it in ddb/db_syncobj.h too.

New sys/wchan.h defines wchan_t so that users need not pull in
sys/syncobj.h to get it.

Sprinkle #include <sys/syncobj.h> in .c files where it is now needed.
 1.40  08-Oct-2023  ad Ensure that an LWP that has taken a legitimate wakeup never produces an
error code from sleepq_block(). Then, it's possible to make cv_signal()
work as expected and only ever wake a singular LWP.
 1.39  04-Oct-2023  ad Eliminate l->l_biglocks. Originally I think it had a use but these days a
local variable will do.
 1.38  25-Sep-2023  riastradh sys/sleepq.h: Fix more syncobj_t creep.
 1.37  23-Sep-2023  ad - Simplify how priority boost for blocking in kernel is handled. Rather
than setting it up at each site where we block, make it a property of
syncobj_t. Then, do not hang onto the priority boost until userret(),
drop it as soon as the LWP is out of the run queue and onto a CPU.
Holding onto it longer is of questionable benefit.

- This allows two members of lwp_t to be deleted, and mi_userret() to be
simplified a lot (next step: trim it down to a single conditional).

- While here, constify syncobj_t and de-inline a bunch of small functions
like lwp_lock() which turn out not to be small after all (I don't know
why, but atomic_*_relaxed() seem to provoke a compiler shitfit above and
beyond what volatile does).
 1.36  26-Oct-2022  riastradh sys/sleepq.h: Get cold from sys/kernel.h.
 1.35  29-Jun-2022  riastradh sleepq(9): Pass syncobj through to sleepq_block.

Previously the usage pattern was:

sleepq_enter(sq, l, lock); // locks l
...
sleepq_enqueue(sq, ..., sobj, ...); // assumes l locked, sets l_syncobj
... (*)
sleepq_block(...); // unlocks l

As long as l remains locked from sleepq_enter to sleepq_block,
l_syncobj is stable, and sleepq_block uses it via ktrcsw to determine
whether the sleep is on a mutex in order to avoid creating ktrace
context-switch records (which involves allocation which is forbidden
in softint context, while taking and even sleeping for a mutex is
allowed).

However, in turnstile_block, the logic at (*) also involves
turnstile_lendpri, which sometimes unlocks and relocks l. At that
point, another thread can swoop in and sleepq_remove l, which sets
l_syncobj to sched_syncobj. If that happens, ktrcsw does what is
forbidden -- tries to allocate a ktrace record for the context
switch.

As an optimization, sleepq_block or turnstile_block could stop early
if it detects that l_syncobj doesn't match -- we've already been
requested to wake up at this point so there's no need to mi_switch.
(And then it would be unnecessary to pass the syncobj through
sleepq_block, because l_syncobj would remain stable.) But I'll leave
that to another change.

Reported-by: syzbot+8b9d7b066c32dbcdc63b@syzkaller.appspotmail.com
 1.34  01-Nov-2020  christos turned sleepq_destroy into a macro
 1.33  01-Nov-2020  christos PR/55664: Ruslan Nikolaev: Split out sleepq guts and turnstiles not used
in rump into a separate header file. Add a sleepq_destroy() empty hook.
 1.32  23-Oct-2020  thorpej branches: 1.32.2;
- sleepq_block(): Add a new LWP flag, LW_CATCHINTR, that is used to track
the intent to catch signals while sleeping. Initialize this flag based
on the catch_p argument to sleepq_block(), and rather than test catch_p
when awakened, test LW_CATCHINTR. This allows the intent to change
(based on whatever criteria the owner of the sleepq wishes) while the
LWP is asleep. This is separate from LW_SINTR in order to leave all
other logic around LW_SINTR unaffected.
- In sleepq_transfer(), adjust also LW_CATCHINTR based on the catch_p
argument. Also allow the new LWP lock argument to be NULL, which
will cause the lwp_setlock() call to be skipped; this allows transfer
to another sleepq that is known to be protected by the same lock.
- Add a new function, sleepq_uncatch(), that will transition an LWP
from "interruptible sleep" to "uninterruptible sleep" on its current
sleepq.
 1.31  23-May-2020  ad - Replace pid_table_lock with a lockless lookup covered by pserialize, with
the "writer" side being pid_table expansion. The basic idea is that when
doing an LWP lookup there is usually already a lock held (p->p_lock), or a
spin mutex that needs to be taken (l->l_mutex), and either can be used to
get the found LWP stable and confidently determine that all is correct.

- For user processes LSLARVAL implies the same thing as LSIDL ("not visible
by ID"), and lookup by ID in proc0 doesn't really happen. In-tree the new
state should be understood by top(1), the tty subsystem and so on, and
would attract the attention of 3rd party kernel grovellers in time, so
remove it and just rely on LSIDL.
 1.30  08-May-2020  thorpej Add a new function, sleepq_transfer(), that moves an lwp from one
sleepq to another.
 1.29  19-Apr-2020  ad Set LW_SINTR earlier so it doesn't pose a problem for doing interruptible
waits with turnstiles (not currently done).
 1.28  26-Mar-2020  ad branches: 1.28.2;
Change sleepq_t from a TAILQ to a LIST and remove SOBJ_SLEEPQ_FIFO. Only
select/poll used the FIFO method and that was for collisions which rarely
occur. Shrinks sleep_t and condvar_t.
 1.27  16-Dec-2019  ad As with turnstiles, don't bother allocating sleepq locks with mutex_obj_alloc(),
and avoid the indirect reference.
 1.26  21-Nov-2019  ad Sleep queues & turnstiles:

- Avoid false sharing.
- Make the turnstile hash function more suitable.
- Increase turnstile hash table size.
- Make amends by having only one set of system wide sleep queue hash locks.
 1.25  19-Apr-2018  christos branches: 1.25.2;
s/static inline/static __inline/g for consistency with other include
headers.
 1.24  08-Feb-2015  christos branches: 1.24.16;
make this kmemuser friendly.
 1.23  24-Apr-2014  pooka branches: 1.23.4;
Make sleepq_wake() type void. The return value hasn't been used in
almost 6 years. Even if it were, returning an arbitrary lwp is a bit
of a wonky interface and can really work only when expected == 1.
 1.22  19-Feb-2012  rmind branches: 1.22.2; 1.22.4; 1.22.12;
Remove COMPAT_SA / KERN_SA. Welcome to 6.99.3!
Approved by core@.
 1.21  21-Nov-2011  christos branches: 1.21.2;
change printf gcc attribute to __printflike(), requested by joerg.
 1.20  20-Nov-2011  christos add more missing printf attributes.
 1.19  18-Dec-2010  rmind branches: 1.19.8;
- Fix a few possible locking issues in execve1() and exit1(). Add a note
that scheduler locks are special in this regard - adaptive locks cannot
be in the path due to turnstiles. Randomly spotted/reported by uebayasi@.
- Remove unused lwp_relock() and replace lwp_lock_retry() by simplifying
lwp_lock() and sleepq_enter() a little.
- Give alllwp its own cache-line and mark lwp_cache pointer as read-mostly.

OK ad@
 1.18  22-Nov-2009  mbalmer branches: 1.18.4;
s/the the/the/
 1.17  21-Oct-2009  rmind Remove uarea swap-out functionality:

- Addresses the issue described in PR/38828.
- Some simplification in threading and sleepq subsystems.
- Eliminates pmap_collect() and, as a side note, allows pmap optimisations.
- Eliminates XS_CTL_DATA_ONSTACK in scsipi code.
- Avoids few scans on LWP list and thus potentially long holds of proc_lock.
- Cuts ~1.5k lines of code. Reduces amd64 kernel size by ~4k.
- Removes __SWAP_BROKEN cases.

Tested on x86, mips, acorn32 (thanks <mpumford>) and partly tested on
acorn26 (thanks to <bjh21>).

Discussed on <tech-kern>, reviewed by <ad>.
 1.16  21-Mar-2009  ad Allocate sleep queue locks with mutex_obj_alloc. Reduces memory usage
on !MP kernels, and reduces false sharing on MP ones.
 1.15  10-Oct-2008  pooka branches: 1.15.2; 1.15.8;
include prerequisite headers
 1.14  16-Jun-2008  ad branches: 1.14.2;
- Add sleepq_hashlock(). Like sleeptab_lookup() but only returns the lock
corresponding to a given wait channel.
- sleepq_enter: reduce number of function calls made.

No functional change.
 1.13  26-May-2008  ad branches: 1.13.2;
Take the mutex pointer and waiters count out of sleepq_t: the values can
be or are maintained elsewhere. Now a sleepq_t is just a TAILQ_HEAD.
 1.12  19-May-2008  ad Reduce ifdefs due to MULTIPROCESSOR slightly.
 1.11  28-Apr-2008  martin branches: 1.11.2;
Remove clause 3 and 4 from TNF licenses
 1.10  17-Mar-2008  ad branches: 1.10.2; 1.10.4;
Add a boolean parameter to syncobj_t::sobj_unsleep. If true we want the
existing behaviour: the unsleep method unlocks and wakes the swapper if
needs be. If false, the caller is doing a batch operation and will take
care of that later. This is kind of ugly, but it's difficult for the caller
to know which lock to release in some situations.
 1.9  07-Nov-2007  ad branches: 1.9.10; 1.9.14;
Merge from vmlocking:

- pool_cache changes.
- Debugger/procfs locking fixes.
- Other minor changes.
 1.8  06-Nov-2007  ad Merge scheduler changes from the vmlocking branch. All discussed on
tech-kern:

- Invert priority space so that zero is the lowest priority. Rearrange
number and type of priority levels into bands. Add new bands like
'kernel real time'.
- Ignore the priority level passed to tsleep. Compute priority for
sleep dynamically.
- For SCHED_4BSD, make priority adjustment per-LWP, not per-process.
 1.7  17-May-2007  yamt branches: 1.7.6; 1.7.8; 1.7.12; 1.7.14;
merge yamt-idlelwp branch. asked by core@. some ports still need work.

from doc/BRANCHES:

idle lwp, and some changes depending on it.

1. separate context switching and thread scheduling.
(cf. gmcgarry_ctxsw)
2. implement idle lwp.
3. clean up related MD/MI interfaces.
4. make scheduler(s) modular.
 1.6  29-Mar-2007  ad - cv_wakeup: remove this. There are ~zero situations where it's useful.
- cv_wait and friends: after resuming execution, check to see if we have
been restarted as a result of cv_signal. If we have, but cannot take
the wakeup (because of eg a pending Unix signal or timeout) then try to
ensure that another LWP sees it. This is necessary because there may
be multiple waiters, and at least one should take the wakeup if possible.
Prompted by a discussion with pooka@.
- typedef struct lwp lwp_t;
- int -> bool, struct lwp -> lwp_t in a few places.
 1.5  27-Feb-2007  yamt branches: 1.5.2; 1.5.4; 1.5.6;
typedef pri_t and use it instead of int and u_char.
 1.4  26-Feb-2007  yamt move wchan_t and syncobj_t to a dedicated header to simplify dependencies.
fix "XXX wchan_t".
 1.3  26-Feb-2007  yamt implement priority inheritance.
 1.2  09-Feb-2007  ad branches: 1.2.2; 1.2.4;
Merge newlock2 to head.
 1.1  20-Oct-2006  ad branches: 1.1.2;
file sleepq.h was initially added on branch newlock2.
 1.1.2.10  09-Feb-2007  ad - Change syncobj_t::sobj_changepri() to alter both the user priority and
the effective priority of LWPs. How the effective priority is adjusted
depends on the type of object.
- Add a couple of comments to sched_kpri() and remrunqueue().
 1.1.2.9  05-Feb-2007  ad Try to reduce cache line ping-ponging.
 1.1.2.8  05-Feb-2007  ad Declare turnstile_print().
 1.1.2.7  27-Jan-2007  ad Rename some functions to better describe what they do.
 1.1.2.6  16-Jan-2007  ad Adjust arguments to _lwp_park() and friends so that in the best case
_lwp_unpark_all() only has to traverse one sleep queue.
 1.1.2.5  11-Jan-2007  ad Checkpoint work in progress.
 1.1.2.4  29-Dec-2006  ad Checkpoint work in progress.
 1.1.2.3  17-Nov-2006  ad Fix an obvious sleep/wakeup bug introduced in previous.
 1.1.2.2  17-Nov-2006  ad Checkpoint work in progress.
 1.1.2.1  20-Oct-2006  ad Add a sleep queue implementation.
 1.2.4.5  24-Mar-2008  yamt sync with head.
 1.2.4.4  15-Nov-2007  yamt sync with head.
 1.2.4.3  03-Sep-2007  yamt sync with head.
 1.2.4.2  26-Feb-2007  yamt sync with head.
 1.2.4.1  09-Feb-2007  yamt file sleepq.h was added on branch yamt-lazymbuf on 2007-02-26 09:12:15 +0000
 1.2.2.7  30-Apr-2007  rmind - Remove KERN_SCHED, do not break KERN_MAXID and use dynamic node creation,
since we are moving to dynamic sysctl anyway - note by <mrg>.
- Remove sched_slept() hook - we are not going to use it.
 1.2.2.6  19-Apr-2007  ad Pull up a change from the vmlocking branch:

- Ensure that LWPs going to sleep are on the sleep queue before releasing
any interlocks. This is so that calls to turnstile_wakeup will have the
correct locks held when adjusting priority. Avoids another deadlock.
- Assume that LWPs blocked on a turnstile will never be swapped out.
- LWPs blocking on a turnstile must have kernel priority, as they
are consuming kernel resources.
 1.2.2.5  16-Apr-2007  ad - Nuke the separate scheduler locking scheme for UP kernels - it has been
at the root of too many bugs.
- Add a LW_BOUND flag that indicates an LWP is bound to a specific CPU.
 1.2.2.4  15-Apr-2007  yamt sync with head.
 1.2.2.3  09-Mar-2007  rmind Checkpoint:

- Addition of scheduler-specific pointers in the struct proc, lwp and
schedstate_percpu.
- Addition of sched_lwp_fork(), sched_lwp_exit() and sched_slept() hooks.
- mi_switch() now has only one argument.
- sched_nextlwp(void) becomes sched_switch(struct lwp *) and does an
enqueueing of LWP.
- Addition of general kern.sched sysctl node.
- Remove twice called uvmexp.swtch++, other cleanups.

Discussed on tech-kern@
 1.2.2.2  27-Feb-2007  yamt - sync with head.
- move sched_changepri back to kern_synch.c as it doesn't know PPQ anymore.
 1.2.2.1  17-Feb-2007  yamt - separate context switching and thread scheduling.
- introduce idle lwp.
- change some related MD/MI interfaces and implement i386 version.
 1.5.6.1  29-Mar-2007  reinoud Pullup to -current
 1.5.4.1  11-Jul-2007  mjf Sync with head.
 1.5.2.5  01-Nov-2007  ad - Fix interactivity problems under high load. Because soft interrupts
are being stacked on top of regular LWPs, more often than not aston()
was being called on a soft interrupt thread instead of a user thread,
meaning that preemption was not happening on EOI.

- Don't use bool in a couple of data structures. Sub-word writes are not
always atomic and may clobber other fields in the containing word.

- For SCHED_4BSD, make p_estcpu per thread (l_estcpu). Rework how the
dynamic priority level is calculated - it's much better behaved now.

- Kill the l_usrpri/l_priority split now that priorities are no longer
directly assigned by tsleep(). There are three fields describing LWP
priority:

l_priority: Dynamic priority calculated by the scheduler.
This does not change for kernel/realtime threads,
and always stays within the correct band. Eg for
timeshared LWPs it never moves out of the user
priority range. This is basically what l_usrpri
was before.

l_inheritedprio: Lent to the LWP due to priority inheritance
(turnstiles).

l_kpriority: A boolean value set true the first time an LWP
sleeps within the kernel. This indicates that the LWP
should get a priority boost as compensation for blocking.
lwp_eprio() now does the equivalent of sched_kpri() if
the flag is set. The flag is cleared in userret().

- Keep track of scheduling class (OTHER, FIFO, RR) in struct lwp, and use
this to make decisions in a few places where we previously tested for a
kernel thread.

- Partially fix itimers and usr/sys/intr time accounting in the presence
of software interrupts.

- Use kthread_create() to create idle LWPs. Move priority definitions
from the various modules into sys/param.h.

- newlwp -> lwp_create
 1.5.2.4  01-Sep-2007  ad Update for pool_cache API changes.
 1.5.2.3  08-Jun-2007  ad Sync with head.
 1.5.2.2  10-Apr-2007  ad - Ensure that LWPs going to sleep are on the sleep queue and so
have their syncobj pointer updated, so that calls to turnstile_wakeup
will have the correct locks held when adjusting the current LWP's
priority. Avoids another deadlock.
- Assume that LWPs blocked on a turnstile will never be swapped out.
- LWPs blocking on a turnstile must have kernel priority, as they
are consuming kernel resources.
 1.5.2.1  10-Apr-2007  ad Sync with head.
 1.7.14.1  19-Nov-2007  mjf Sync with HEAD.
 1.7.12.1  13-Nov-2007  bouyer Sync with HEAD
 1.7.8.3  23-Mar-2008  matt sync with HEAD
 1.7.8.2  08-Nov-2007  matt sync with -HEAD
 1.7.8.1  06-Nov-2007  matt sync with HEAD
 1.7.6.2  11-Nov-2007  joerg Sync with HEAD.
 1.7.6.1  06-Nov-2007  joerg Sync with HEAD.
 1.9.14.4  17-Jan-2009  mjf Sync with HEAD.
 1.9.14.3  29-Jun-2008  mjf Sync with HEAD.
 1.9.14.2  02-Jun-2008  mjf Sync with HEAD.
 1.9.14.1  03-Apr-2008  mjf Sync with HEAD.
 1.9.10.1  24-Mar-2008  keiichi sync with head.
 1.10.4.3  11-Mar-2010  yamt sync with head
 1.10.4.2  04-May-2009  yamt sync with head.
 1.10.4.1  16-May-2008  yamt sync with head.
 1.10.2.3  17-Jun-2008  yamt sync with head.
 1.10.2.2  04-Jun-2008  yamt sync with head
 1.10.2.1  18-May-2008  yamt sync with head.
 1.11.2.2  10-Oct-2008  skrll Sync with HEAD.
 1.11.2.1  23-Jun-2008  wrstuden Sync w/ -current. 34 merge conflicts to follow.
 1.13.2.1  18-Jun-2008  simonb Sync with head.
 1.14.2.1  19-Oct-2008  haad Sync with HEAD.
 1.15.8.1  13-May-2009  jym Sync with HEAD.

Commit is split, to avoid a "too many arguments" protocol error.
 1.15.2.1  28-Apr-2009  skrll Sync with HEAD.
 1.18.4.1  05-Mar-2011  rmind sync with head
 1.19.8.2  22-May-2014  yamt sync with head.

for a reference, the tree before this commit was tagged
as yamt-pagecache-tag8.

this commit was split into small chunks to avoid
a limitation of cvs. ("Protocol error: too many arguments")
 1.19.8.1  17-Apr-2012  yamt sync with head
 1.21.2.1  24-Feb-2012  mrg sync to -current.
 1.22.12.1  10-Aug-2014  tls Rebase.
 1.22.4.1  18-May-2014  rmind sync with head
 1.22.2.2  03-Dec-2017  jdolecek update from HEAD
 1.22.2.1  20-Aug-2014  tls Rebase to HEAD as of a few days ago.
 1.23.4.1  06-Apr-2015  skrll Sync with HEAD
 1.24.16.1  22-Apr-2018  pgoyette Sync with HEAD
 1.25.2.2  21-Apr-2020  martin Sync with HEAD
 1.25.2.1  08-Apr-2020  martin Merge changes from current as of 20200406
 1.28.2.1  20-Apr-2020  bouyer Sync with HEAD
 1.32.2.1  14-Dec-2020  thorpej Sync w/ HEAD.
