History log of /src/sys/kern/kern_idle.c
Revision  Date  Author  Comments
 1.36  01-Mar-2024  mrg check that l_nopreempt (preemption count) doesn't change after callbacks

check that the idle loop, soft interrupt handlers, workqueue, and xcall
callbacks do not modify the preemption count, in most cases, knowing it
should be 0 currently.

this work was originally done by simonb. cleaned up slightly and some
minor enhancements made by myself, in discussion with riastradh@.

other callback call sites could check this as well (such as MD interrupt
handlers, or really anything that includes a callback registration); the x86
version is to be committed separately.
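
A minimal sketch of the check described in the 1.36 entry above, assuming a
hypothetical helper (the committed change applies this pattern directly at the
idle loop, softint, workqueue and xcall call sites rather than adding such a
function):

	#include <sys/param.h>
	#include <sys/lwp.h>
	#include <sys/systm.h>

	/*
	 * Hypothetical wrapper: record curlwp->l_nopreempt before running a
	 * callback and assert that the callback left it unchanged.
	 */
	static void
	callback_check_nopreempt(void (*func)(void *), void *arg)
	{
		struct lwp *l = curlwp;
		const int nopreempt = l->l_nopreempt;	/* normally 0 at these call sites */

		(*func)(arg);

		KASSERTMSG(l->l_nopreempt == nopreempt,
		    "callback %p changed l_nopreempt from %d to %d",
		    func, nopreempt, l->l_nopreempt);
	}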
 1.35  05-Oct-2023  ad The idle LWP doesn't need to care about kernel_lock.
 1.34  05-Sep-2020  riastradh branches: 1.34.20;
Round of uvm.h cleanup.

The poorly named uvm.h is generally supposed to be for uvm-internal
users only.

- Narrow it to files that actually need it -- mostly files that need
to query whether curlwp is the pagedaemon, which should maybe be
exposed by an external header.

- Use uvm_extern.h where feasible and uvm_*.h for things not exposed
by it. We should split up uvm_extern.h but this will serve for now
to reduce the uvm.h dependencies.

- Use uvm_stat.h and #ifdef UVMHIST uvm.h for files that use
UVMHIST(ubchist), since ubchist is declared in uvm.h but the
reference evaporates if UVMHIST is not defined, so we reduce header
file dependencies.

- Make uvm_device.h and uvm_swap.h independently includable while
here.

ok chs@
 1.33  26-Mar-2020  ad Leave the idle LWPs in state LSIDL even when running, so they don't mess up
output from ps/top/etc. Correctness isn't at stake; LWPs in other states
are temporarily on the CPU at times too (e.g. LSZOMB, LSSLEEP).
 1.32  15-Feb-2020  ad - Move the LW_RUNNING flag back into l_pflag: updating l_flag without lock
in softint_dispatch() is risky. May help with the "softint screwup"
panic.

- Correct the memory barriers around zombies switching into oblivion.
 1.31  25-Jan-2020  ad For secondary CPUs, the idle LWP is the first to run, and it's directly
entered from MD code without a trip through mi_switch(). Make the picture
look good in case the CPU takes an interrupt before it calls idle_loop().
 1.30  08-Jan-2020  ad Hopefully fix some problems seen with MP support on non-x86, in particular
where curcpu() is defined as curlwp->l_cpu:

- mi_switch(): undo the ~2007ish optimisation to unlock curlwp before
calling cpu_switchto(). It's not safe to let other actors mess with the
LWP (in particular l->l_cpu) while it's still context switching. This
removes l->l_ctxswtch.

- Move the LP_RUNNING flag into l->l_flag and rename to LW_RUNNING since
it's now covered by the LWP's lock.

- Ditch lwp_exit_switchaway() and just call mi_switch() instead. Everything
is in cache anyway so it wasn't buying much by trying to avoid saving old
state. This means cpu_switchto() will never be called with prevlwp ==
NULL.

- Remove some KERNEL_LOCK handling which hasn't been needed for years.
 1.29  31-Dec-2019  ad branches: 1.29.2;
- Add and use wrapper functions that take and acquire page interlocks, and pairs
of page interlocks. Require that the page interlock be held over calls to
uvm_pageactivate(), uvm_pagewire() and similar.

- Solve the concurrency problem with page replacement state. Rather than
updating the global state synchronously, set an intended state on
individual pages (active, inactive, enqueued, dequeued) while holding the
page interlock. After the interlock is released, put the pages on a 128-entry
per-CPU queue for their state changes to be made real in batch.
This results in a ~400-fold decrease in contention on my test system.
Proposed on tech-kern but modified to use the page interlock rather than
atomics to synchronise as it's much easier to maintain that way, and
cheaper.
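
A rough sketch of the batching scheme described in the 1.29 entry above, using
illustrative names only (struct pctl_cpu, pctl_set_intent, pctl_flush are not
the committed uvm interfaces): record an intended state on the page while its
interlock is held, queue it on a small per-CPU array, and apply the real
page-queue changes in batch once the array fills.

	#include <sys/types.h>

	struct vm_page;

	#define PCTL_NPAGES	128	/* per-CPU batch size, as described above */

	/* Illustrative per-CPU batch of pages awaiting page-queue state changes. */
	struct pctl_cpu {
		struct vm_page	*pc_page[PCTL_NPAGES];
		int		 pc_intent[PCTL_NPAGES]; /* active/inactive/enqueue/dequeue */
		u_int		 pc_count;
	};

	static void pctl_flush(struct pctl_cpu *);  /* applies the batch under the global lock */

	/*
	 * Called with pg's interlock held: record the intended state and defer
	 * the real page-queue update until a whole batch can be processed.
	 */
	static void
	pctl_set_intent(struct pctl_cpu *pc, struct vm_page *pg, int intent)
	{
		pc->pc_page[pc->pc_count] = pg;
		pc->pc_intent[pc->pc_count] = intent;
		if (++pc->pc_count == PCTL_NPAGES)
			pctl_flush(pc);
	}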
 1.28  06-Dec-2019  ad Make it possible to call mi_switch() and immediately switch to another CPU.
This seems to take about 3us on my Intel system. Two changes required:

- Have the caller to mi_switch() be responsible for calling spc_lock().
- Avoid using l->l_cpu in mi_switch().

While here:

- Add a couple of calls to membar_enter()
- Have the idle LWP set itself to LSIDL, to match softint_thread().
- Remove unused return value from mi_switch().
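
A hedged sketch of the calling convention described in the 1.28 entry above, as
seen from an idle-loop style caller; idle_block() is a hypothetical helper (the
real code is inline in idle_loop()):

	#include <sys/param.h>
	#include <sys/proc.h>
	#include <sys/sched.h>
	#include <sys/systm.h>

	static void
	idle_block(struct lwp *l)
	{
		lwp_lock(l);
		l->l_stat = LSIDL;	/* idle LWP marks itself LSIDL, matching softint_thread() */
		spc_lock(l->l_cpu);	/* caller, not mi_switch(), now takes the run-queue lock */
		mi_switch(l);		/* no return value after this change */
		KASSERT(l->l_stat == LSONPROC);
	}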
 1.27  01-Dec-2019  ad Fix false sharing problems with cpu_info. Identified with tprof(8).
This was a very nice win in my tests on a 48 CPU box.

- Reorganise cpu_data slightly according to usage.
- Put cpu_onproc into struct cpu_info alongside ci_curlwp (now is ci_onproc).
- On x86, put some items in their own cache lines according to usage, like
the IPI bitmask and ci_want_resched.
 1.26  23-Nov-2019  ad Minor scheduler cleanup:

- Adapt to cpu_need_resched() changes. Avoid lost & duplicate IPIs and ASTs.
sched_resched_cpu() and sched_resched_lwp() contain the logic for this.
- Changes for LSIDL to make the locking scheme match the intended design.
- Reduce lock contention and false sharing further.
- Numerous small bugfixes, including some corrections for SCHED_FIFO/RT.
- Use setrunnable() in more places, and merge cut & pasted code.
 1.25  29-Jan-2012  rmind branches: 1.25.48;
- Add mi_cpu_init() and initialise cpu_lock and kcpuset_attached/running there.
- Add kcpuset_running which gets set in idle_loop().
- Use kcpuset_running in pserialize_perform().
 1.24  17-Jan-2011  uebayasi branches: 1.24.6; 1.24.10;
Include internal definitions (uvm/uvm.h) only where necessary.
 1.23  19-Jul-2009  yamt branches: 1.23.4; 1.23.6;
set LP_RUNNING when starting lwp0 and idle lwps.
add assertions.
 1.22  28-Jun-2009  ad idle_loop: explicitly go to spl0() to sidestep potential MD bugs.
 1.21  11-Jun-2008  ad branches: 1.21.10;
Don't call uvm_pageidlezero() if the CPU is marked offline.
 1.20  04-Jun-2008  ad branches: 1.20.2;
- vm_page: put listq, pageq into a union alongside a LIST_ENTRY, so we can
use both types of list.

- Make page coloring and idle zero state per-CPU.

- Maintain per-CPU page freelists. When freeing, put pages onto the local
CPU's lists and the global lists. When allocating, prefer to take pages
from the local CPU. If none are available take from the global list as
done now. Proposed on tech-kern@.
 1.19  29-May-2008  rmind Simplification of running LWP migration. Removes double-locking in
mi_switch(); migration for LSONPROC is now performed via the idle loop.
Handles/fixes on-CPU case in lwp_migrate(), misc.

Closes PR/38169, idea of migration via idle loop by Andrew Doran.
 1.18  27-May-2008  ad PR kern/38707 scheduler related deadlock during build.sh

- Fix performance regression introduced by the workaround by making job
stealing a lot simpler: if the local run queue is empty, let the CPU enter
the idle loop. In the idle loop, try to steal a job from another CPU's run
queue if we are idle. If we succeed, re-enter mi_switch() immediately to
dispatch the job.

- When stealing jobs, consider a remote CPU to have one less job in its
queue if it's currently in the idle loop. It will dispatch the job soon,
so there's no point sloshing it about.

- Introduce a few event counters to monitor what's happening with the run
queues.

- Revert the idle CPU bitmap change. It's pointless considering NUMA.
 1.17  24-May-2008  ad Set cpu_onproc on entry to the idle loop.
 1.16  26-Apr-2008  yamt branches: 1.16.2; 1.16.4;
fix a comment.
 1.15  26-Apr-2008  yamt idle_loop: unsigned -> uint32_t to be consistent with the rest of the code.
no functional change.
 1.14  24-Apr-2008  ad xc_broadcast: don't try to run cross calls on CPUs that are not yet running.
 1.13  04-Apr-2008  ad branches: 1.13.2;
Maintain a bitmap of idle CPUs and add idle_pick() to find an idle CPU
and remove it from the bitmap.
 1.12  10-Mar-2008  martin Use the cpu index instead of the machine-dependent, not very expressive
cpuid when naming user-visible kernel entities.
 1.11  14-Feb-2008  ad branches: 1.11.2; 1.11.6;
Make schedstate_percpu::spc_lwplock an externally allocated item. Remove
the hacks in sparc/cpu.c to reinitialize it. This should be in its own
cache line but that's another change.
 1.10  22-Dec-2007  yamt use binuptime for l_stime/l_rtime.
 1.9  15-Nov-2007  ad branches: 1.9.2; 1.9.6;
Lock curlwp when updating the start time.
 1.8  13-Nov-2007  ad Remove KERNEL_LOCK_ASSERT_LOCKED, KERNEL_LOCK_ASSERT_UNLOCKED since the
kernel_lock functions can be patched out at runtime now. Assertions are
provided by the existing functions and by LOCKDEBUG_BARRIER.
 1.7  06-Nov-2007  ad Merge scheduler changes from the vmlocking branch. All discussed on
tech-kern:

- Invert priority space so that zero is the lowest priority. Rearrange
number and type of priority levels into bands. Add new bands like
'kernel real time'.
- Ignore the priority level passed to tsleep. Compute priority for
sleep dynamically.
- For SCHED_4BSD, make priority adjustment per-LWP, not per-process.
 1.6  08-Oct-2007  ad branches: 1.6.2; 1.6.4;
Merge run time accounting changes from the vmlocking branch. These make
the LWP "start time" per-thread instead of per-CPU.
 1.5  01-Oct-2007  ad Enter mi_switch() from the idle loop if ci_want_resched is set. If there
are no jobs to run it will clear it while under lock. Should fix idle.
 1.4  21-Jul-2007  ad branches: 1.4.4; 1.4.6; 1.4.8; 1.4.10; 1.4.12;
Don't depend on uvm_extern.h pulling in proc.h.
 1.3  09-Jul-2007  ad branches: 1.3.2; 1.3.4;
Merge some of the less invasive changes from the vmlocking branch:

- kthread, callout, devsw API changes
- select()/poll() improvements
- miscellaneous MT safety improvements
 1.2  17-May-2007  yamt merge yamt-idlelwp branch. asked by core@. some ports still need work.

from doc/BRANCHES:

idle lwp, and some changes depending on it.

1. separate context switching and thread scheduling.
(cf. gmcgarry_ctxsw)
2. implement idle lwp.
3. clean up related MD/MI interfaces.
4. make scheduler(s) modular.
 1.1  17-Feb-2007  yamt branches: 1.1.2; 1.1.6;
file kern_idle.c was initially added on branch yamt-idlelwp.
 1.1.6.4  01-Nov-2007  ad - Fix interactivity problems under high load. Because soft interrupts
are being stacked on top of regular LWPs, more often than not aston()
was being called on a soft interrupt thread instead of a user thread,
meaning that preemption was not happening on EOI.

- Don't use bool in a couple of data structures. Sub-word writes are not
always atomic and may clobber other fields in the containing word.

- For SCHED_4BSD, make p_estcpu per thread (l_estcpu). Rework how the
dynamic priority level is calculated - it's much better behaved now.

- Kill the l_usrpri/l_priority split now that priorities are no longer
directly assigned by tsleep(). There are three fields describing LWP
priority:

l_priority: Dynamic priority calculated by the scheduler.
This does not change for kernel/realtime threads,
and always stays within the correct band. Eg for
timeshared LWPs it never moves out of the user
priority range. This is basically what l_usrpri
was before.

l_inheritedprio: Lent to the LWP due to priority inheritance
(turnstiles).

l_kpriority: A boolean value set true the first time an LWP
sleeps within the kernel. This indicates that the LWP
should get a priority boost as compensation for blocking.
lwp_eprio() now does the equivalent of sched_kpri() if
the flag is set. The flag is cleared in userret().

- Keep track of scheduling class (OTHER, FIFO, RR) in struct lwp, and use
this to make decisions in a few places where we previously tested for a
kernel thread.

- Partially fix itimers and usr/sys/intr time accounting in the presence
of software interrupts.

- Use kthread_create() to create idle LWPs. Move priority definitions
from the various modules into sys/param.h.

- newlwp -> lwp_create
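
A hedged sketch of how the three priority fields described in the 1.1.6.4 entry
above combine into an effective priority; sched_kpri_boost() is a hypothetical
stand-in for the sched_kpri()-equivalent boost mentioned above, and the exact
arithmetic may differ from the branch code:

	#include <sys/param.h>
	#include <sys/lwp.h>

	static pri_t sched_kpri_boost(pri_t);	/* hypothetical kernel-priority boost */

	static pri_t
	lwp_eprio_sketch(struct lwp *l)
	{
		pri_t pri = l->l_priority;		/* dynamic priority from the scheduler */

		if (l->l_kpriority)
			pri = sched_kpri_boost(pri);	/* boost for having slept in the kernel */

		/* priority lent through turnstiles (priority inheritance) wins if higher */
		return MAX(pri, l->l_inheritedprio);
	}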
 1.1.6.3  09-Oct-2007  ad Sync with head.
 1.1.6.2  17-Jun-2007  ad - Increase the number of thread priorities from 128 to 256. How the space
is set up is to be revisited.
- Implement soft interrupts as kernel threads. A generic implementation
is provided, with hooks for fast-path MD code that can run the interrupt
threads over the top of other threads executing in the kernel.
- Split vnode::v_flag into three fields, depending on how the flag is
locked (by the interlock, by the vnode lock, by the file system).
- Miscellaneous locking fixes and improvements.
 1.1.6.1  09-Jun-2007  ad Sync with head.
 1.1.2.9  13-May-2007  ad Assign a per-CPU lock to LWPs as they transition into the ONPROC state.

http://mail-index.netbsd.org/tech-kern/2007/05/06/0003.html
 1.1.2.8  21-Apr-2007  mrg add a missing ";" in the !MP case.
 1.1.2.7  21-Apr-2007  ad Some changes mainly for top/ps:

- Add an optional name field to struct lwp.
- Count the total number of context switches + involuntary,
not voluntary + involuntary.
- Mark the idle threads as LSIDL when not running, otherwise
they show up funny in a top(1) that shows threads.
- Make pctcpu and cpticks per-LWP attributes.
- Add to kinfo_lwp: cpticks, pctcpu, pid, name.
 1.1.2.6  16-Apr-2007  ad - Nuke the separate scheduler locking scheme for UP kernels - it has been
at the root of too many bugs.
- Add a LW_BOUND flag that indicates an LWP is bound to a specific CPU.
 1.1.2.5  24-Mar-2007  rmind Checkpoint:
- Abstract for per-CPU locking of runqueues.
As a workaround for SCHED_4BSD global runqueue, covered by sched_mutex,
spc_mutex is a pointer for now. After making SCHED_4BSD runqueues
per-CPU, it will become a storage mutex.
- suspendsched: Locking is not necessary for cpu_need_resched().
- Remove mutex_spin_exit() prototype in patch.c and LOCK_ASSERT() check
in runqueue_nextlwp() in sched_4bsd.c to make them compile again.
 1.1.2.4  09-Mar-2007  rmind Checkpoint:

- Addition of scheduler-specific pointers in the struct proc, lwp and
schedstate_percpu.
- Addition of sched_lwp_fork(), sched_lwp_exit() and sched_slept() hooks.
- mi_switch() now has only one argument.
- sched_nextlwp(void) becomes sched_switch(struct lwp *) and now enqueues
the LWP.
- Addition of general kern.sched sysctl node.
- Remove twice called uvmexp.swtch++, other cleanups.

Discussed on tech-kern@
 1.1.2.3  27-Feb-2007  yamt - sync with head.
- move sched_changepri back to kern_synch.c as it doesn't know PPQ anymore.
 1.1.2.2  23-Feb-2007  yamt - introduce sys/cpu.h which has cpu_idle and cpu_need_resched.
- use it where appropriate.
- while i'm here, remove several unnecessary #include.
 1.1.2.1  17-Feb-2007  yamt - separate context switching and thread scheduling.
- introduce idle lwp.
- change some related MD/MI interfaces and implement i386 version.
 1.3.4.1  15-Aug-2007  skrll Sync with HEAD.
 1.3.2.2  11-Jul-2007  mjf Sync with head.
 1.3.2.1  09-Jul-2007  mjf file kern_idle.c was added on branch mjf-ufs-trans on 2007-07-11 20:09:48 +0000
 1.4.12.2  21-Jul-2007  ad Don't depend on uvm_extern.h pulling in proc.h.
 1.4.12.1  21-Jul-2007  ad file kern_idle.c was added on branch matt-mips64 on 2007-07-21 19:06:23 +0000
 1.4.10.2  14-Oct-2007  yamt sync with head.
 1.4.10.1  06-Oct-2007  yamt sync with head.
 1.4.8.8  17-Mar-2008  yamt sync with head.
 1.4.8.7  27-Feb-2008  yamt sync with head.
 1.4.8.6  21-Jan-2008  yamt sync with head
 1.4.8.5  07-Dec-2007  yamt sync with head
 1.4.8.4  15-Nov-2007  yamt sync with head.
 1.4.8.3  27-Oct-2007  yamt sync with head.
 1.4.8.2  03-Sep-2007  yamt sync with head.
 1.4.8.1  21-Jul-2007  yamt file kern_idle.c was added on branch yamt-lazymbuf on 2007-09-03 14:40:47 +0000
 1.4.6.3  23-Mar-2008  matt sync with HEAD
 1.4.6.2  09-Jan-2008  matt sync with HEAD
 1.4.6.1  06-Nov-2007  matt sync with HEAD
 1.4.4.5  21-Nov-2007  joerg Sync with HEAD.
 1.4.4.4  14-Nov-2007  joerg Sync with HEAD.
 1.4.4.3  06-Nov-2007  joerg Sync with HEAD.
 1.4.4.2  26-Oct-2007  joerg Sync with HEAD.

Follow the merge of pmap.c on i386 and amd64 and move
pmap_init_tmp_pgtbl into arch/x86/x86/pmap.c. Modify the ACPI wakeup
code to restore CR4 before jumping back into kernel space as the large
page option might cover that.
 1.4.4.1  02-Oct-2007  joerg Sync with HEAD.
 1.6.4.3  18-Feb-2008  mjf Sync with HEAD.
 1.6.4.2  27-Dec-2007  mjf Sync with HEAD.
 1.6.4.1  19-Nov-2007  mjf Sync with HEAD.
 1.6.2.2  18-Nov-2007  bouyer Sync with HEAD
 1.6.2.1  13-Nov-2007  bouyer Sync with HEAD
 1.9.6.1  02-Jan-2008  bouyer Sync with HEAD
 1.9.2.1  26-Dec-2007  ad Sync with head.
 1.11.6.4  29-Jun-2008  mjf Sync with HEAD.
 1.11.6.3  05-Jun-2008  mjf Sync with HEAD.

Also fix build.
 1.11.6.2  02-Jun-2008  mjf Sync with HEAD.
 1.11.6.1  03-Apr-2008  mjf Sync with HEAD.
 1.11.2.1  24-Mar-2008  keiichi sync with head.
 1.13.2.3  17-Jun-2008  yamt sync with head.
 1.13.2.2  04-Jun-2008  yamt sync with head
 1.13.2.1  18-May-2008  yamt sync with head.
 1.16.4.1  23-Jun-2008  wrstuden Sync w/ -current. 34 merge conflicts to follow.
 1.16.2.3  19-Aug-2009  yamt sync with head.
 1.16.2.2  18-Jul-2009  yamt sync with head.
 1.16.2.1  04-May-2009  yamt sync with head.
 1.20.2.1  18-Jun-2008  simonb Sync with head.
 1.21.10.1  23-Jul-2009  jym Sync with HEAD.
 1.23.6.1  06-Jun-2011  jruoho Sync with HEAD.
 1.23.4.1  05-Mar-2011  rmind sync with head
 1.24.10.1  18-Feb-2012  mrg merge to -current.
 1.24.6.1  17-Apr-2012  yamt sync with head
 1.25.48.1  08-Apr-2020  martin Merge changes from current as of 20200406
 1.29.2.4  29-Feb-2020  ad Sync with head.
 1.29.2.3  29-Feb-2020  ad Sync with head.
 1.29.2.2  25-Jan-2020  ad Sync with head.
 1.29.2.1  17-Jan-2020  ad Sync with head.
 1.34.20.1  11-Sep-2024  martin Pull up following revision(s) (requested by rin in ticket #821):

sys/arch/x86/x86/intr.c: revision 1.169
sys/kern/kern_softint.c: revision 1.76
sys/kern/subr_workqueue.c: revision 1.48
sys/kern/kern_idle.c: revision 1.36
sys/kern/subr_xcall.c: revision 1.38

check that l_nopreempt (preemption count) doesn't change after callbacks

check that the idle loop, soft interrupt handlers, workqueue, and xcall
callbacks do not modify the preemption count, in most cases, knowing it
should be 0 currently.

this work was originally done by simonb. cleaned up slightly and some
minor enhancements made by myself, in discussion with riastradh@.
other callback call sites could check this as well (such as MD interrupt
handlers, or really anything that includes a callback registration); the x86
version is to be committed separately.

apply some more diagnostic checks for x86 interrupts.
convert intr_biglock_wrapper() into a slightly less complete
intr_wrapper(), and move the kernel lock/unlock points into
the new intr_biglock_wrapper().
add curlwp->l_nopreempt checking for interrupt handlers,
including the dtrace wrapper.

XXX: has to copy the i8254_clockintr hack.

tested for a few months by myself, and recently by rin@ on both
current and netbsd-10. thanks!
