History log of /src/sys/kern/sched_m2.c
Revision  Date  Author  Comments
 1.40  24-Jan-2024  christos Unbreak sched_m2 (died because lwp_eproc() KASSERT in DIAGNOSTIC) and explain
what is going on. This has been broken since the introduction of l_mutex
5 months ago.
 1.39  23-May-2020  ad Oops. If a SCHED_RR thread is preempted and has exceeded its timeslice it
needs to go to the back of the run queue so round-robin actually happens,
otherwise it should go to the front.
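
The placement rule in 1.39 can be sketched with a toy run queue. This is an illustration only, not the code in sched_m2.c: struct runq, RQ_MAX and all function names below are hypothetical, and the real kernel queues LWPs rather than integer ids.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define RQ_MAX 8	/* hypothetical capacity, enough for the sketch */

struct runq {
	int	ids[RQ_MAX];	/* thread ids; index 0 is the head */
	size_t	len;
};

/* Append at the tail: the thread runs after everything already queued. */
static bool
runq_insert_tail(struct runq *rq, int id)
{
	assert(rq != NULL);
	if (rq->len == RQ_MAX)
		return false;
	rq->ids[rq->len++] = id;
	return true;
}

/* Insert at the head: the thread is the next one to be picked. */
static bool
runq_insert_head(struct runq *rq, int id)
{
	assert(rq != NULL);
	if (rq->len == RQ_MAX)
		return false;
	for (size_t i = rq->len; i > 0; i--)
		rq->ids[i] = rq->ids[i - 1];
	rq->ids[0] = id;
	rq->len++;
	return true;
}

/*
 * A preempted round-robin thread that has used up its timeslice goes to
 * the back so its peers get a turn; otherwise it goes to the front and
 * resumes the remainder of its slice.
 */
static bool
requeue_preempted_rr(struct runq *rq, int id, bool slice_expired)
{
	return slice_expired ? runq_insert_tail(rq, id)
	    : runq_insert_head(rq, id);
}
```

Always inserting at the head would let one SCHED_RR thread monopolise its priority level, which is the failure the commit message describes.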
 1.38  13-Apr-2020  maxv hardclock_ticks -> getticks()
 1.37  06-Dec-2019  ad branches: 1.37.6;
sched_tick(): don't try to optimise something that's called 10 times a
second, it's a fine way to introduce bugs (and I did). Use the MI
interface for rescheduling which always does the correct thing.
 1.36  01-Dec-2019  ad Fix false sharing problems with cpu_info. Identified with tprof(8).
This was a very nice win in my tests on a 48 CPU box.

- Reorganise cpu_data slightly according to usage.
- Put cpu_onproc into struct cpu_info alongside ci_curlwp (now is ci_onproc).
- On x86, put some items in their own cache lines according to usage, like
the IPI bitmask and ci_want_resched.
 1.35  01-Dec-2019  ad PR port-sparc/54718 (sparc install hangs since recent scheduler changes)

- sched_tick: cpu_need_resched is no longer the correct thing to do here.
All we need to do is OR the request into the local ci_want_resched.

- sched_resched_cpu: we need to set RESCHED_UPREEMPT even on softint LWPs,
especially in the !__HAVE_FAST_SOFTINTS case, because the LWP with the
LP_INTR flag could be running via softint_overlay() - i.e. it has been
temporarily borrowed from a user process, and it needs to notice the
resched after it has stopped running softints.
 1.34  22-Nov-2019  ad sched_tick: examine the correct LWP, and lock it.
 1.33  03-Sep-2018  riastradh Rename min/max -> uimin/uimax for better honesty.

These functions are defined on unsigned int. The generic name
min/max should not silently truncate to 32 bits on 64-bit systems.
This is purely a name change -- no functional change intended.

HOWEVER! Some subsystems have

#define min(a, b) ((a) < (b) ? (a) : (b))
#define max(a, b) ((a) > (b) ? (a) : (b))

even though our standard name for that is MIN/MAX. Although these
may invite multiple evaluation bugs, these do _not_ cause integer
truncation.

To avoid `fixing' these cases, I first changed the name in libkern,
and then compile-tested every file where min/max occurred in order to
confirm that it failed -- and thus confirm that nothing shadowed
min/max -- before changing it.

I have left a handful of bootloaders that are too annoying to
compile-test, and some dead code:

cobalt ews4800mips hp300 hppa ia64 luna68k vax
acorn32/if_ie.c (not included in any kernels)
macppc/if_gm.c (superseded by gem(4))

It should be easy to fix the fallout once identified -- this way of
doing things fails safe, and the goal here, after all, is to _avoid_
silent integer truncations, not introduce them.

Maybe one day we can reintroduce min/max as type-generic things that
never silently truncate. But we should avoid doing that for a while,
so that existing code has a chance to be detected by the compiler for
conversion to uimin/uimax without changing the semantics until we can
properly audit it all. (Who knows, maybe in some cases integer
truncation is actually intended!)
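
The truncation hazard described in 1.33 can be reproduced in a few lines. This is a minimal sketch, assuming a platform where unsigned int is 32 bits wide; uimin_sketch, MIN_SKETCH and truncation_demo are hypothetical names standing in for the libkern function and the macro form quoted above.

```c
#include <assert.h>
#include <stdint.h>

/* Function form: both operands are converted to unsigned int on entry. */
static unsigned int
uimin_sketch(unsigned int a, unsigned int b)
{
	return a < b ? a : b;
}

/* Macro form: no conversion, but an argument may be evaluated twice. */
#define MIN_SKETCH(a, b) ((a) < (b) ? (a) : (b))

static void
truncation_demo(void)
{
	uint64_t big = (UINT64_C(1) << 32) + 5;

	/*
	 * Assuming 32-bit unsigned int, the upper bits of big are
	 * silently dropped at the call, so the function compares 5
	 * with 10.
	 */
	assert(uimin_sketch(big, 10) == 5);

	/* The macro compares the full 64-bit values. */
	assert(MIN_SKETCH(big, UINT64_C(10)) == 10);
}
```

Renaming the function form makes the operand type visible at every call site, which is what lets a later audit decide case by case whether truncation was intended.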
 1.32  24-Jun-2014  maxv branches: 1.32.26; 1.32.28;
'miliseconds' -> 'milliseconds'.
 1.31  25-Feb-2014  pooka branches: 1.31.2;
Ensure that the top level sysctl nodes (kern, vfs, net, ...) exist before
the sysctl link sets are processed, and remove redundancy.

Shaves >13kB off of an amd64 GENERIC, not to mention >1k duplicate
lines of code.
 1.30  16-Sep-2011  christos branches: 1.30.2; 1.30.12; 1.30.16;
This is no place to attach the primary cpu. Things go wrong from here because
for example it is missing its name.
 1.29  22-Nov-2009  mbalmer more s/the the/the/
 1.28  06-Jul-2009  joerg Remove unused include.
 1.27  18-Oct-2008  rmind branches: 1.27.8;
Make SCHED_M2 nice with nice(1). Closes PR/38048.
 1.26  07-Oct-2008  rmind - Replace lwp_t::l_sched_info with union: pointer and timeslice.
- Change minimal time-quantum to ~20 ms.
- Thus remove unneeded pool in M2, and unused sched_lwp_exit().
- Do not increase l_slptime twice for SCHED_4BSD (regression fix).
 1.25  19-May-2008  rmind branches: 1.25.4;
- Make periodical balancing mandatory.
- Fix priority raising in M2 (broken after making runqueues mandatory).
 1.24  12-Apr-2008  ad branches: 1.24.2; 1.24.4; 1.24.6;
Take the run queue management code from the M2 scheduler, and make it
mandatory. Remove the 4BSD run queue code. Effects:

- Pluggable scheduler is only responsible for co-ordinating timeshared jobs.
- All systems run with per-CPU run queues.
- 4BSD scheduler gets processor sets / affinity.
- 4BSD scheduler gets a significant performance boost on some workloads.

Discussed on tech-kern@.
 1.23  27-Mar-2008  ad Replace use of CACHE_LINE_SIZE in some obvious places.
 1.22  11-Mar-2008  rmind - Perform periodical balancing of CPU-bound threads, which tend to
never sleep. Should fix PR/37245 by <yamt>.
- Fix a regression: disallow catching of bound threads. Also, allow
migration of non-bound kthreads, as this restriction seems pointless.
- A few micro-optimisations, misc.
 1.21  10-Mar-2008  martin Use the cpu index instead of the machine-dependent, not very expressive
cpuid when naming user-visible kernel entities.
 1.20  14-Feb-2008  ad branches: 1.20.2; 1.20.6;
Make schedstate_percpu::spc_lwplock an externally allocated item. Remove
the hacks in sparc/cpu.c to reinitialize it. This should be in its own
cache line but that's another change.
 1.19  31-Jan-2008  rmind - sched_cpuattach: fix address calculation, use roundup2();
Fixes the problems with systems having > 2GB of memory;
From <drochner>, thanks for catching this!
- Convert pool to pool-cache;
- Adjust copyright while here;
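
The message for 1.19 does not say what was wrong with the original address calculation. As a sketch of the general technique only, the helper below rounds an address up to a power-of-two boundary while keeping all arithmetic in uintptr_t, so no address bits are lost. roundup2_sketch is a hypothetical stand-in, not the kernel's roundup2() macro.

```c
#include <assert.h>
#include <stdint.h>

/*
 * Round addr up to the next multiple of align, which must be a power
 * of two. Adding (align - 1) and masking off the low bits avoids a
 * division and works for any address that fits in uintptr_t.
 */
static uintptr_t
roundup2_sketch(uintptr_t addr, uintptr_t align)
{
	assert(align != 0 && (align & (align - 1)) == 0);
	return (addr + (align - 1)) & ~(align - 1);
}
```

For example, with a 64-byte alignment an address of 65 rounds up to 128, while an address that is already aligned is returned unchanged.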
 1.18  15-Jan-2008  rmind sched_slept: Revert inclusion of PRI_HIGHEST_TS into the range.
Reported by <drochner>.
 1.17  15-Jan-2008  rmind Remove PRI_DEFAULT, which was left behind previously.

Note: nice(1) is only for historical purposes, schedctl(8) should be used.
 1.16  15-Jan-2008  rmind - Estimate cache-hotness in all states, except LSIDL;
- Include PRI_HIGHEST_TS value into the increase range,
it was missed previously by mistake;
- More KASSERTs to handle invalid priorities of threads;
- Remove PRI_DEFAULT;
- Misc;
 1.15  15-Jan-2008  rmind Implementation of processor-sets, affinity and POSIX real-time extensions.
Add schedctl(8) - a program to control scheduling of processes and threads.

Notes:
- This is supported only by SCHED_M2;
- Migration of LWP mechanism will be revisited;

Proposed on: <tech-kern>. Reviewed by: <ad>.
 1.14  21-Dec-2007  ad KM_NOSLEEP -> KM_SLEEP for clarity.
 1.13  05-Dec-2007  ad branches: 1.13.4;
Match the docs: MUTEX_DRIVER/SPIN are now only for porting code written
for Solaris.
 1.12  28-Nov-2007  rmind branches: 1.12.2;
Unify the license: All rights reserved.
No functional change.
 1.11  07-Nov-2007  rmind Modifications for the recent vmlocking changes:
- Re-enqueue the thread when priority changes and it is in LSRUN state;
- Handle the __HAVE_FAST_SOFTINTS case in sched_curcpu_runnable_p();
- Few minor changes;
 1.10  06-Nov-2007  ad branches: 1.10.2;
Merge scheduler changes from the vmlocking branch. All discussed on
tech-kern:

- Invert priority space so that zero is the lowest priority. Rearrange
number and type of priority levels into bands. Add new bands like
'kernel real time'.
- Ignore the priority level passed to tsleep. Compute priority for
sleep dynamically.
- For SCHED_4BSD, make priority adjustment per-LWP, not per-process.
 1.9  04-Nov-2007  rmind branches: 1.9.2;
Fix sysctl_createv "pasto" in previous.
 1.8  04-Nov-2007  rmind - sched_setup: use ilog2() for min_catch, which fixes the case when count
of CPU is non-power of 2. Fixes PR/37244.
- sched_enqueue: initialize sl_lrtime, when it is zero (new thread).
Part of PR/37245.
- Fix the mints/maxts sysctl helpers, use mstohz() for the checks. Also,
I meant milliseconds, not microseconds. Found by <bjs>.
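
The first item in 1.8 relies on an integer base-2 logarithm being defined for any positive CPU count, not only for powers of two. The sketch below shows that property with a hypothetical ilog2_sketch(); it is not the kernel's ilog2(), and the log does not say how min_catch is derived from the result.

```c
#include <assert.h>

/* floor(log2(n)) for n > 0, counting how many times n can be halved. */
static unsigned int
ilog2_sketch(unsigned int n)
{
	unsigned int log = 0;

	assert(n > 0);
	while ((n >>= 1) != 0)
		log++;
	return log;
}
```

With this definition a 6-CPU system yields the same value as a 4-CPU one (2), rather than an undefined or wildly different result.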
 1.7  04-Nov-2007  rmind - Migrate all threads when the state of CPU is changed to offline;
- Fix inverted logic with r_mcount in M2;
- setrunnable: perform sched_takecpu() when making the LWP runnable;
- setrunnable: l_mutex cannot be spc_mutex here;

This makes cpuctl(8) work with SCHED_M2.

OK by <ad>.
 1.6  19-Oct-2007  ad branches: 1.6.2; 1.6.4;
machine/{bus,cpu,intr}.h -> sys/{bus,cpu,intr}.h
 1.5  14-Oct-2007  yamt branches: 1.5.2;
fix typos in a comment.
 1.4  13-Oct-2007  yamt branches: 1.4.2;
sched_wakeup: remove a wrong assertion.
 1.3  10-Oct-2007  rmind branches: 1.3.2;
sched_catchlwp: Estimate the pointers of CPU structures, not spc_mutex'es,
when double-locking the runqueues.
 1.2  10-Oct-2007  rmind sched_tick: There is no need to re-schedule in a case when
CURCPU_IDLE_P() is true. Simplify a little bit.

OK by <ad>.
 1.1  09-Oct-2007  rmind Import of SCHED_M2, a new scheduler implementation based
on the original SVR4 approach, with some ideas about balancing and
migration taken from Solaris. It implements per-CPU runqueues, provides
real-time (RT) and time-sharing (TS) queues, is ready to support the POSIX
real-time extensions, and is prepared for CPU affinity support.

The following lines in the kernel config enable SCHED_M2:

no options SCHED_4BSD
options SCHED_M2

The scheduler seems to be stable. Further work will come soon.

http://mail-index.netbsd.org/tech-kern/2007/10/04/0001.html
http://www.netbsd.org/~rmind/m2/mysql_bench_ro_4x_local.png
Thanks <ad> for the benchmarks!
 1.3.2.7  05-Nov-2007  ad - Locking tweaks for estcpu/nice. XXX The schedclock mustn't run above
IPL_SCHED.
- Hide most references to l_estcpu.
- l_policy was here first, but l_class is referenced in more places now.
 1.3.2.6  01-Nov-2007  ad - Fix interactivity problems under high load. Because soft interrupts
are being stacked on top of regular LWPs, more often than not aston()
was being called on a soft interrupt thread instead of a user thread,
meaning that preemption was not happening on EOI.

- Don't use bool in a couple of data structures. Sub-word writes are not
always atomic and may clobber other fields in the containing word.

- For SCHED_4BSD, make p_estcpu per thread (l_estcpu). Rework how the
dynamic priority level is calculated - it's much better behaved now.

- Kill the l_usrpri/l_priority split now that priorities are no longer
directly assigned by tsleep(). There are three fields describing LWP
priority:

l_priority: Dynamic priority calculated by the scheduler.
This does not change for kernel/realtime threads,
and always stays within the correct band. Eg for
timeshared LWPs it never moves out of the user
priority range. This is basically what l_usrpri
was before.

l_inheritedprio: Lent to the LWP due to priority inheritance
(turnstiles).

l_kpriority: A boolean value set true the first time an LWP
sleeps within the kernel. This indicates that the LWP
should get a priority boost as compensation for blocking.
lwp_eprio() now does the equivalent of sched_kpri() if
the flag is set. The flag is cleared in userret().

- Keep track of scheduling class (OTHER, FIFO, RR) in struct lwp, and use
this to make decisions in a few places where we previously tested for a
kernel thread.

- Partially fix itimers and usr/sys/intr time accounting in the presence
of software interrupts.

- Use kthread_create() to create idle LWPs. Move priority definitions
from the various modules into sys/param.h.

- newlwp -> lwp_create
 1.3.2.5  23-Oct-2007  ad Sync with head.
 1.3.2.4  13-Oct-2007  rmind - Estimate and modify SL_BATCH in any case;
- Decrease the priority only in a case of time-sharing queue;
- Set l_priority in case when time-quantum expires;
 1.3.2.3  11-Oct-2007  rmind Adapt SCHED_M2 to the vmlocking branch.
- Priorities are inverted;
- New priority scheme;
- Few minor clean ups;
Now it builds and runs, but some bug is still hiding in the source tree.
 1.3.2.2  10-Oct-2007  rmind Sync with HEAD.
 1.3.2.1  10-Oct-2007  rmind file sched_m2.c was added on branch vmlocking on 2007-10-10 23:03:25 +0000
 1.4.2.3  18-Oct-2007  yamt sync with head.
 1.4.2.2  14-Oct-2007  yamt sync with head.
 1.4.2.1  13-Oct-2007  yamt file sched_m2.c was added on branch yamt-x86pmap on 2007-10-14 11:48:45 +0000
 1.5.2.2  13-Nov-2007  bouyer Sync with HEAD
 1.5.2.1  25-Oct-2007  bouyer Sync with HEAD.
 1.6.4.8  17-Mar-2008  yamt sync with head.
 1.6.4.7  27-Feb-2008  yamt sync with head.
 1.6.4.6  04-Feb-2008  yamt sync with head.
 1.6.4.5  21-Jan-2008  yamt sync with head
 1.6.4.4  07-Dec-2007  yamt sync with head
 1.6.4.3  15-Nov-2007  yamt sync with head.
 1.6.4.2  27-Oct-2007  yamt sync with head.
 1.6.4.1  19-Oct-2007  yamt file sched_m2.c was added on branch yamt-lazymbuf on 2007-10-27 11:35:31 +0000
 1.6.2.7  09-Dec-2007  jmcneill Sync with HEAD.
 1.6.2.6  03-Dec-2007  joerg Sync with HEAD.
 1.6.2.5  11-Nov-2007  joerg Sync with HEAD.
 1.6.2.4  06-Nov-2007  joerg Sync with HEAD.
 1.6.2.3  04-Nov-2007  jmcneill Sync with HEAD.
 1.6.2.2  26-Oct-2007  joerg Sync with HEAD.

Follow the merge of pmap.c on i386 and amd64 and move
pmap_init_tmp_pgtbl into arch/x86/x86/pmap.c. Modify the ACPI wakeup
code to restore CR4 before jumping back into kernel space as the large
page option might cover that.
 1.6.2.1  19-Oct-2007  joerg file sched_m2.c was added on branch jmcneill-pm on 2007-10-26 15:48:38 +0000
 1.9.2.4  18-Feb-2008  mjf Sync with HEAD.
 1.9.2.3  27-Dec-2007  mjf Sync with HEAD.
 1.9.2.2  08-Dec-2007  mjf Sync with HEAD.
 1.9.2.1  19-Nov-2007  mjf Sync with HEAD.
 1.10.2.5  23-Mar-2008  matt sync with HEAD
 1.10.2.4  09-Jan-2008  matt sync with HEAD
 1.10.2.3  08-Nov-2007  matt sync with -HEAD
 1.10.2.2  06-Nov-2007  matt sync with HEAD
 1.10.2.1  06-Nov-2007  matt file sched_m2.c was added on branch matt-armv6 on 2007-11-06 23:32:09 +0000
 1.12.2.2  26-Dec-2007  ad Sync with head.
 1.12.2.1  08-Dec-2007  ad Sync with head.
 1.13.4.2  19-Jan-2008  bouyer Sync with HEAD
 1.13.4.1  02-Jan-2008  bouyer Sync with HEAD
 1.20.6.3  17-Jan-2009  mjf Sync with HEAD.
 1.20.6.2  02-Jun-2008  mjf Sync with HEAD.
 1.20.6.1  03-Apr-2008  mjf Sync with HEAD.
 1.20.2.1  24-Mar-2008  keiichi sync with head.
 1.24.6.2  10-Oct-2008  skrll Sync with HEAD.
 1.24.6.1  23-Jun-2008  wrstuden Sync w/ -current. 34 merge conflicts to follow.
 1.24.4.3  11-Mar-2010  yamt sync with head
 1.24.4.2  18-Jul-2009  yamt sync with head.
 1.24.4.1  04-May-2009  yamt sync with head.
 1.24.2.1  04-Jun-2008  yamt sync with head
 1.25.4.1  19-Oct-2008  haad Sync with HEAD.
 1.27.8.1  23-Jul-2009  jym Sync with HEAD.
 1.30.16.1  18-May-2014  rmind sync with head
 1.30.12.1  20-Aug-2014  tls Rebase to HEAD as of a few days ago.
 1.30.2.1  22-May-2014  yamt sync with head.

for a reference, the tree before this commit was tagged
as yamt-pagecache-tag8.

this commit was split into small chunks to avoid
a limitation of cvs. ("Protocol error: too many arguments")
 1.31.2.1  10-Aug-2014  tls Rebase.
 1.32.28.3  21-Apr-2020  martin Sync with HEAD
 1.32.28.2  08-Apr-2020  martin Merge changes from current as of 20200406
 1.32.28.1  10-Jun-2019  christos Sync with HEAD
 1.32.26.1  06-Sep-2018  pgoyette Sync with HEAD

Resolve a couple of conflicts (result of the uimin/uimax changes)
 1.37.6.1  20-Apr-2020  bouyer Sync with HEAD
