History log of /src/sys/kern/kern_synch.c
Revision | Date | Author | Comments
 1.366  22-Nov-2023  riastradh kpause(9): KASSERT -> KASSERTMSG

PR kern/57718 (might help to diagnose manifestations of the problem)
 1.365  15-Oct-2023  riastradh kern_synch.c: Sort includes. No functional change intended.
 1.364  15-Oct-2023  riastradh sys/lwp.h: Nix sys/syncobj.h dependency.

Remove it in ddb/db_syncobj.h too.

New sys/wchan.h defines wchan_t so that users need not pull in
sys/syncobj.h to get it.

Sprinkle #include <sys/syncobj.h> in .c files where it is now needed.
 1.363  05-Oct-2023  ad Resolve !MULTIPROCESSOR build problem with the nasty kernel lock macros.
 1.362  04-Oct-2023  ad Eliminate l->l_biglocks. Originally I think it had a use but these days a
local variable will do.
 1.361  04-Oct-2023  ad Eliminate l->l_ncsw and l->l_nivcsw. From memory I think they were added
before we had per-LWP struct rusage; the same is now tracked there.
 1.360  23-Sep-2023  ad Sigh.. Adjust previous to work as intended. The boosted LWP priority
didn't persist as far as the run queue because l_syncobj gets reset
earlier than I recalled.
 1.359  23-Sep-2023  ad - Simplify how priority boost for blocking in kernel is handled. Rather
than setting it up at each site where we block, make it a property of
syncobj_t. Then, do not hang onto the priority boost until userret(),
drop it as soon as the LWP is out of the run queue and onto a CPU.
Holding onto it longer is of questionable benefit.

- This allows two members of lwp_t to be deleted, and mi_userret() to be
simplified a lot (next step: trim it down to a single conditional).

- While here, constify syncobj_t and de-inline a bunch of small functions
like lwp_lock() which turn out not to be small after all (I don't know
why, but atomic_*_relaxed() seem to provoke a compiler shitfit above and
beyond what volatile does).
 1.358  17-Jul-2023  riastradh kern: New struct syncobj::sobj_name member for diagnostics.

XXX potential kernel ABI change -- not sure any modules actually use
struct syncobj but it's hard to rule that out because sys/syncobj.h
leaks into sys/lwp.h
 1.357  13-Jul-2023  riastradh kern: Print more detailed monotonic-clock-went-backwards messages.

Let's try harder to track this down.

XXX Should add dtrace probes.
 1.356  23-Jun-2023  riastradh tsleep: Comment out kernel lock assertion for now.

Breaks tpm(4) which breaks boot on a lot of systems. tpm(4)
shouldn't be using tsleep; it doesn't appear to even have an
interrupt handler for wakeups, so it could get by with kpause. If it
ever did sprout an interrupt handler it should use condvar(9) anyway.
But for now I don't have time to fix it tonight.
 1.355  23-Jun-2023  riastradh tsleep(9): Assert kernel lock held.

This is never safe to use without the kernel lock. It should only
appear in legacy subsystems that still run with the kernel lock.
 1.354  09-Apr-2023  riastradh kpause(9): Simplify assertion. No functional change intended.
 1.353  05-Dec-2022  martin If no more softints are pending on this cpu, clear ci_want_resched
(instead of just assigning ci_data.cpu_softints to it - the bitsets
are not the same).
Discussed on tech-kern "ci_want_resched bits vs. MD ci_data.cpu_softints bits".
 1.352  26-Oct-2022  riastradh kern/kern_synch.c: Get averunnable from sys/resource.h.
 1.351  29-Jun-2022  riastradh sleepq(9): Pass syncobj through to sleepq_block.

Previously the usage pattern was:

sleepq_enter(sq, l, lock); // locks l
...
sleepq_enqueue(sq, ..., sobj, ...); // assumes l locked, sets l_syncobj
... (*)
sleepq_block(...); // unlocks l

As long as l remains locked from sleepq_enter to sleepq_block,
l_syncobj is stable, and sleepq_block uses it via ktrcsw to determine
whether the sleep is on a mutex in order to avoid creating ktrace
context-switch records (which involves allocation which is forbidden
in softint context, while taking and even sleeping for a mutex is
allowed).

However, in turnstile_block, the logic at (*) also involves
turnstile_lendpri, which sometimes unlocks and relocks l. At that
point, another thread can swoop in and sleepq_remove l, which sets
l_syncobj to sched_syncobj. If that happens, ktrcsw does what is
forbidden -- tries to allocate a ktrace record for the context
switch.

As an optimization, sleepq_block or turnstile_block could stop early
if it detects that l_syncobj doesn't match -- we've already been
requested to wake up at this point so there's no need to mi_switch.
(And then it would be unnecessary to pass the syncobj through
sleepq_block, because l_syncobj would remain stable.) But I'll leave
that to another change.

Reported-by: syzbot+8b9d7b066c32dbcdc63b@syzkaller.appspotmail.com
 1.350  10-Mar-2022  riastradh kern: Fix synchronization of clearing LP_RUNNING and lwp_free.

1. membar_sync is not necessary here -- only a store-release is
required.

2. membar_consumer _before_ loading l->l_pflag is not enough; a
load-acquire is required.

Actually it's not really clear to me why any barriers are needed, since
the store-release and load-acquire should be implied by releasing and
acquiring the lwp lock (and maybe we could spin with the lock instead
of reading l->l_pflag unlocked). But maybe there's something subtle
about access to l->l_mutex that's not obvious here.
 1.349  23-May-2020  ad Move proc_lock into the data segment. It was dynamically allocated because
at the time we had mutex_obj_alloc() but not __cacheline_aligned.
 1.348  20-May-2020  maxv future-proof-ness
 1.347  19-Apr-2020  ad Set LW_SINTR earlier so it doesn't pose a problem for doing interruptable
waits with turnstiles (not currently done).
 1.346  04-Apr-2020  ad branches: 1.346.2;
preempt_needed(), preempt_point(): simplify the definition of these and
key on ci_want_resched in the interests of interactive response.
 1.345  26-Mar-2020  ad Leave the idle LWPs in state LSIDL even when running, so they don't mess up
output from ps/top/etc. Correctness isn't at stake, LWPs in other states
are temporarily on the CPU at times too (e.g. LSZOMB, LSSLEEP).
 1.344  14-Mar-2020  ad Make page waits (WANTED vs BUSY) interlocked by pg->interlock. Gets RW
locks out of the equation for sleep/wakeup, and allows observing+waiting
for busy pages when holding only a read lock. Proposed on tech-kern.
 1.343  14-Mar-2020  ad - Hide the details of SPCF_SHOULDYIELD and related behind a couple of small
functions: preempt_point() and preempt_needed().

- preempt(): if the LWP has exceeded its timeslice in kernel, strip it of
any priority boost gained earlier from blocking.
 1.342  23-Feb-2020  ad kpause(): is only awoken via timeout or signal, so use SOBJ_SLEEPQ_NULL like
_lwp_park() does, and dispense with the hashed sleepq & lock.
 1.341  23-Feb-2020  ad UVM locking changes, proposed on tech-kern:

- Change the lock on uvm_object, vm_amap and vm_anon to be a RW lock.
- Break v_interlock and vmobjlock apart. v_interlock remains a mutex.
- Do partial PV list locking in the x86 pmap. Others to follow later.
 1.340  16-Feb-2020  ad nextlwp(): fix a couple of locking bugs including one I introduced yesterday,
and add comments around same.
 1.339  15-Feb-2020  ad - Move the LW_RUNNING flag back into l_pflag: updating l_flag without lock
in softint_dispatch() is risky. May help with the "softint screwup"
panic.

- Correct the memory barriers around zombies switching into oblivion.
 1.338  24-Jan-2020  ad Carefully put kernel_lock back the way it was, and add a comment hinting
that changing it is not a good idea, and hopefully nobody will ever try to
change it ever again.
 1.337  22-Jan-2020  ad - DIAGNOSTIC: check for leaked kernel_lock in mi_switch().

- Now that ci_biglock_wanted is set later, explicitly disable preemption
while acquiring kernel_lock. It was blocked in a roundabout way
previously.

Reported-by: syzbot+43111d810160fb4b978b@syzkaller.appspotmail.com
Reported-by: syzbot+f5b871bd00089bf97286@syzkaller.appspotmail.com
Reported-by: syzbot+cd1f15eee5b1b6d20078@syzkaller.appspotmail.com
Reported-by: syzbot+fb945a331dabd0b6ba9e@syzkaller.appspotmail.com
Reported-by: syzbot+53a0c2342b361db25240@syzkaller.appspotmail.com
Reported-by: syzbot+552222a952814dede7d1@syzkaller.appspotmail.com
Reported-by: syzbot+c7104a72172b0f9093a4@syzkaller.appspotmail.com
Reported-by: syzbot+efbd30c6ca0f7d8440e8@syzkaller.appspotmail.com
Reported-by: syzbot+330a421bd46794d8b750@syzkaller.appspotmail.com
 1.336  09-Jan-2020  ad - Many small tweaks to the SMT awareness in the scheduler. It does a much
better job now at keeping all physical CPUs busy, while using the extra
threads to help out. In particular, during preempt() if we're using SMT,
try to find a better CPU to run on and teleport curlwp there.

- Change the CPU topology stuff so it can work on asymmetric systems. This
mainly entails rearranging one of the CPU lists so it makes sense in all
configurations.

- Add a parameter to cpu_topology_set() to note that a CPU is "slow", for
where there are fast CPUs and slow CPUs, like with the Rockwell RK3399.
Extend the SMT awareness to try and handle that situation too (keep fast
CPUs busy, use slow CPUs as helpers).
 1.335  08-Jan-2020  ad Hopefully fix some problems seen with MP support on non-x86, in particular
where curcpu() is defined as curlwp->l_cpu:

- mi_switch(): undo the ~2007ish optimisation to unlock curlwp before
calling cpu_switchto(). It's not safe to let other actors mess with the
LWP (in particular l->l_cpu) while it's still context switching. This
removes l->l_ctxswtch.

- Move the LP_RUNNING flag into l->l_flag and rename to LW_RUNNING since
it's now covered by the LWP's lock.

- Ditch lwp_exit_switchaway() and just call mi_switch() instead. Everything
is in cache anyway so it wasn't buying much by trying to avoid saving old
state. This means cpu_switchto() will never be called with prevlwp ==
NULL.

- Remove some KERNEL_LOCK handling which hasn't been needed for years.
 1.334  21-Dec-2019  ad branches: 1.334.2;
schedstate_percpu: add new flag SPCF_IDLE as a cheap and easy way to
determine that a CPU is currently idle.
 1.333  20-Dec-2019  ad Use CPU_COUNT() to update nswtch. No functional change.
 1.332  16-Dec-2019  ad kpreempt_disabled(): softint LWPs aren't preemptable.
 1.331  07-Dec-2019  ad mi_switch: move an over-eager KASSERT defeated by kernel preemption.
Discovered during automated test.
 1.330  07-Dec-2019  ad mi_switch: move LOCKDEBUG_BARRIER later to accommodate holding two locks
on entry.
 1.329  06-Dec-2019  ad Make it possible to call mi_switch() and immediately switch to another CPU.
This seems to take about 3us on my Intel system. Two changes required:

- Have the caller to mi_switch() be responsible for calling spc_lock().
- Avoid using l->l_cpu in mi_switch().

While here:

- Add a couple of calls to membar_enter()
- Have the idle LWP set itself to LSIDL, to match softint_thread().
- Remove unused return value from mi_switch().
 1.328  03-Dec-2019  riastradh Rip out pserialize(9) logic now that the RCU patent has expired.

pserialize_perform() is now basically just xc_barrier(XC_HIGHPRI).
No more tentacles throughout the scheduler. Simplify the psz read
count for diagnostic assertions by putting it unconditionally into
cpu_info.

From rmind@, tidied up by me.
 1.327  01-Dec-2019  ad Fix false sharing problems with cpu_info. Identified with tprof(8).
This was a very nice win in my tests on a 48 CPU box.

- Reorganise cpu_data slightly according to usage.
- Put cpu_onproc into struct cpu_info alongside ci_curlwp (now is ci_onproc).
- On x86, put some items in their own cache lines according to usage, like
the IPI bitmask and ci_want_resched.
 1.326  23-Nov-2019  ad Minor scheduler cleanup:

- Adapt to cpu_need_resched() changes. Avoid lost & duplicate IPIs and ASTs.
sched_resched_cpu() and sched_resched_lwp() contain the logic for this.
- Changes for LSIDL to make the locking scheme match the intended design.
- Reduce lock contention and false sharing further.
- Numerous small bugfixes, including some corrections for SCHED_FIFO/RT.
- Use setrunnable() in more places, and merge cut & pasted code.
 1.325  21-Nov-2019  ad - Don't give up kpriority boost in preempt(). That's unfair and bad for
interactive response. It should only be dropped on final return to user.
- Clear l_dopreempt with atomics and add some comments around concurrency.
- Hold proc_lock over the lightning bolt and loadavg calc, no reason not to.
- cpu_did_preempt() is useless - don't call it. Will remove soon.
 1.324  03-Oct-2019  kamil Separate flag for suspended by _lwp_suspend and suspended by a debugger

Once a thread has been stopped with ptrace(2), a userland process must not
be able to unstop it, either deliberately or by accident.

This was a Windows-style behavior that made thread tracing fragile.
 1.323  03-Feb-2019  mrg branches: 1.323.4;
- add or adjust /* FALLTHROUGH */ where appropriate
- add __unreachable() after functions that can return but won't in
this case, and thus can't be marked __dead easily
 1.322  30-Nov-2018  mlelstv The SHOULDYIELD flag doesn't indicate that other LWPs could run but only
that the current LWP was seen on two consecutive scheduler intervals.

There are currently at least 3 cases for calling preempt().
- always call preempt()
- check the SHOULDYIELD flag
- check the real ci_want_resched

So the forced check for SHOULDYIELD changed the scheduler timing. Revert
it for now.
 1.321  28-Nov-2018  mlelstv Move counting involuntary switches into mi_switch. preempt() passes that
information by setting a new LWP flag.

While here, don't even try to switch when the scheduler has no other LWP
to run. This check is currently spread over all callers of preempt()
and will be removed there.

ok mrg@.
 1.320  28-Nov-2018  mlelstv Revert previous for a better fix.
 1.319  28-Nov-2018  mlelstv Fix statistics in case mi_switch didn't actually switch LWPs.
 1.318  14-Aug-2018  ozaki-r Change the place to check if a context switch doesn't happen within a pserialize read section

The previous place (pserialize_switchpoint) was not a good place because at that
point the suspect thread has already been switched away, so a backtrace obtained
on a KASSERT failure doesn't point out where the context switch happened.
 1.317  24-Jul-2018  bouyer In mi_switch(), also call pserialize_switchpoint() if we're not switching
to another lwp, as proposed on
http://mail-index.netbsd.org/tech-kern/2018/07/20/msg023709.html

Without it, on a SMP machine with few processes running (e.g while
running sysinst), pserialize could hang for a long time until all
CPUs got a LWP to run (or, eventually, forever).
Tested on Xen domUs with 4 CPUs, and on a 64-threads AMD machine.
 1.316  12-Jul-2018  maxv Remove the kernel PMC code. Sent yesterday on tech-kern@.

This change:

* Removes "options PERFCTRS", the associated includes, and the associated
ifdefs. In doing so, it removes several XXXSMPs in the MI code, which is
good.

* Removes the PMC code of ARM XSCALE.

* Removes all the pmc.h files. They were all empty, except for ARM XSCALE.

* Reorders the x86 PMC code not to rely on the legacy pmc.h file. The
definitions are put in sysarch.h.

* Removes the kern/sys_pmc.c file, and along with it, the sys_pmc_control
and sys_pmc_get_info syscalls. They are marked as OBSOL in kern,
netbsd32 and rump.

* Removes the pmc_evid_t and pmc_ctr_t types.

* Removes all the associated man pages. The sets are marked as obsolete.
 1.315  19-May-2018  jdolecek branches: 1.315.2;
Remove emap support. Unfortunately it never got to state where it would be
used and usable, due to reliability and limited & complicated MD support.

Going forward, we need to concentrate on interface which do not map anything
into kernel in first place (such as direct map or KVA-less I/O), rather
than making those mappings cheaper to do.
 1.314  16-Feb-2018  ozaki-r branches: 1.314.2;
Avoid a race condition between an LWP migration and curlwp_bind

curlwp_bind sets the LP_BOUND flag to l_pflags of the current LWP, which
prevents it from migrating to another CPU until curlwp_bindx is called.
Meanwhile, there are several ways that an LWP can be migrated to another CPU, and
in all cases the scheduler postpones the migration if the target LWP is running. One
example of LWP migration is load balancing: the scheduler periodically
finds CPU-hogging LWPs and schedules them to migrate (see sched_lwp_stats).
At that point the scheduler checks the LP_BOUND flag and, if it is set on an LWP,
does not schedule that LWP for migration. An LWP scheduled for migration is actually
migrated when it leaves its running CPU, i.e., in mi_switch. And mi_switch does NOT
check the LP_BOUND flag. So if an LWP is scheduled for migration first and only then
sets the LP_BOUND flag, it can be migrated regardless of the flag. To avoid this
race condition, we need to check the flag in mi_switch too.

For more details see https://mail-index.netbsd.org/tech-kern/2018/02/13/msg023079.html
 1.313  30-Jan-2018  ozaki-r Apply C99-style struct initialization to syncobj_t
 1.312  06-Aug-2017  christos use the same string for the log and uprintf.
 1.311  03-Jul-2016  christos branches: 1.311.10;
GSoC 2016 Charles Cui: Implement thread priority protection based on work
by Andy Doran. Also document the get/set pshared thread calls as not
implemented, and add a skeleton implementation that is disabled.
XXX: document _sched_protect(2).
 1.310  04-Apr-2016  christos Split p_xstat (composite wait(2) status code, or signal number depending
on context) into:
1. p_xexit: exit code
2. p_xsig: signal number
3. p_sflag & WCOREFLAG bit to indicate that the process core-dumped.

Fix the documentation of the flag bits in <sys/proc.h>
 1.309  13-Oct-2015  pgoyette When clearing out the scheduler queues during system shutdown, we move
all processes to the SSTOP state. Make sure we update each process's
p_waited and the parents' p_nstopchild counters to maintain consistent
values. Should not make any real difference this late in the shutdown
process, but we should still be consistent just in case.

Fixes PR kern/50318

Pullups will be requested for:

NetBSD-7, -6, -6-0, -6-1, -5, -5-0, -5-1, and -5-2
 1.308  28-Feb-2014  skrll branches: 1.308.4; 1.308.6; 1.308.8;
G/C sys/simplelock.h includes
 1.307  15-Sep-2013  martin Remove __CT_LOCAL_.. hack
 1.306  14-Sep-2013  martin Guard a function local CTASSERT with prologue/epilogue
 1.305  02-Sep-2012  mlelstv branches: 1.305.2; 1.305.4;
The field ci_curlwp is only defined for MULTIPROCESSOR kernels.
 1.304  30-Aug-2012  matt Add a few more KASSERT/KASSERTMSG
 1.303  18-Aug-2012  christos PR/46811: Tetsua Isaki: Don't handle cpu limits when runtime is negative.
 1.302  27-Jul-2012  matt Remove safepri and use IPL_SAFEPRI instead. This may be defined in a MD
header file (if not, a value of 0 is assumed).
 1.301  21-Apr-2012  rmind Improve the assert message.
 1.300  18-Apr-2012  yamt comment
 1.299  03-Mar-2012  matt If IPL_SAFEPRI is defined, use it to initialize safepri.
 1.298  19-Feb-2012  rmind Remove COMPAT_SA / KERN_SA. Welcome to 6.99.3!
Approved by core@.
 1.297  28-Jan-2012  rmind branches: 1.297.2;
Remove obsolete ltsleep(9) and wakeup_one(9).
 1.296  06-Nov-2011  dholland branches: 1.296.4;
time_t isn't necessarily "long". PR 45577 from taca@
 1.295  05-Oct-2011  njoly branches: 1.295.2;
Include sys/syslog.h for log(9).
 1.294  05-Oct-2011  apb revert revision 1.291. log(LOG_WARNING) is not strictly more
noisy than printf().
 1.293  05-Oct-2011  apb When killing a process due to RLIMIT_CPU, also log a message
with LOG_NOTICE, and print a message to the user with uprintf.

From PR 45421 by Greg Woods, but I changed the log priority (the user
might think it's an error, but the kernel is just doing its job) and the
wording of the message, and I edited a nearby comment.
 1.292  05-Oct-2011  apb Print "WARNING: negative runtime; monotonic clock has gone backwards\n"
using log(LOG_WARNING, ...), not just printf(...).

From PR 45421 by Greg Woods.
 1.291  27-Sep-2011  jym Modify *ASSERTMSG() so they are now used as variadic macros. The main goal
is to provide routines that do as KASSERT(9) says: append a message
to the panic format string when the assertion triggers, with optional
arguments.

Fix call sites to reflect the new definition.

Discussed on tech-kern@. See
http://mail-index.netbsd.org/tech-kern/2011/09/07/msg011427.html
 1.290  30-Jul-2011  christos Add an implementation of passive serialization as described in expired
US patent 4809168. This is a reader / writer synchronization mechanism,
designed for lock-less read operations.
 1.289  13-May-2011  rmind Sprinkle __cacheline_aligned and __read_mostly.
 1.288  02-May-2011  rmind Extend PCU:
- Add pcu_ops_t::pcu_state_release() operation for PCU_RELEASE case.
- Add pcu_switchpoint() to perform release operation on context switch.
- Sprinkle const, misc. Also, sync MIPS with changes.

Per discussions with matt@.
 1.287  14-Apr-2011  matt Add an assert to make sure no unexpected spinlocks are held in mi_switch
 1.286  03-Jan-2011  pooka branches: 1.286.2;
update comment
 1.285  18-Dec-2010  rmind mi_switch: remove invalid assert and add a note that preemption/interrupt
may happen while migrating LWP is set.

Reported by Manuel Bouyer.
 1.284  02-Nov-2010  pooka KASSERT we don't kpause indefinitely without interruptability.

XXX: using timo == 0 to mean "sleep as long as you like, and forever
if you're really tired" is not the smartest interface considering
the hz/n idiom used to specify timo. This leads to unwanted
behaviour when hz gets below some impossible-to-know limit. With
a usec2ticks() routine it would at least be a little more tolerable.
 1.283  30-Apr-2010  martin Add a CTASSERT to make sure the cexp and ldavg arrays are kept in sync
 1.282  20-Apr-2010  rmind sched_pstats: fix previous, exclude system/softintr threads from loadavg.
 1.281  16-Apr-2010  rmind - Merge sched_pstats() and uvm_meter()/uvm_loadav(). Avoids double loop
through all LWPs and duplicate locking overhead.

- Move sched_pstats() from soft-interrupt context to process 0 main loop.
Avoids blocking effect on real-time threads. Mostly fixes PR/38792.

Note: it might be worth moving the loop above PRI_PGDAEMON. Also,
sched_pstats() might be cleaned up slightly.
 1.280  03-Mar-2010  yamt branches: 1.280.2;
remove redundant checks of PK_MARKER.
 1.279  23-Feb-2010  darran DTrace: Get rid of the KDTRACE_HOOKS ifdefs in the kernel. Replace the
functions with inline function that are empty when KDTRACE_HOOKS is not
defined.
 1.278  21-Feb-2010  darran DTrace: Add __predict_false() to the DTrace hooks per rmind's suggestion.
 1.277  21-Feb-2010  darran Added a defflag option for KDTRACE_HOOKS and included opt_dtrace.h in the
relevant files. (Per Quentin Garnier - thanks!).
 1.276  21-Feb-2010  darran Add the DTrace hooks to the kernel (KDTRACE_HOOKS config option).
DTrace adds a pointer to the lwp and proc structures which it uses to
manage its state. These are opaque from the kernel perspective to keep
the kernel free of CDDL code. The state arenas are kmem_alloced and freed
as processes and threads are created and destroyed.

Also add a check for trap06 (privileged/illegal instruction) so that
DTrace can check for D scripts that may have triggered the trap so it
can clean up after them and resume normal operation.

Ok with core@.
 1.275  18-Feb-2010  skrll Fix comment(s).

OK'ed by rmind
 1.274  30-Dec-2009  rmind branches: 1.274.2;
- nextlwp: do not set l_cpu, it should be returned correct (add assert).
- resched_cpu: avoid double set of ci.
 1.273  05-Dec-2009  pooka tsleep() on lbolt is now illegal. Convert cv_wakeup(&lbolt) to
cv_broadcast(&lbolt) and get rid of the prior.
 1.272  05-Dec-2009  pooka Convert tsleep(&lbolt) to kpause(). Make ltsleep/mtsleep on lbolt
illegal. I examined all places where lbolt is referenced to make
sure there were pointer aliases of it passed to tsleep, but put a
KASSERT in m/ltsleep() just to be sure.
 1.271  21-Oct-2009  rmind Remove uarea swap-out functionality:

- Addresses the issue described in PR/38828.
- Some simplification in threading and sleepq subsystems.
- Eliminates pmap_collect() and, as a side note, allows pmap optimisations.
- Eliminates XS_CTL_DATA_ONSTACK in scsipi code.
- Avoids few scans on LWP list and thus potentially long holds of proc_lock.
- Cuts ~1.5k lines of code. Reduces amd64 kernel size by ~4k.
- Removes __SWAP_BROKEN cases.

Tested on x86, mips, acorn32 (thanks <mpumford>) and partly tested on
acorn26 (thanks to <bjh21>).

Discussed on <tech-kern>, reviewed by <ad>.
 1.270  03-Oct-2009  elad - Move sched_listener and co. from kern_synch.c to sys_sched.c, where it
really belongs (suggested by rmind@),

- Rename sched_init() to synch_init(), and introduce a new sched_init()
in sys_sched.c where we (a) initialize the sysctl node (no more
link-set) and (b) listen on the process scope with sched_listener.

Reviewed by and okay rmind@.
 1.269  03-Oct-2009  elad Oops, forgot to make sched_listener static. Pointed out by rmind@, thanks!
 1.268  03-Oct-2009  elad Move sched policy back to the subsystem.
 1.267  19-Jul-2009  yamt set LP_RUNNING when starting lwp0 and idle lwps.
add assertions.
 1.266  29-Jun-2009  yamt update a comment
 1.265  28-Jun-2009  rmind Ephemeral mapping (emap) implementation. Concept is based on the idea that
activity of other threads will perform the TLB flush for the processes using
emap as a side effect. To track that, global and per-CPU generation numbers
are used. This idea was suggested by Andrew Doran; various improvements to
it by me. Notes:

- For now, zero-copy on pipe is not yet enabled.
- TCP socket code would likely need more work.
- Additional UVM loaning improvements are needed.

Proposed on <tech-kern>, silence there.
Quickly reviewed by <ad>.
 1.264  16-Apr-2009  ad kpreempt: fix another bug, uintptr_t -> bool truncation.
 1.263  16-Apr-2009  rmind Avoid a few #ifdef KSTACK_CHECK_MAGIC.
 1.262  15-Apr-2009  yamt kpreempt: report a failure of cpu_kpreempt_enter. otherwise x86 trap()
loops infinitely. PR/41202.
 1.261  28-Mar-2009  rmind - kpreempt_disabled: constify l.
- Add a few branch predictions.
- KNF.
 1.260  04-Feb-2009  ad branches: 1.260.2;
Warn once and no more about backwards monotonic clock.
 1.259  28-Jan-2009  rmind sched_pstats: add few checks to catch the problem. OK by <ad>.
 1.258  21-Dec-2008  ad Redo previous. Don't count deferrals due to raised IPL. It's not that
meaningful.
 1.257  20-Dec-2008  ad Don't increment the 'kpreempt defer: IPL' counter if a preemption is pending
and we try to process it from interrupt context. We can't process it, and
will be handled at EOI anyway. Can happen when kernel_lock is released.
 1.256  13-Dec-2008  ad PR kern/36183 problem with ptrace and multithreaded processes

Fix the famous "gdb + threads = panic" problem.
Also, fix another revivesa merge botch.
 1.255  15-Nov-2008  skrll s/process/LWP/ in comments where appropriate.
 1.254  29-Oct-2008  smb branches: 1.254.2;
Fix a typo -- a comment started with /m instead of /* ....
 1.253  29-Oct-2008  skrll Typo in comment.
 1.252  15-Oct-2008  wrstuden branches: 1.252.2;
Merge wrstuden-revivesa into HEAD.
 1.251  25-Jul-2008  uwe Declare lwp_exit_switchaway() __dead. Add infinite loop at the end of
lwp_exit_switchaway() to convince gcc that cpu_switchto(NULL, ...) is
really not going to return in that case. Exposed by gcc4.3.

Reported on tech-kern by Alexander Shishkin.
 1.250  02-Jul-2008  rmind branches: 1.250.2;
Remove outdated comments, and historical CCPU_SHIFT. Make resched_cpu static,
const-ify ccpu. Note: resched_cpu is not correct, should be revisited.

OK by <ad>.
 1.249  02-Jul-2008  rmind Remove locking of p_stmutex from sched_pstats(), protect l_pctcpu with p_lock,
and make l_cpticks lock-less. Should fix PR/38296.

Reviewed (slightly different version) by <ad>.
 1.248  31-May-2008  ad branches: 1.248.2;
PR kern/38812 race between lwp_exit_switchaway and exit1/coredump

Move the LWP RUNNING and TIMEINTR flags into the thread-private flag word.
 1.247  29-May-2008  ad lwp_exit_switchaway: set l_lwpctl->lc_curcpu = EXITED, not NONE.
 1.246  29-May-2008  rmind Simplification for running LWP migration. Removes double-locking in
mi_switch(), migration for LSONPROC is now performed via idle loop.
Handles/fixes on-CPU case in lwp_migrate(), misc.

Closes PR/38169, idea of migration via idle loop by Andrew Doran.
 1.245  27-May-2008  ad Move lwp_exit_switchaway() into kern_synch.c. Instead of always switching
to the idle loop, pick a new LWP from the run queue.
 1.244  26-May-2008  ad Take the mutex pointer and waiters count out of sleepq_t: the values can
be or are maintained elsewhere. Now a sleepq_t is just a TAILQ_HEAD.
 1.243  19-May-2008  ad Reduce ifdefs due to MULTIPROCESSOR slightly.
 1.242  19-May-2008  rmind - Make periodical balancing mandatory.
- Fix priority raising in M2 (broken after making runqueues mandatory).
 1.241  30-Apr-2008  ad branches: 1.241.2;
Avoid unneeded AST faults.
 1.240  30-Apr-2008  ad kpreempt: fix a block that should only have compiled as C++... I guess
there is a parsing bug in gcc that let it through.
 1.239  30-Apr-2008  ad Reapply 1.235 which was lost with a subsequent merge.
 1.238  29-Apr-2008  ad Ignore processes with PK_MARKER set.
 1.237  29-Apr-2008  rmind Split the runqueue management code into the separate file.
OK by <ad>.
 1.236  29-Apr-2008  ad Suspended LWPs are no longer created with l_mutex == spc_mutex. Remove
workaround in setrunnable. Fixes PR kern/38222.
 1.235  28-Apr-2008  ad EVCNT_TYPE_INTR -> EVCNT_TYPE_MISC
 1.234  28-Apr-2008  ad Make the preemption switch a __HAVE instead of an option.
 1.233  28-Apr-2008  martin Remove clause 3 and 4 from TNF licenses
 1.232  28-Apr-2008  ad Even if PREEMPTION is defined, disable it by default until any preemption
safety issues have been ironed out. Can be enabled at runtime with sysctl.
 1.231  28-Apr-2008  ad Add MI code to support in-kernel preemption. Preemption is deferred by
one of the following:

- Holding kernel_lock (indicating that the code is not MT safe).
- Bracketing critical sections with kpreempt_disable/kpreempt_enable.
- Holding the interrupt priority level above IPL_NONE.

Statistics on kernel preemption are reported via event counters, and
where preemption is deferred for some reason, it's also reported via
lockstat. The LWP priority at which preemption is triggered is tuneable
via sysctl.
 1.230  27-Apr-2008  ad branches: 1.230.2;
- Rename crit_enter/crit_exit to kpreempt_disable/kpreempt_enable.
DragonflyBSD uses the crit names for something quite different.
- Add a kpreempt_disabled function for diagnostic assertions.
- Add inline versions of kpreempt_enable/kpreempt_disable for primitives.
- Make some more changes for preemption safety to the x86 pmap.
 1.229  24-Apr-2008  ad Merge proc::p_mutex and proc::p_smutex into a single adaptive mutex, since
we no longer need to guard against access from hardware interrupt handlers.

Additionally, if cloning a process with CLONE_SIGHAND, arrange to have the
child process share the parent's lock so that signal state may be kept in
sync. Partially addresses PR kern/37437.
 1.228  24-Apr-2008  ad Network protocol interrupts can now block on locks, so merge the globals
proclist_mutex and proclist_lock into a single adaptive mutex (proc_lock).
Implications:

- Inspecting process state requires thread context, so signals can no longer
be sent from a hardware interrupt handler. Signal activity must be
deferred to a soft interrupt or kthread.

- As the proc state locking is simplified, it's now safe to take exit()
and wait() out from under kernel_lock.

- The system spends less time at IPL_SCHED, and there is less lock activity.
 1.227  13-Apr-2008  yamt branches: 1.227.2;
sched_print_runqueue: add __printf__ attribute to the 'pr' argument.
 1.226  13-Apr-2008  yamt sched_print_runqueue: fix printf formats.
 1.225  13-Apr-2008  dogcow Since nobody else has fixed it yet: fix case of GDB && !MULTIPROCESSOR.
 1.224  12-Apr-2008  ad Move the LW_BOUND flag into the thread-private flag word. It can be tested
by other threads/CPUs but that is only done when the LWP is known to be in a
quiescent state (for example, on a run queue).
 1.223  12-Apr-2008  ad Take the run queue management code from the M2 scheduler, and make it
mandatory. Remove the 4BSD run queue code. Effects:

- Pluggable scheduler is only responsible for co-ordinating timeshared jobs.
- All systems run with per-CPU run queues.
- 4BSD scheduler gets processor sets / affinity.
- 4BSD scheduler gets a significant performance boost on some workloads.

Discussed on tech-kern@.
 1.222  02-Apr-2008  ad yield: don't drop priority to zero. libpthread doesn't make much use of
this any more but applications do and it now pessimizes benchmarks.
 1.221  17-Mar-2008  ad Add a boolean parameter to syncobj_t::sobj_unsleep. If true we want the
existing behaviour: the unsleep method unlocks and wakes the swapper if
needs be. If false, the caller is doing a batch operation and will take
care of that later. This is kind of ugly, but it's difficult for the caller
to know which lock to release in some situations.
 1.220  16-Mar-2008  rmind Work around the case when l_cpu changes to l_target_cpu and causes
locking against oneself. Will be revisited. OK by <ad>.
 1.219  12-Mar-2008  ad Add a preemption counter to lwpctl_t, to allow user threads to detect that
they have been preempted.
 1.218  11-Mar-2008  ad Make context switch + syscall counters optionally per-CPU and accumulate
in schedclock() at "about 16 hz".
 1.217  14-Feb-2008  ad branches: 1.217.2; 1.217.6;
Make schedstate_percpu::spc_lwplock an externally allocated item. Remove
the hacks in sparc/cpu.c to reinitialize it. This should be in its own
cache line but that's another change.
 1.216  15-Jan-2008  rmind Implementation of processor-sets, affinity and POSIX real-time extensions.
Add schedctl(8) - a program to control scheduling of processes and threads.

Notes:
- This is supported only by SCHED_M2;
- Migration of LWP mechanism will be revisited;

Proposed on: <tech-kern>. Reviewed by: <ad>.
 1.215  04-Jan-2008  ad Start detangling lock.h from intr.h. This is likely to cause short term
breakage, but the mess of dependencies has been regularly breaking the
build recently anyhow.
 1.214  02-Jan-2008  ad Merge vmlocking2 to head.
 1.213  27-Dec-2007  ad sched_pstats: need proclist_mutex to send signals.
 1.212  22-Dec-2007  yamt use binuptime for l_stime/l_rtime.
 1.211  03-Dec-2007  ad branches: 1.211.2; 1.211.6;
Soft interrupts can now take proclist_lock, so there is no need to
double-lock alllwp or allproc.
 1.210  03-Dec-2007  ad For the slow path soft interrupts, arrange to have the priority of a
borrowed user LWP raised into the 'kernel RT' range if the LWP sleeps
(which is unlikely).
 1.209  02-Dec-2007  ad - mi_switch: adjust so that we don't have to hold the old LWP locked across
context switch, since cpu_switchto() can be slow under certain conditions.
From rmind@ with adjustments by me.
- lwpctl: allow LWPs to reregister instead of returning EINVAL. Just return
their existing lwpctl user address.
 1.208  29-Nov-2007  ad cv_init(&lbolt, "lbolt");
 1.207  12-Nov-2007  ad Add _lwp_ctl() system call: provides a bidirectional, per-LWP communication
area between processes and the kernel.
 1.206  10-Nov-2007  ad Put back equivalent change to rev 1.189 which was lost:

setrunnable: adjust to slightly different locking strategy post
yamt-idlewlp. Should fix kern/36398. Untested due to connectivity issues.
 1.205  06-Nov-2007  ad Fix merge error. Spotted by rmind@.
 1.204  06-Nov-2007  ad Merge scheduler changes from the vmlocking branch. All discussed on
tech-kern:

- Invert priority space so that zero is the lowest priority. Rearrange
number and type of priority levels into bands. Add new bands like
'kernel real time'.
- Ignore the priority level passed to tsleep. Compute priority for
sleep dynamically.
- For SCHED_4BSD, make priority adjustment per-LWP, not per-process.
 1.203  04-Nov-2007  rmind branches: 1.203.2;
- Migrate all threads when the state of CPU is changed to offline;
- Fix inverted logic with r_mcount in M2;
- setrunnable: perform sched_takecpu() when making the LWP runnable;
- setrunnable: l_mutex cannot be spc_mutex here;

This makes cpuctl(8) work with SCHED_M2.

OK by <ad>.
 1.202  29-Oct-2007  yamt reduce dependencies on opt_sched.h.
 1.201  13-Oct-2007  rmind branches: 1.201.2;
- Fix a comment: LSIDL is covered by spc_mutex, not spc_lwplock.
- mi_switch: Add a comment that spc_lwplock might not necessarily be held.
 1.200  09-Oct-2007  rmind Import of SCHED_M2 - the implementation of a new scheduler, based
on the original approach of SVR4 with some inspiration for balancing
and migration taken from Solaris. It implements per-CPU run queues, provides
real-time (RT) and time-sharing (TS) queues, is ready to support the POSIX
real-time extensions, and is also prepared to support CPU affinity.

The following lines in the kernel config enable SCHED_M2:

no options SCHED_4BSD
options SCHED_M2

The scheduler seems to be stable. Further work will come soon.

http://mail-index.netbsd.org/tech-kern/2007/10/04/0001.html
http://www.netbsd.org/~rmind/m2/mysql_bench_ro_4x_local.png
Thanks <ad> for the benchmarks!
 1.199  08-Oct-2007  ad Merge run time accounting changes from the vmlocking branch. These make
the LWP "start time" per-thread instead of per-CPU.
 1.198  03-Oct-2007  ad - sched_yield: When yielding, drop the priority to MAXPRI ensuring that the
calling thread really does yield. The scheduler will adjust it back to a
reasonable level within 1 second. This contradicts POSIX, which specifies
that sched_yield() put the thread onto the back of its current runqueue.
However, POSIX doesn't really have any business specifying what should
happen for SCHED_OTHER (i.e. a timesharing scheduler like ours), and
Java, MySQL and libpthread rely on sched_yield() doing something useful.

- mi_switch: adjust spc_curpriority and newl->l_priority if we avoided
the runqueues and are doing a direct switch. Since nothing currently
does that, there should be no functional change.
 1.197  02-Oct-2007  ad Fix assertion that broke debug kernels.
 1.196  01-Oct-2007  ad Enter mi_switch() from the idle loop if ci_want_resched is set. If there
are no jobs to run it will clear it while under lock. Should fix idle.
 1.195  25-Sep-2007  ad curlwp appears to be set by all active copies of cpu_switchto - remove
the MI assignments and assert that it's set in mi_switch().
 1.194  06-Aug-2007  yamt branches: 1.194.2; 1.194.4; 1.194.6;
suspendsched: reduce #ifdef.
 1.193  04-Aug-2007  ad Add cpuctl(8). For now this is not much more than a toy for debugging and
benchmarking that allows taking CPUs online/offline.
 1.192  02-Aug-2007  rmind branches: 1.192.2;
sys__lwp_suspend: implement waiting for target LWP status changes (or
process exiting). Removes XXXLWP.

Reviewed by <ad> some time ago..
 1.191  01-Aug-2007  ad Resurrect cv_wakeup() and use it on lbolt. Should fix PR kern/36714.
(background/foreground signal lossage in -current with various programs).
 1.190  09-Jul-2007  ad branches: 1.190.2;
Merge some of the less invasive changes from the vmlocking branch:

- kthread, callout, devsw API changes
- select()/poll() improvements
- miscellaneous MT safety improvements
 1.189  31-May-2007  ad setrunnable: adjust to slightly different locking strategy post yamt-idlewlp.
Should fix kern/36398. Untested due to connectivity issues.
 1.188  17-May-2007  yamt merge yamt-idlelwp branch. asked by core@. some ports still needs work.

from doc/BRANCHES:

idle lwp, and some changes depending on it.

1. separate context switching and thread scheduling.
(cf. gmcgarry_ctxsw)
2. implement idle lwp.
3. clean up related MD/MI interfaces.
4. make scheduler(s) modular.
 1.187  11-Mar-2007  ad branches: 1.187.2;
Put back mtsleep() temporarily. Converting everything over to condvars
at once will take too much time..
 1.186  04-Mar-2007  christos branches: 1.186.2;
Kill caddr_t; there will be some MI fallout, but it will be fixed shortly.
 1.185  27-Feb-2007  yamt typedef pri_t and use it instead of int and u_char.
 1.184  26-Feb-2007  yamt implement priority inheritance.
 1.183  23-Feb-2007  ad setrunnable(): don't require that sleeps be interruptable. This breaks
smbfs. Fixes PR/35787.
 1.182  21-Feb-2007  thorpej Pick up some additional files that were missed before due to conflicts
with newlock2 merge:

Replace the Mach-derived boolean_t type with the C99 bool type. A
future commit will replace use of TRUE and FALSE with true and false.
 1.181  19-Feb-2007  dsl Revert 'optimisation' added in rev 1.179.
On i386 (at least) gcc manages to generate two forward branches which are not
usually taken for the old code, and one forward branch that is usually taken
for my 'improved' version. Since (IIRC) both Athlon and P4 will predict
forward branches 'not taken', the old code is likely to be faster :-(
Faster variants exist, especially ones using the cmov instruction.
 1.180  18-Feb-2007  dsl Add code to support per-system call statistics:
option SYSCALL_STATS counts the number of times each system call is made
option SYSCALL_TIMES counts the amount of time spent in each system call
Currently the counting hooks have only been added to the i386 system call
handler, and the time spent in interrupts is not subtracted.
It ought also be possible to add the times to the processes profiling
counters in order to get a more accurate user/system/interrupt split.
The counts themselves are readable via the sysctl interface.
 1.179  18-Feb-2007  dsl Optimise canonicalisation of l_rtime for the case when the start and stop
times are in the same second.
 1.178  17-Feb-2007  pavel Change the process/lwp flags seen by userland via sysctl back to the
P_*/L_* naming convention, and rename the in-kernel flags to avoid
conflict. (P_ -> PK_, L_ -> LW_ ). Add back the (now unused) LSDEAD
constant.

Restores source compatibility with pre-newlock2 tools like ps or top.

Reviewed by Andrew Doran.
 1.177  15-Feb-2007  ad branches: 1.177.2;
Add uvm_kick_scheduler() (MP safe) to replace wakeup(&proc0).
 1.176  10-Feb-2007  yamt remove function prototypes of sa_awaken.
 1.175  10-Feb-2007  christos avoid using struct proc in the perfctrs case, where the variable might
not be used.
 1.174  09-Feb-2007  ad Merge newlock2 to head.
 1.173  03-Nov-2006  ad branches: 1.173.2; 1.173.4;
- ltsleep(): for now, stay at splsched() when releasing sched_lock, or we
may allow wakeup() to occur before switching away. PR/32962.
- mi_switch(): don't inspect p->p_cred or send signals without holding the
kernel lock.
 1.172  02-Nov-2006  yamt ltsleep: fix a race with wakeup().
 1.171  01-Nov-2006  yamt remove some __unused from function parameters.
 1.170  01-Nov-2006  yamt kill signal "dolock" hacks.

related to PR/32962 and PR/34895. reviewed by matthew green.
 1.169  01-Nov-2006  yamt mi_switch: move rlimit and autonice handling out of sched_lock in order to
simplify locking.
related to PR/32962 and PR/34895. reviewed by matthew green.
 1.168  12-Oct-2006  christos - sprinkle __unused on function decls.
- fix a couple of unused bugs
- no more -Wno-unused for i386
 1.167  07-Sep-2006  mrg branches: 1.167.2;
make the bpendtsleep: label only active if KERN_SYNCH_BPENDTSLEEP_LABEL
is defined. if this option is present in the Makefile CFLAGS and we are
using GCC4, build kern_synch.c with -fno-reorder-blocks, so that this
actually works.

XXX be nice if KERN_SYNCH_BPENDTSLEEP_LABEL was a normal 'defflag' option
XXX but for now take the easy way out and make it checkable in CFLAGS.
 1.166  02-Sep-2006  christos branches: 1.166.2;
deal with empty if bodies
 1.165  30-Aug-2006  tsutsui Disable asm statement which defines bpendtsleep symbol as "handy breakpoint"
on all m68k ports since it may cause a multiple symbol definition error
due to code duplication by the gcc4 optimizer. Also note this in a comment.
 1.164  17-Aug-2006  christos Fix all the -D*DEBUG* code that it was rotting away and did not even compile.
Mostly from Arnaud Lacombe, many thanks!
 1.163  08-Jul-2006  matt Don't define bpendtsleep on vax (gcc4 optimizer will duplicate the asm
that contains it, resulting in a multiple symbol definition in gas).
 1.162  24-Jun-2006  mrg don't put the bpendtsleep handy breakpoint in sun2 kernels as the
output asm includes it twice causing multiply-defined symbols.
 1.161  14-May-2006  elad branches: 1.161.4;
integrate kauth.
 1.160  27-Dec-2005  chs branches: 1.160.4; 1.160.6; 1.160.8; 1.160.10; 1.160.12;
changes for making DIAGNOSTIC not change the kernel ABI:
- for structure fields that are conditionally present,
make those fields always present.
- for functions which are conditionally inline, make them never inline.
- remove some other functions which are conditionally defined but
don't actually do anything anymore.
- make a lock-debugging function conditional on only LOCKDEBUG.

as discussed on tech-kern some time back.
 1.159  26-Dec-2005  perry u_intN_t -> uintN_t
 1.158  24-Dec-2005  perry Remove leading __ from __(const|inline|signed|volatile) -- it is obsolete.
 1.157  24-Dec-2005  yamt fix a long-standing scheduler problem that p_estcpu is doubled
for each fork-wait cycle.

- updatepri: factor out the code to decay estcpu so that it can be used
by scheduler_wait_hook.
- scheduler_fork_hook: record how much estcpu is inherited from
the parent process.
- scheduler_wait_hook: don't add back inherited estcpu to the parent.
 1.156  20-Dec-2005  rpaulo Fix comments for preempt() using rev. 1.101.2.31 log of nathanw_sa by thorpej.
 1.155  15-Dec-2005  yamt updatepri:
- don't compare a scaled value with an unscaled value.
- actually, 7 times the loadfactor is necessary to decay p_estcpu enough,
even before the recent p_estcpu changes.
after the recent p_estcpu change, 8 times loadavg decay is needed.
- fix a comment to match with the recent reality.
 1.154  11-Dec-2005  christos merge ktrace-lwp.
 1.153  01-Nov-2005  yamt make scheduler work better when a system has many runnable processes
by making p_estcpu fixpt_t. PR/31542.

1. schedcpu() decreases p_estcpu of all processes
every second, by at least 1 regardless of load average.
2. schedclock() increases p_estcpu of curproc by 1,
at about 16 hz.

as a consequence, if a system has >16 processes
with runnable lwps, their p_estcpu is not likely to increase.

by making p_estcpu fixpt_t, we can decay it more slowly
when loadavg is high. (ie. solve #1.)

i left kinfo_proc2::p_estcpu (ie. ps -O cpu) scaled because i have
no idea about its absolute value's usage other than debugging,
for which raw values are more valuable.
 1.152  30-Oct-2005  yamt - localize some definitions.
- use PPQ macro where appropriate.
 1.151  06-Oct-2005  yamt branches: 1.151.2;
uninline scheduler hooks.
 1.150  02-Oct-2005  chs avoid calling into the pool code while holding sched_lock
since both pool_get() and pool_put() can call wakeup().
instead, allocate the struct sadata_upcall before taking
sched_lock in mi_switch() and free it after releasing sched_lock.

clean up some modularity warts by adding a callback to
struct sadata_upcall for freeing sa_arg.
 1.149  29-May-2005  christos branches: 1.149.2;
- add const.
- remove unnecessary casts.
- add __UNCONST casts and mark them with XXXUNCONST as necessary.
 1.148  02-Mar-2005  mycroft branches: 1.148.2;
Copyright maintenance.
 1.147  26-Feb-2005  perry nuke trailing whitespace
 1.146  09-Dec-2004  matt branches: 1.146.2; 1.146.4;
Add some debug code to validate the runqueues if RQDEBUG is defined.
 1.145  01-Oct-2004  yamt introduce a function, proclist_foreach_call, to iterate all procs on
a proclist and call the specified function for each of them.
primarily to fix a procfs locking problem, but i think that it's useful for
others as well.

while i'm here, introduce PROCLIST_FOREACH macro, which is similar to
LIST_FOREACH but skips marker entries which are used by proclist_foreach_call.
 1.144  18-May-2004  yamt use lockstatus() instead of L_BIGLOCK to check if we're holding a biglock.
fix PR/25595.
 1.143  12-May-2004  yamt use callout_schedule() for schedcpu().
 1.142  14-Mar-2004  cl add kernel part of concurrency support for SA on MP systems
- move per VP data into struct sadata_vp referenced from l->l_savp
* VP id
* lock on VP data
* LWP on VP
* recently blocked LWP on VP
* queue of LWPs woken which ran on this VP before sleep
* faultaddr
* LWP cache for upcalls
* upcall queue
- add current concurrency and requested concurrency variables
- make process exit run LWP on all VPs
- make signal delivery consider all VPs
- make timer events consider all VPs
- add sa_newsavp to allocate new sadata_vp structure
- add sa_increaseconcurrency to prepare new VP
- make sys_sa_setconcurrency request new VP or wakeup idle VP
- make sa_yield lower current concurrency
- set sa_cpu = VP id in upcalls
- maintain cached LWPs per VP
 1.141  13-Feb-2004  wiz Uppercase CPU, plural is CPUs.
 1.140  04-Jan-2004  kleink ; may be a comment character in assembly, use \n as a separator instead.
 1.139  02-Nov-2003  cl Cleanup signal delivery for SA processes:
General idea: only consider the LWP on the VP for signal delivery; all
other LWPs are either asleep or running from waking up until repossessing
the VP.

- in kern_sig.c:kpsignal2: handle all states the LWP on the VP can be in
- in kern_sig.c:proc_stop: only try to stop the LWP on the VP. All other
LWPs will suspend in sa_vp_repossess() until the VP-LWP donates the VP.
Restore original behaviour (before SA-specific hacks were added) for
non-SA processes.
- in kern_sig.c:proc_unstop: only return the LWP on the VP
- handle sa_yield as case 0 in sa_switch instead of clearing L_SA, add an
L_SA_YIELD flag
- replace sa_idle by L_SA_IDLE flag since it was either NULL or == sa_vp

Also don't output itimerfire overrun warning if the process is already
exiting.
Also g/c sa_woken because it's not used.
Also g/c some #if 0 code.
 1.138  26-Oct-2003  fvdl Fix (bogus) uninitialized variable warning.
 1.137  08-Sep-2003  itojun truncated output from pty problem. fix by enami
http://mail-index.netbsd.org/tech-kern/2003/09/06/0002.html
 1.136  07-Aug-2003  agc Move UCB-licensed code from 4-clause to 3-clause licence.

Patches provided by Joel Baker in PR 22364, verified by myself.
 1.135  28-Jul-2003  matt Improve _lwp_wakeup so when it wakes a thread, the target thread thinks
ltsleep has been interrupted and thus the target will not think it was
a spurious wakeup. (this makes syscalls cancellable for libpthread).
 1.134  18-Jul-2003  matt Add support for storing the priority mask in sched_whichqs in MSB order
(enabled by defining __HAVE_BIGENDIAN_BITOPS in <machine/types.h>). The
default is still LSB ordering. This change will allow the powerpc MD
implementations of setrunqueue/remrunqueue to be nuked.
 1.133  17-Jul-2003  fvdl Changes from Stephan Uphoff to patch problems with LWPs blocking when they
shouldn't, and MP.
 1.132  29-Jun-2003  fvdl branches: 1.132.2;
Back out the lwp/ktrace changes. They contained a lot of collateral damage,
and need to be examined and discussed more.
 1.131  28-Jun-2003  darrenr Pass lwp pointers throughout the kernel, as required, so that the lwpid can
be inserted into ktrace records. The general change has been to replace
"struct proc *" with "struct lwp *" in various function prototypes, pass
the lwp through and use l_proc to get the process pointer when needed.

Bump the kernel rev up to 1.6V
 1.130  26-Jun-2003  nathanw Whitespace police.
 1.129  26-Jun-2003  nathanw For now, disable voluntary mid-operation preempt() for SA processes;
it doesn't interact well with SA's idea of what's running.
 1.128  20-May-2003  simonb Sprinkle a little white-space.
 1.127  08-May-2003  matt In setrunnable, give more information in the panic message so we can
figure out WTF went wrong.
 1.126  04-Feb-2003  pk ltsleep(): deal with PNOEXITERR after re-taking the interlock (if necessary).
 1.125  04-Feb-2003  yamt constify wait channels of ltsleep/wakeup. they are never dereferenced.
 1.124  22-Jan-2003  yamt make KSTACK_CHECK_* compile after sa merge.
 1.123  21-Jan-2003  christos step 4: don't de-reference l, if you are going to test if it is NULL a couple
of lines below.
 1.122  18-Jan-2003  thorpej Merge the nathanw_sa branch.
 1.121  15-Jan-2003  thorpej Pass the process priority we want to compare to resched_proc(). Restores
resetpriority() behavior. Thanks to Enami Tsugutomo for pointing out my
mistake.
 1.120  12-Jan-2003  pk schedcpu(): after updating the process CPU tick counters, we no longer need
to run at splstatclock(); continue at splsched().
 1.119  29-Dec-2002  thorpej * Move the resched check from setrunnable() and resetpriority() to
a new inline, resched_proc().
* When performing the resched check, check the priority against the
current priority on the CPU the process last ran on, not always the
current CPU.
 1.118  29-Dec-2002  thorpej Add a comment about affinity to awaken().
 1.117  21-Dec-2002  gmcgarry Re-add yield(). Only used by compat code at the moment.
 1.116  20-Dec-2002  gmcgarry Remove yield() until the scheduler supports the sched_yield(2) system
call.
 1.115  03-Nov-2002  nisimura branches: 1.115.4;
Add some informative comments about setrunqueue and remrunqueue.
 1.114  29-Sep-2002  gmcgarry Back out __HAVE_CHOOSEPROC stuff.
 1.113  22-Sep-2002  gmcgarry Separate the scheduler from the context switching code.

This is done by adding an extra argument to mi_switch() and
cpu_switch() which specifies the new process. If NULL is passed,
then the new function chooseproc() is invoked to wait for a new
process to appear on the run queue.

Also provides an opportunity for optimisations if "switching to self".

Also added are C versions of the setrunqueue() and remrunqueue()
low-level primitives if __HAVE_MD_RUNQUEUE is not defined by MD code.

All these changes are contingent upon the __HAVE_CHOOSEPROC flag being
defined by MD code to indicate that cpu_switch() supports the changes.
 1.112  04-Sep-2002  matt Use the queue macros from <sys/queue.h> instead of referring to the queue
members directly. Use *_FOREACH whenever possible.
 1.111  07-Aug-2002  briggs Only include sys/pmc.h if PERFCTRS is defined.
 1.110  07-Aug-2002  briggs Implement pmc(9) -- An interface to hardware performance monitoring
counters. These counters do not exist on all CPUs, but where they
do exist, can be used for counting events such as dcache misses that
would otherwise be difficult or impossible to instrument by code
inspection or hardware simulation.

pmc(9) is meant to be a general interface. Initially, the Intel XScale
counters are the only ones supported.
 1.109  02-Jul-2002  yamt add KSTACK_CHECK_MAGIC. discussed on tech-kern.
 1.108  21-May-2002  thorpej Move kernel_lock manipulation info functions so that they will
show up in a profile.
 1.107  30-Nov-2001  kleink branches: 1.107.4; 1.107.8;
asm -> __asm.
 1.106  12-Nov-2001  lukem add RCSIDs
 1.105  25-Sep-2001  chs branches: 1.105.2;
in ltsleep(), assert that the interlock is held (if one is given).
 1.104  28-May-2001  chs branches: 1.104.2; 1.104.4;
don't define bpendtsleep in profiling kernels since it confuses gprof.
 1.103  27-Apr-2001  jdolecek Slightly improve the comment for ltsleep(); the previous formulation might
be understood incorrectly (at least, it confused me at first, before
I looked at the actual code).
 1.102  20-Apr-2001  thorpej Make sure there is a curproc in ltsleep().
 1.101  14-Jan-2001  thorpej branches: 1.101.2;
Whenever ps_sigcheck is set to true, signotify() the process, and
wrap this all up in a CHECKSIGS() macro. Also, in psignal1(),
signotify() SRUN and SIDL processes if __HAVE_AST_PERPROC is defined.

Per discussion w/ mycroft.
 1.100  01-Jan-2001  sommerfeld MULTIPROCESSOR: The two calls to psignal() inside mi_switch() are
inside the scheduler lock perimeter and should be sched_psignal() instead.
 1.99  22-Dec-2000  jdolecek split off thread specific stuff from struct sigacts to struct sigctx, leaving
only signal handler array sharable between threads
move other random signal stuff from struct proc to struct sigctx

This addresses kern/10981 by Matthew Orgass.
 1.98  12-Nov-2000  jdolecek use SIGACTION() macro to get on appropriate sigaction
structure
 1.97  23-Sep-2000  enami Also stop runnable but swapped-out user processes in suspendsched().
 1.96  15-Sep-2000  enami The struct prochd isn't a proc. Start scanning from prochd.ph_link instead
of &prochd.
 1.95  14-Sep-2000  thorpej Make sure to lock the proclist when we're traversing allproc.
 1.94  05-Sep-2000  bouyer Implement suspendsched() by putting all sleeping and runnable processes
in SSTOP state, except P_SYSTEM and curproc processes. We have no way to
find the original state of the process, so we can't restart scheduling;
thus this can only be used at shutdown time.

XXX suspendsched() should also deal with processes running on other CPUs.
I don't know how to do that, and as long as we have a kernel big lock,
this shouldn't be a problem.
 1.93  05-Sep-2000  bouyer Back out the suspendsched()/resumesched() thing, per request of Jason Thorpe &
Bill Sommerfeld. suspendsched() will be implemented in a different way.
 1.92  01-Sep-2000  bouyer wakeup()->sched_wakeup()
 1.91  31-Aug-2000  bouyer Add the sched_suspend/sched_resume functions, as discussed on tech-kern,
with the following modifications to the initial patch:
- rename SHOLD and P_HOST to SSUSPEND and P_SUSPEND to avoid confusion with
PHOLD()
- don't deal with SSUSPEND/P_SUSPEND in fork1(), if we come here while
scheduler is suspended we're forking proc0, which can't have P_SUSPEND set.

sched_suspend() suspends the scheduling of user processes by removing all
processes from the run queues and changing their state from SRUN to
SSUSPEND. It also marks all user processes but curproc P_SUSPEND.
When a process has to be put in SRUN and is marked P_SUSPEND, it's placed in
the SSUSPEND state instead.
sched_resume() places all SSUSPEND processes back in SRUN, clearing the
P_SUSPEND flag.
 1.90  26-Aug-2000  sommerfeld Since the spinlock count is per-cpu, we don't need atomic operations
to update it, so don't bother with <machine/atomic.h>

Flush kernel_lock_release_all() and kernel_lock_acquire_count() (which
didn't do spinlock accounting correctly), and replace them with
spinlock_release_all() and spinlock_acquire_count().
 1.89  26-Aug-2000  sommerfeld On second thought.. pass cpu_info * to roundrobin() explicitly.
 1.88  26-Aug-2000  sommerfeld More MP clock/scheduler changes:
- Periodically invoke roundrobin() from hardclock() on all cpu's rather
than from a timer callout; this allows time-slicing on non-primary cpu's.
- Make pscnt per-cpu.
- Notice psdiv changes on each cpu, and adjust pscnt at that point.
Also, invoke setstatclockrate() from the clock interrupt when each cpu
notices the divisor change, rather than when starting/stopping the
profiling clock.
 1.87  25-Aug-2000  thorpej Make need_resched() take a "struct cpu_info *" argument. This
gives a primitive form of processor affinity. Its use in
roundrobin() still needs some work.
 1.86  24-Aug-2000  thorpej Correct a comment.
 1.85  24-Aug-2000  sommerfeld Move kernel_lock release/switch/reacquire from ltsleep() to
mi_switch(), so we don't botch the locking around preempt() or
yield().
 1.84  22-Aug-2000  thorpej Define the MI parts of the "big kernel lock" perimeter. From
Bill Sommerfeld.
 1.83  20-Aug-2000  thorpej Add a lock around the scheduler, and use it as necessary, including
in the non-MULTIPROCESSOR case (LOCKDEBUG requires it). Scheduler
lock is held upon entry to mi_switch() and cpu_switch(), and
cpu_switch() releases the lock before returning.

Largely from Bill Sommerfeld, with some minor bug fixes and
machine-dependent code hacking from me.
 1.82  07-Aug-2000  thorpej Add a DIAGNOSTIC or LOCKDEBUG check for held spin locks.
 1.81  07-Aug-2000  thorpej It doesn't make sense to charge simple locks to proc's, because
simple locks are held by CPUs. Remove p_simple_locks (which was
unused anyway, really), and add a LOCKDEBUG check for held simple
locks in mi_switch(). Grow p_locks to an int to take up the space
previously used by p_simple_locks so that the proc structure doesn't
change size.
 1.80  02-Aug-2000  nathanw principal -> principle (in a comment)
 1.79  27-Jun-2000  mrg remove include of <vm/vm.h>
 1.78  10-Jun-2000  sommerfeld branches: 1.78.2;
Fix assorted bugs around shutdown/reboot/panic time.
- add a new global variable, doing_shutdown, which is nonzero if
vfs_shutdown() or panic() have been called.
- in panic, set RB_NOSYNC if doing_shutdown is already set on entry
so we don't reenter vfs_shutdown if we panic'ed there.
- in vfs_shutdown, don't use proc0's process for sys_sync unless
curproc is NULL.
- in lockmgr, attribute successful locks to proc0 if doing_shutdown
&& curproc==NULL, and panic if we can't get the lock right away; avoids the
spurious lockmgr DIAGNOSTIC panic from the ddb reboot command.
- in subr_pool, deal with curproc==NULL in the doing_shutdown case.
- in mfs_strategy, bitbucket writes if doing_shutdown, so we don't
wedge waiting for the mfs process.
- in ltsleep, treat ((curproc == NULL) && doing_shutdown) like the
panicstr case.

Appears to fix: kern/9239, kern/10187, kern/9367.
May also fix kern/10122.
 1.77  08-Jun-2000  thorpej Change tsleep() to ltsleep(), which takes an interlock argument. The
interlock is released once the scheduler is locked, so that a race
between a sleeper and an awakener is prevented in a multiprocessor
environment. Provide a tsleep() macro that provides the old API.
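The race this closes is the classic lost-wakeup window between testing a condition and going to sleep. A hedged sketch of a typical caller, using era-appropriate NetBSD interfaces with invented names for the lock and flag (kernel context, not a standalone program):

```c
/*
 * Sketch of the ltsleep() interlock pattern (illustrative only;
 * example_slock and example_ready are hypothetical).
 */
#include <sys/param.h>
#include <sys/proc.h>
#include <sys/lock.h>

struct simplelock example_slock;	/* protects example_ready */
int example_ready;

void
example_wait(void)
{
	simple_lock(&example_slock);
	while (!example_ready) {
		/*
		 * ltsleep() releases example_slock only after the
		 * scheduler lock is held, so a wakeup() issued between
		 * the check above and the sleep cannot be lost.  Without
		 * PNORELOCK, the interlock is re-taken before return.
		 */
		(void)ltsleep(&example_ready, PWAIT, "exwait", 0,
		    &example_slock);
	}
	simple_unlock(&example_slock);
}
```

The awakener sets example_ready and calls wakeup(&example_ready) while holding the same interlock.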
 1.76  31-May-2000  thorpej Track which process a CPU is running/has last run on by adding a
p_cpu member to struct proc. Use this in certain places when
accessing scheduler state, etc. For the single-processor case,
just initialize p_cpu in fork1() to avoid having to set it in the
low-level context switch code on platforms which will never have
multiprocessing.

While I'm here, comment a few places where there are known issues
for the SMP implementation.
 1.75  27-May-2000  thorpej branches: 1.75.2;
All users of the old sleep() are now gone; nuke it.
 1.74  27-May-2000  sommerfeld Reduce use of curproc in several places:

- Change ktrace interface to pass in the current process, rather than
p->p_tracep, since the various ktr* function need curproc anyway.

- Add curproc as a parameter to mi_switch() since all callers had it
handy anyway.

- Add a second proc argument for inferior() since callers all had
curproc handy.

Also, miscellaneous cleanups in ktrace:

- ktrace now always uses file-based, rather than vnode-based I/O
(simplifies, increases type safety); eliminate KTRFLAG_FD & KTRFAC_FD.
Do non-blocking I/O, and yield a finite number of times when receiving
EWOULDBLOCK before giving up.

- move code duplicated between sys_fktrace and sys_ktrace into ktrace_common.

- simplify interface to ktrwrite()
 1.73  26-May-2000  thorpej First sweep at scheduler state cleanup. Collect MI scheduler
state into global and per-CPU scheduler state:

- Global state: sched_qs (run queues), sched_whichqs (bitmap
of non-empty run queues), sched_slpque (sleep queues).
NOTE: These may collectively move into a struct schedstate
at some point in the future.

- Per-CPU state, struct schedstate_percpu: spc_runtime
(time process on this CPU started running), spc_flags
(replaces struct proc's p_schedflags), and
spc_curpriority (usrpri of processes on this CPU).

- Every platform must now supply a struct cpu_info and
a curcpu() macro. Simplify existing cpu_info declarations
where appropriate.

- All references to per-CPU scheduler state now made through
curcpu(). NOTE: this will likely be adjusted in the future
after further changes to struct proc are made.

Tested on i386 and Alpha. Changes are mostly mechanical, but apologies
in advance if it doesn't compile on a particular platform.
 1.72  26-May-2000  thorpej Introduce a new process state distinct from SRUN called SONPROC
which indicates that the process is actually running on a
processor. Test against SONPROC as appropriate rather than
combinations of SRUN and curproc. Update all context switch code
to properly set SONPROC when the process becomes the current
process on the CPU.
 1.71  30-Mar-2000  augustss Get rid of register declarations.
 1.70  28-Mar-2000  simonb endtsleep() is prototyped at the top of the file, delete duplicate
declaration inside tsleep().
 1.69  23-Mar-2000  thorpej Track if a process has been through a round-robin cycle without yielding
the CPU, and mark that it should yield if that happens.

Based on a discussion with Artur Grabowski.
 1.68  23-Mar-2000  thorpej New callout mechanism with two major improvements over the old
timeout()/untimeout() API:
- Clients supply callout handle storage, thus eliminating problems of
resource allocation.
- Insertion and removal of callouts is constant time, important as
this facility is used quite a lot in the kernel.

The old timeout()/untimeout() API has been removed from the kernel.
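The two improvements named above can be illustrated with a minimal user-space sketch: because the client supplies the handle storage (the struct callout) and the handles sit on a doubly-linked list, arming a timeout never allocates and removal needs no search. This is an illustrative model only, not the kernel's real callout implementation; the names callout_reset/callout_stop mirror the callout(9) API but the bodies here are simplified.

```c
/*
 * User-space sketch of the callout(9) design: client-supplied handle
 * storage plus a doubly-linked queue, giving O(1) insertion and removal.
 * Illustrative only; not the real kernel implementation.
 */
#include <assert.h>
#include <stddef.h>

struct callout {
	struct callout *c_next, *c_prev;	/* queue linkage, O(1) unlink */
	void (*c_func)(void *);			/* function to call on expiry */
	void *c_arg;				/* its argument */
	int c_time;				/* expiry tick (unused in sketch) */
};

/* Self-linked sentinel standing in for the kernel's callout queue. */
static struct callout callwheel = { &callwheel, &callwheel, NULL, NULL, 0 };

/* Arm a callout: constant-time insertion at the queue head; no allocation. */
static void
callout_reset(struct callout *c, int ticks, void (*func)(void *), void *arg)
{
	c->c_func = func;
	c->c_arg = arg;
	c->c_time = ticks;
	c->c_next = callwheel.c_next;
	c->c_prev = &callwheel;
	callwheel.c_next->c_prev = c;
	callwheel.c_next = c;
}

/* Disarm a callout: constant-time removal via the back pointer; no search. */
static void
callout_stop(struct callout *c)
{
	c->c_prev->c_next = c->c_next;
	c->c_next->c_prev = c->c_prev;
	c->c_next = c->c_prev = c;	/* leave the handle self-linked */
}
```

The old timeout()/untimeout() API had to search a shared table on removal and could fail to allocate a slot on insertion; supplying the storage with the object that owns the timeout removes both problems.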
 1.67  15-Nov-1999  fvdl Add Kirk McKusick's soft updates code to the trunk. Not enabled by
default, as the copyright on the main file (ffs_softdep.c) is such
that it has been put into gnusrc. options SOFTDEP will pull this
in. This code also contains the trickle syncer.

Bump version number to 1.4O
 1.66  14-Oct-1999  ross branches: 1.66.2; 1.66.4;
Back out a small and unfinished piece of the old scheduler rototill.
 1.65  17-Sep-1999  thorpej branches: 1.65.2;
Centralize the declaration and clearing of `cold'.
 1.64  15-Sep-1999  thorpej Be slightly more informative in the tsleep() diagnostics.
 1.63  26-Jul-1999  thorpej Implement wakeup_one(), which wakes up the highest priority process
first in line for the specified identifier. For use in places where
you don't want a Thundering Herd.

While here, add an optimization to wakeup() suggested by Ross Harvey.
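The wakeup()/wakeup_one() distinction can be sketched in a few lines of user-space C. This is a toy model of the semantics only (an array standing in for the sleep queue, with lower numeric priority meaning higher priority, as in the kernel); the function names and fields are illustrative, not the kernel's.

```c
/*
 * Sketch of wakeup(9) vs. wakeup_one(9) semantics.  wakeup() makes every
 * sleeper on an identifier runnable (the Thundering Herd); wakeup_one()
 * wakes only the highest-priority sleeper.  Illustrative model only.
 */
#include <assert.h>
#include <stddef.h>

struct sleeper {
	const void *s_wchan;	/* identifier slept on */
	int s_pri;		/* lower value = higher priority */
	int s_runnable;		/* has this sleeper been made runnable? */
};

/* Model of wakeup(): wake everyone sleeping on wchan. */
static void
wakeup_all(struct sleeper *q, int n, const void *wchan)
{
	for (int i = 0; i < n; i++)
		if (q[i].s_wchan == wchan)
			q[i].s_runnable = 1;
}

/* Model of wakeup_one(): wake only the highest-priority sleeper on wchan. */
static void
wakeup_one_sketch(struct sleeper *q, int n, const void *wchan)
{
	struct sleeper *best = NULL;

	for (int i = 0; i < n; i++)
		if (q[i].s_wchan == wchan && !q[i].s_runnable &&
		    (best == NULL || q[i].s_pri < best->s_pri))
			best = &q[i];
	if (best != NULL)
		best->s_runnable = 1;
}
```

With several sleepers queued on one identifier, wakeup_one_sketch() marks only the lowest s_pri entry runnable, while wakeup_all() marks them all: the rest would otherwise wake, contend, and go straight back to sleep.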
 1.62  25-Jul-1999  thorpej Turn the proclist lock into a read/write spinlock. Update proclist locking
calls to reflect this. Also, block statclock rather than softclock in
the proclist locking functions, to address a problem reported on
current-users by Sean Doran.
 1.61  22-Jul-1999  thorpej Add a read/write lock to the proclists and PID hash table. Use the
write lock when doing PID allocation, and during the process exit path.
Use a read lock every where else, including within schedcpu() (interrupt
context). Note that holding the write lock implies blocking schedcpu()
from running (blocks softclock).

PID allocation is now MP-safe.

Note this actually fixes a bug on single processor systems that was probably
extremely difficult to tickle; it was possible that schedcpu() would run
off a bad pointer if the right clock interrupt happened to come in the
middle of a LIST_INSERT_HEAD() or LIST_REMOVE() to/from allproc.
 1.60  22-Jul-1999  thorpej Rework the process exit path, in preparation for making process exit
and PID allocation MP-safe. A new process state is added: SDEAD. This
state indicates that a process is dead, but not yet a zombie (has not
yet been processed by the process reaper).

SDEAD processes exist on both the zombproc list (via p_list) and deadproc
(via p_hash; the proc has been removed from the pidhash earlier in the exit
path). When the reaper deals with a process, it changes the state to
SZOMB, so that wait4 can process it.

Add a P_ZOMBIE() macro, which treats a proc in SZOMB or SDEAD as a zombie,
and update various parts of the kernel to reflect the new state.
 1.59  21-Apr-1999  mrg revert previous. oops.
 1.58  21-Apr-1999  mrg properly test the msgsz as "msgsz - len". from PR#7386
 1.57  24-Mar-1999  mrg branches: 1.57.2; 1.57.4;
completely remove Mach VM support. all that is left is the header
header files as UVM still uses (most of) these.
 1.56  28-Feb-1999  ross schedclk() -> schedclock(), for consistency with hardclock(), statclock(), ...
update comments for recent scheduler mods
 1.55  23-Feb-1999  ross Scheduler bug fixes and reorganization
* fix the ancient nice(1) bug, where nice +20 processes incorrectly
steal 10 - 20% of the CPU, (or even more depending on load average)
* provide a new schedclk() mechanism at a new clock at schedhz, so high
platform hz values don't cause nice +0 processes to look like they are
niced
* change the algorithm slightly, and reorganize the code a lot
* fix percent-CPU calculation bugs, and eliminate some no-op code

=== nice bug === Correctly divide the scheduler queues between niced and
compute-bound processes. The current nice weight of two (sort of, see
`algorithm change' below) neatly divides the USRPRI queues in half; this
should have been used to clip p_estcpu, instead of UCHAR_MAX. Besides
being the wrong amount, clipping an unsigned char to UCHAR_MAX is a no-op,
and it was done after decay_cpu() which can only _reduce_ the value. It
has to be kept <= NICE_WEIGHT * PRIO_MAX - PPQ or processes can
scheduler-penalize themselves onto the same queue as nice +20 processes.
(Or even a higher one.)

=== New schedclk() mechanism === Some platforms should be cutting down
stathz before hitting the scheduler, since the scheduler algorithm only
works right in the vicinity of 64 Hz. Rather than prescale hz, then scale
back and forth by 4 every time p_estcpu is touched (each occurrence an
abstraction violation), use p_estcpu without scaling and require schedhz
to be generated directly at the right frequency. Use a default stathz (well,
actually, profhz) / 4, so nothing changes unless a platform defines schedhz
and a new clock. Define these for alpha, where hz==1024, and nice was
totally broken.

=== Algorithm change === The nice value used to be added to the
exponentially-decayed scheduler history value p_estcpu, in _addition_ to
being incorporated directly (with greater weight) into the priority calculation.
At first glance, it appears to be a pointless increase of 1/8 the nice
effect (pri = p_estcpu/4 + nice*2), but it's actually at least 3x that
because it will ramp up linearly but be decayed only exponentially, thus
converging to an additional .75 nice for a loadaverage of one. I killed
this, it makes the behavior hard to control, almost impossible to analyze,
and the effect (~nothing at all for the first second, then somewhat increased
niceness after three seconds or more, depending on load average) pointless.

=== Other bugs === hz -> profhz in the p_pctcpu = f(p_cpticks) calculation.
Collect scheduler functionality. Try to put each abstraction in just one
place.
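The clipping bug fixed in this entry is easy to show numerically. The sketch below uses constants plausible for the era (NICE_WEIGHT = 2, PRIO_MAX = 20, PPQ = 4 priority levels per run queue) and the formula quoted in the log entry, pri = p_estcpu/4 + nice*2; treat both the constants and the helper name as illustrative assumptions, not the exact kernel code.

```c
/*
 * Numeric sketch of the nice-bug fix: clip p_estcpu to
 * NICE_WEIGHT * PRIO_MAX - PPQ (a meaningful bound) rather than
 * UCHAR_MAX (a no-op for an unsigned char).  Constants assumed, not
 * taken from the actual source.
 */
#include <assert.h>

#define NICE_WEIGHT	2	/* priority points per nice step (assumed) */
#define PRIO_MAX	20	/* maximum nice value */
#define PPQ		4	/* priority levels per run queue (assumed) */
#define ESTCPULIM	(NICE_WEIGHT * PRIO_MAX - PPQ)	/* = 36 */

/* pri = p_estcpu/4 + nice*2, with the corrected clip applied first. */
static int
sketch_pri(int estcpu, int nice)
{
	if (estcpu > ESTCPULIM)
		estcpu = ESTCPULIM;
	return estcpu / 4 + NICE_WEIGHT * nice;
}
```

With the clip in place, a fully compute-bound nice 0 process tops out at pri 36/4 = 9, strictly below the nice +20 floor of 2*20 = 40, so it can no longer scheduler-penalize itself onto the same queue as (or a worse queue than) nice +20 processes.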
 1.54  04-Nov-1998  chs LOCKDEBUG enhancements for non-MP:
keep a list of locked locks.
use this to print where the lock was locked
when we either go to sleep with a lock held
or try to free a locked lock.
 1.53  11-Sep-1998  mycroft Substantial signal handling changes:
* Increase the size of sigset_t to accommodate 128 signals -- adding new
versions of sys_setprocmask(), sys_sigaction(), sys_sigpending() and
sys_sigsuspend() to handle the changed arguments.
* Abstract the guts of sys_sigaltstack(), sys_setprocmask(), sys_sigaction(),
sys_sigpending() and sys_sigsuspend() into separate functions, and call them
from all the emulations rather than hard-coding everything. (Avoids using
the stackgap crap for these system calls.)
* Add a new flag (p_checksig) to indicate that a process may have signals
pending and userret() needs to do the full (slow) check.
* Eliminate SAS_ALTSTACK; it's exactly the inverse of SS_DISABLE.
* Correct emulation bugs with restoring SS_ONSTACK.
* Make the signal mask in the sigcontext always use the emulated mask format.
* Store signals internally in sigaction structures, rather than maintaining a
bunch of little sigsets for each SA_* bit.
* Keep track of where we put the signal trampoline, rather than figuring it out
in *_sendsig().
* Issue a warning when a non-emulated sigaction bit is observed.
* Add missing emulated signals, and a native SIGPWR (currently not used).
* Implement the `not reset when caught' semantics for relevant signals.

Note: Only code touched by the i386 port has been modified. Other ports and
emulations need to be updated.
 1.52  04-Jul-1998  jonathan defopt DDB.
 1.51  25-Jun-1998  thorpej defopt KTRACE
 1.50  01-Mar-1998  fvdl Merge with Lite2 + local changes
 1.49  12-Feb-1998  kleink Fix variable declarations: register -> register int.
 1.48  10-Feb-1998  mrg - add defopt's for UVM, UVMHIST and PMAP_NEW.
- remove unnecessary UVMHIST_DECL's.
 1.47  05-Feb-1998  mrg initial import of the new virtual memory system, UVM, into -current.

UVM was written by chuck cranor <chuck@maria.wustl.edu>, with some
minor portions derived from the old Mach code. i provided some help
getting swap and paging working, and other bug fixes/ideas. chuck
silvers <chuq@chuq.com> also provided some other fixes.

this is the rest of the MI portion changes.

this will be KNF'd shortly. :-)
 1.46  10-Oct-1997  mycroft GC pageproc and bclnlist.
 1.45  09-Oct-1997  mycroft Make wmesg arguments to various functions const.
 1.44  07-May-1997  gwr branches: 1.44.4; 1.44.6;
Moved db_show_all_procs() to kern_proc.c
 1.43  06-Nov-1996  cgd Fix an inconsistency that came in with Lite: setrq() was renamed to
setrunqueue(), but remrq() was never renamed. Rename remrq() to
remrunqueue().
 1.42  15-Oct-1996  cgd reorganize tsleep() so the (cold || panicstr) test is done before the
ktrace context switch checking. If syncing disks while handling a panic
that occurred while 'curproc' was NULL, the old code would dereference
NULL and die. The (slight) reorganization was done so that space (one extra
splhigh()), rather than time (one extra comparison), would be wasted.
 1.41  13-Oct-1996  christos backout previous kprintf change
 1.40  10-Oct-1996  christos printf -> kprintf, sprintf -> ksprintf
 1.39  02-Oct-1996  ws Fix p_nice vs. NZERO code.
Change NZERO to 20 to always make p_nice positive.
On Christos' suggestion make p_nice explicitly u_char.
 1.38  17-Jul-1996  explorer Add compile-time and run-time control over automatic niceing
 1.37  22-Apr-1996  christos branches: 1.37.4;
remove include of <sys/cpu.h>
 1.36  30-Mar-1996  christos Fix db_printf formats.
 1.35  09-Feb-1996  christos More proto fixes
 1.34  04-Feb-1996  christos First pass at prototyping
 1.33  08-Jun-1995  mycroft Fix various signal handling bugs:
* If we got a stopping signal while already stopped with the same signal,
the second signal would sometimes (but not always) be ignored.
* Signals delivered by the debugger always pretended to be stopping
signals.
* PT_ATTACH still didn't quite work right.
 1.32  22-Apr-1995  christos - new copyargs routine.
- use emul_xxx
- deprecate nsysent; use constant SYS_MAXSYSCALL instead.
- deprecate ep_setup
- call sendsig and setregs indirectly.
 1.31  19-Mar-1995  mycroft Use %p.
 1.30  30-Oct-1994  cgd be more careful with types, also pull in headers where necessary.
 1.29  30-Aug-1994  mycroft Display emulation type.
 1.28  30-Aug-1994  mycroft Clean up some debugging code.
 1.27  30-Aug-1994  mycroft Convert process, file, and namei lists and hash tables to use queue.h.
 1.26  29-Jun-1994  cgd New RCS ID's, take two. they're more aesthetically pleasant, and use 'NetBSD'
 1.25  18-May-1994  cgd mostly-machine-independent switch, and changes to match. also, hack init_main
 1.24  14-May-1994  glass missing rcsid
 1.23  13-May-1994  cgd setrq -> setrunqueue, sched -> scheduler
 1.22  07-May-1994  cgd function name changes
 1.21  06-May-1994  mycroft Put some more code in splstatclock(), just to be safe.
 1.20  05-May-1994  mycroft Now setpri() is really toast.
 1.19  05-May-1994  mycroft setpri() is toast.
 1.18  05-May-1994  mycroft Remove now-bogus casts.
 1.17  05-May-1994  cgd lots of changes: prototype migration, move lots of variables, definitions,
and structure elements around. kill some unnecessary type and macro
definitions. standardize clock handling. More changes than you'd want.
 1.16  04-May-1994  cgd Rename a lot of process flags.
 1.15  29-Apr-1994  cgd change timeout/untimeout/wakeup/sleep/tsleep args to void *
 1.14  22-Dec-1993  cgd cast to match header (changed back...)
 1.13  20-Dec-1993  cgd load average changes from magnum
 1.12  18-Dec-1993  mycroft Canonicalize all #includes.
 1.11  15-Sep-1993  cgd make allproc be volatile, and cast things accordingly.
suggested by torek, because CSRG had problems with reordering
of assignments to allproc leading to strange panics from kernels
compiled with gcc2...
 1.10  29-Aug-1993  cgd branches: 1.10.2;
print more DIAGNOSTIC info, and startrtclock early on the mac (like i386)
 1.9  15-Jul-1993  brezak Add 'ps' command. Add -more- pager to output from Mach ddb.
 1.8  27-Jun-1993  andrew #endif was somehow missing from the end of a DDB conditional!
 1.7  27-Jun-1993  andrew ANSIfications - removed all implicit function return types and argument
definitions. Ensured that all files include "systm.h" to gain access to
general prototypes. Casts where necessary.
 1.6  27-Jun-1993  glass another NDDB -> DDB change. why did DDB invade kern/*?
 1.5  20-May-1993  cgd add $Id$ strings, and clean up file headers where necessary
 1.4  15-Apr-1993  glass i hate NDDB......
 1.3  10-Apr-1993  glass fixed to be compliant, subservient, and to take advantage of the newly
hacked config(8)
 1.2  21-Mar-1993  cgd after 0.2.2 "stable" patches applied
 1.1  21-Mar-1993  cgd branches: 1.1.1;
Initial revision
 1.1.1.3  01-Mar-1998  fvdl Import 4.4BSD-Lite2
 1.1.1.2  01-Mar-1998  fvdl Import 4.4BSD-Lite for reference
 1.1.1.1  21-Mar-1993  cgd initial import of 386bsd-0.1 sources
 1.10.2.5  27-Nov-1993  mycroft sleep() is now deprecated.
 1.10.2.4  14-Nov-1993  mycroft Canonicalize all #includes.
 1.10.2.3  28-Sep-1993  deraadt define some things from sys/kernel.h
 1.10.2.2  24-Sep-1993  mycroft Make all files using spl*() #include cpu.h. Changes from trunk.
init_main.c: New method of pseudo-device initialization.
kern_clock.c: hardclock() and softclock() now take a pointer to a clockframe.
softclock() only does callouts.
kern_synch.c: Remove spurious declaration of endtsleep(). Adjust uses of
averunnable for new struct loadav.
subr_prf.c: Allow printf() formats in panic().
tty.c: averunnable changes.
vfs_subr.c: va_size and va_bytes are now quads.
 1.10.2.1  14-Sep-1993  mycroft init_main.c: clock changes from 4.4; initclocks() is called after vfsinit().
No startrtclock() or enablertclock(). Some pseudo-device cruft, but this needs
to be updated.
kern_clock.c: from 4.4: gatherstats() --> statclock(). statclock(),
hardclock(), and softclock() take a `struct clockframe *'. New initclocks(),
hardclock(), statclock(), startprofclock(), and stopprofclock().
kern_synch.c: from 4.4: machine-independent swtch(), which is now where
process time is integrated. Calls cpu_swtch() with the current process as an
arg.
subr_autoconf.c: Fix typo.
subr_prf.c: msgbufp and msgbufmapped are defined in machdep.c
tty.c: Make TIOCHPCL #ifdef COMPAT_43.
Incorporate changes from main branch.
 1.37.4.1  11-Dec-1996  mycroft From trunk:
Rearrange tsleep() so that a panic while `curproc' is null can successfully
sync.
 1.44.6.1  08-Sep-1997  thorpej Significantly restructure the way signal state for a process is stored.
Rather than using bitmasks to redundantly store the information kept
in the process's sigacts (because the sigacts was kept in the u-area),
hang sigacts directly off the process, and access it directly.

Simplify signal setup code tremendously by storing information in
the sigacts as an array of struct sigactions, rather than in a different
format, since userspace uses sigactions.

Make sigacts sharable by adding reference counting.
 1.44.4.1  14-Oct-1997  thorpej Update marc-pcmcia branch from trunk.
 1.57.4.2  02-Aug-1999  thorpej Update from trunk.
 1.57.4.1  21-Jun-1999  thorpej Sync w/ -current.
 1.57.2.3  30-Apr-2000  he Modify/re-do last pullup (via patch, requested by sommerfeld):
Fix two bugs:
o A malicious or erroneous program can hog the CPU in uiomove()
o A ktrace of such a program can hog large amounts of kernel memory
This version of the fix does not increase the size of struct proc
compared to 1.4.2.
 1.57.2.2  30-Apr-2000  he Pull up revision 1.69 (via patch, requested by sommerfeld):
Fix two bugs:
o A malicious or erroneous program can hog the CPU in uiomove()
o A ktrace of such a program can hog large amounts of kernel memory
This increases the size of struct proc, so kernel-grovellers need
rebuild after this.
 1.57.2.1  17-Oct-1999  cgd pull up rev 1.66 from trunk (requested by ross):
Correct use of `wrong' clock for %cpu calculation. (This is not
a functional change at present because all clock frequencies are
the same.)
 1.65.2.1  27-Dec-1999  wrstuden Pull up to last week's -current.
 1.66.4.1  19-Oct-1999  fvdl Bring in Kirk McKusick's FFS softdep code on a branch.
 1.66.2.5  23-Apr-2001  bouyer Sync with HEAD.
 1.66.2.4  18-Jan-2001  bouyer Sync with head (for UBC+NFS fixes, mostly).
 1.66.2.3  05-Jan-2001  bouyer Sync with HEAD
 1.66.2.2  22-Nov-2000  bouyer Sync with HEAD.
 1.66.2.1  20-Nov-2000  bouyer Update thorpej_scsipi to -current as of a month ago
 1.75.2.1  22-Jun-2000  minoura Sync w/ netbsd-1-5-base.
 1.78.2.5  26-Sep-2000  enami Pullup rev. 1.97 via patch (approved by thorpej@netbsd.org):
Stop runnable but swapped out user processes also in suspendsched().
 1.78.2.4  19-Sep-2000  thorpej Pull up fixes to suspendsched():

revision 1.95
date: 2000/09/14 19:13:29; author: thorpej; state: Exp; lines: +7 -5
Make sure to lock the proclist when we're traversing allproc.

revision 1.96
date: 2000/09/15 06:36:25; author: enami; state: Exp; lines: +4 -4
The struct prochd isn't a proc. Start scanning from prochd.ph_link instead
of &prochd.
 1.78.2.3  06-Sep-2000  bouyer Pull up (approved by thorpej):
sys/proc.h 1.104 -> 1.105
sys/kern/kern_synch.c 1.93 -> 1.94 (via patch to remove SMPism)
sys/kern/vfs_subr.c 1.137 -> 1.138

Add a suspendsched() function, which stops scheduling of user processes
(except curproc) by putting all SRUN and SSLEEP non-P_SYSTEM processes
in SSTOP state.

In vfs_shutdown() use suspendsched() to suspend scheduling, and use tsleep()
instead of DELAY to give kernel threads a chance to run (needed to flush
buffers to a RAID disk). Also, keep trying flushing buffers when the number of
dirty buffers decreases (20 rounds may not be enough for a very large buffer
cache).
 1.78.2.2  11-Aug-2000  thorpej Pullup from trunk:
Add a DIAGNOSTIC or LOCKDEBUG check for held spin locks.
 1.78.2.1  11-Aug-2000  thorpej Pullup from trunk:
It doesn't make sense to charge simple locks to proc's, because
simple locks are held by CPUs. Remove p_simple_locks (which was
unused anyway, really), and add a LOCKDEBUG check for held simple
locks in mi_switch(). Grow p_locks to an int to take up the space
previously used by p_simple_locks so that the proc structure doesn't
change size.
 1.101.2.32  15-Jan-2003  thorpej Sync with HEAD.
 1.101.2.31  07-Jan-2003  thorpej In the SA universe, the switch-to-this-LWP decision is made at a
different level than where preempt() calls are made, which renders
the "newlwp" argument useless. Replace it with a "more work to do"
boolean argument. Returning to userspace preempt() calls pass 0.
"Voluntary" preemptions in e.g. uiomove() pass 1. This will be used
to indicate to the SA subsystem that the LWP is not yet finished in
the kernel.

Collapse the SA vs. non-SA cases of preempt() together, making the
conditional code block much smaller, and don't call sa_preempt() if
more work is to come.

NOTE: THIS IS NOT A COMPLETE FIX TO THE preempt()-in-uiomove() PROBLEM
THAT CURRENTLY EXISTS FOR SA PROCESSES.
 1.101.2.30  06-Jan-2003  nathanw Adjust schedcpu() to make more sense in the face of multi-LWP
processes; iterate primarly over processes, calculate newcpu, and
then iterate over LWPs and calculate priorities. This avoids repeating
the newcpu calculation for each LWP in a process, which would tend to
underestimate the CPU consumed by a multi-LWP process.
 1.101.2.29  31-Dec-2002  thorpej Rename cpu_preempt() to cpu_switchto(), and make the caller remove the
new process from its run queue before calling cpu_switchto().

While here, make a few cpu_switch() and cpu_switchto() implementations
get the outgoing LWP from the args, rather than looking at the curlwp
variable.
 1.101.2.28  29-Dec-2002  thorpej Sync with HEAD.
 1.101.2.27  04-Dec-2002  nathanw In resetprocpriority(), iterate with l_sibling, not l_list, so as to
only iterate over the LWPs in a process, not every entry on the alllwp
list after this process(!).
 1.101.2.26  14-Nov-2002  nathanw wakeup -> sched_wakeup (from trunk, long ago).
 1.101.2.25  11-Nov-2002  nathanw Catch up to -current
 1.101.2.24  18-Oct-2002  nathanw Check for L_BIGLOCK in l->l_flag, rather than P_BIGLOCK in p->p_flag.
 1.101.2.23  18-Oct-2002  nathanw Finish LWPifying C versions of setrunqueue()/remrunqueue().
 1.101.2.22  18-Oct-2002  nathanw Catch up to -current.
 1.101.2.21  17-Sep-2002  nathanw LWPify recent changes.
 1.101.2.20  17-Sep-2002  nathanw Catch up to -current.
 1.101.2.19  30-Aug-2002  nathanw When a SA LWP is woken up, note its identity in the SA data structure.
 1.101.2.18  13-Aug-2002  nathanw Catch up to -current.
 1.101.2.17  01-Aug-2002  nathanw Catch up to -current.
 1.101.2.16  17-Jul-2002  nathanw In preempt(), call sa_preempt() rather than making the upcall directly.
 1.101.2.15  12-Jul-2002  nathanw No longer need to pull in lwp.h; proc.h pulls it in for us.
 1.101.2.14  24-Jun-2002  nathanw Curproc->curlwp renaming.

Change uses of "curproc->l_proc" back to "curproc", which is more like the
original use. Bare uses of "curproc" are now "curlwp".

"curproc" is now #defined in proc.h as ((curlwp) ? (curlwp)->l_proc : NULL)
so that it is always safe to reference curproc (*de*referencing curproc
is another story, but that's always been true).
 1.101.2.13  20-Jun-2002  nathanw Catch up to -current.
 1.101.2.12  02-Apr-2002  nathanw - Centralize p_nrlwps handling in those functions which actually
set the LWP state to LSRUN. Simplifies matters considerably.

- Trying to keep track of the preempted LWP was a bad idea; go back
to searching for now.

- Send a PREEMPTED upcall from proc_unstop(), so that stopped processes
know that something happened, and so that all runnable LWPs of an unstopped
process have an upcall to deliver (Ideally, the LWP that was runnable
when the process was stopped should return first, and any LWPs that were
woken up while the process was stopped would interrupt it, but that's
difficult to arrange).
 1.101.2.11  22-Mar-2002  nathanw Move the increment of p->p_nrlwps to before the resume: label, so that
it doesn't get spuriously incremented if the signal checking code
skips the sleep entirely.
 1.101.2.10  08-Jan-2002  nathanw Catch up to -current.
 1.101.2.9  17-Dec-2001  nathanw Split the body of preempt() into SA and non-SA parts. In the SA part, stash
the fact that the current LWP was preempted, for later use by sa_switch().
 1.101.2.8  17-Nov-2001  nathanw Adapt to new sa_upcall() signature.
 1.101.2.7  14-Nov-2001  nathanw Catch up to -current.
 1.101.2.6  26-Sep-2001  nathanw Catch up to -current.
Again.
 1.101.2.5  30-Aug-2001  nathanw Change a few more p->p_cpu's into l->l_cpu.
Missed on i386 because they're inside a machdep macro that doesn't expand
the argument on i386.
 1.101.2.4  24-Aug-2001  nathanw A few files and lwp/proc conversions I missed in the last big update.
GENERIC runs again.
 1.101.2.3  21-Jun-2001  nathanw Catch up to -current.
 1.101.2.2  08-Apr-2001  nathanw In mi_switch(), pass up the return value from cpu_switch().
In preempt(), make use of the return value of mi_switch() to determine
whether to send a PREEMPTED upcall.
 1.101.2.1  05-Mar-2001  nathanw Initial commit of scheduler activations and lightweight process support.
 1.104.4.1  01-Oct-2001  fvdl Catch up with -current.
 1.104.2.4  10-Oct-2002  jdolecek sync kqueue with -current; this includes merge of gehenna-devsw branch,
merge of i386 MP branch, and part of autoconf rototil work
 1.104.2.3  06-Sep-2002  jdolecek sync kqueue branch with HEAD
 1.104.2.2  23-Jun-2002  jdolecek catch up with -current on kqueue branch
 1.104.2.1  10-Jan-2002  thorpej Sync kqueue branch with -current.
 1.105.2.1  12-Nov-2001  thorpej Sync the thorpej-mips-cache branch with -current.
 1.107.8.3  29-Aug-2002  gehenna catch up with -current.
 1.107.8.2  15-Jul-2002  gehenna catch up with -current.
 1.107.8.1  30-May-2002  gehenna Catch up with -current.
 1.107.4.1  10-Mar-2002  thorpej First cut implementation of turnstiles, a specialized sleep queue used for
kernel synchronization objects. A detailed description of turnstiles
can be found in:

Solaris Internals: Core Kernel Architecture, by Jim Mauro
and Richard McDougall, section 3.7.

Note this implementation does not yet implement priority inheritance,
nor does it currently differentiate between reader and writer queues
(though they are provided for in the API).
 1.115.4.1  18-Dec-2002  gmcgarry Remove the scheduler semantics from machine-dependent context switch.
 1.132.2.8  10-Nov-2005  skrll Sync with HEAD. Here we go again...
 1.132.2.7  04-Mar-2005  skrll Sync with HEAD.

Hi Perry!
 1.132.2.6  18-Dec-2004  skrll Sync with HEAD.
 1.132.2.5  19-Oct-2004  skrll Sync with HEAD
 1.132.2.4  21-Sep-2004  skrll Fix the sync with head I botched.
 1.132.2.3  18-Sep-2004  skrll Sync with HEAD.
 1.132.2.2  03-Aug-2004  skrll Sync with HEAD
 1.132.2.1  02-Jul-2003  darrenr Apply the aborted ktrace-lwp changes to a specific branch. This is just for
others to review, I'm concerned that patch fuzziness may have resulted in some
errant code being generated but I'll look at that later by comparing the diff
from the base to the branch with the file I attempt to apply to it. This will,
at the very least, put the changes in a better context for others to review
them and attempt to tinker with removing passing of 'struct lwp' through
the kernel.
 1.146.4.1  19-Mar-2005  yamt sync with head. xen and whitespace. xen part is not finished.
 1.146.2.1  29-Apr-2005  kent sync with -current
 1.148.2.1  21-Oct-2005  riz Pull up following revision(s) (requested by chs in ticket #901):
sys/kern/kern_time.c: revision 1.94
sys/sys/signalvar.h: revision 1.59
sys/sys/savar.h: revision 1.16
sys/kern/kern_sig.c: revision 1.209
sys/kern/kern_sa.c: revision 1.66
sys/kern/kern_synch.c: revision 1.150
avoid calling into the pool code while holding sched_lock
since both pool_get() and pool_put() can call wakeup().
instead, allocate the struct sadata_upcall before taking
sched_lock in mi_switch() and free it after releasing sched_lock.
clean up some modularity warts by adding a callback to
struct sadata_upcall for freeing sa_arg.
 1.149.2.11  24-Mar-2008  yamt sync with head.
 1.149.2.10  17-Mar-2008  yamt sync with head.
 1.149.2.9  27-Feb-2008  yamt sync with head.
 1.149.2.8  21-Jan-2008  yamt sync with head
 1.149.2.7  07-Dec-2007  yamt sync with head
 1.149.2.6  15-Nov-2007  yamt sync with head.
 1.149.2.5  27-Oct-2007  yamt sync with head.
 1.149.2.4  03-Sep-2007  yamt sync with head.
 1.149.2.3  26-Feb-2007  yamt sync with head.
 1.149.2.2  30-Dec-2006  yamt sync with head.
 1.149.2.1  21-Jun-2006  yamt sync with head.
 1.151.2.1  02-Nov-2005  yamt sync with head.
 1.160.12.1  24-May-2006  tron Merge 2006-05-24 NetBSD-current into the "peter-altq" branch.
 1.160.10.2  06-May-2006  christos - Move kauth_cred_t declaration to <sys/types.h>
- Cleanup struct ucred; forward declarations that are unused.
- Don't include <sys/kauth.h> in any header, but include it in the c files
that need it.

Approved by core.
 1.160.10.1  08-Mar-2006  elad Adapt to kernel authorization KPI.
 1.160.8.5  14-Sep-2006  yamt sync with head.
 1.160.8.4  03-Sep-2006  yamt sync with head.
 1.160.8.3  11-Aug-2006  yamt sync with head
 1.160.8.2  26-Jun-2006  yamt sync with head.
 1.160.8.1  24-May-2006  yamt sync with head.
 1.160.6.1  01-Jun-2006  kardel Sync with head.
 1.160.4.1  09-Sep-2006  rpaulo sync with head
 1.161.4.1  13-Jul-2006  gdamore Merge from HEAD.
 1.166.2.20  09-Feb-2007  ad - Change syncobj_t::sobj_changepri() to alter both the user priority and
the effective priority of LWPs. How the effective priority is adjusted
depends on the type of object.
- Add a couple of comments to sched_kpri() and remrunqueue().
 1.166.2.19  05-Feb-2007  ad IPL_STATCLOCK needs to be >= IPL_CLOCK, so assume that proc::p_stmutex is
always a spinlock.
 1.166.2.18  31-Jan-2007  ad - Have callers to mi_switch() drop the kernel lock.
- Fix a deadlock and some typos.
- Unbreak ptrace().
 1.166.2.17  30-Jan-2007  ad Remove support for SA. Ok core@.
 1.166.2.16  28-Jan-2007  ad - Fix sequence error between saving/raising the SPL.
- Changes for JavaStation.
- Fix bugs with mips & sparc support routines.
 1.166.2.15  28-Jan-2007  ad - Remove the last use of mtsleep()
- sched_pause() -> kpause()
 1.166.2.14  27-Jan-2007  ad Rename some functions to better describe what they do.
 1.166.2.13  27-Jan-2007  ad Drop proclist_mutex and proc::p_smutex back to IPL_VM.
 1.166.2.12  27-Jan-2007  ad Fix a bug in sched_changepri() that could result in:
"panic: remrunqueue: bit X not set"
 1.166.2.11  25-Jan-2007  ad - Don't adjust the priority of kernel threads.
- Change sched_kpri() to be more fair to user processes.
- KNF
 1.166.2.10  25-Jan-2007  yamt schedcpu: don't forget to add p_rtime.
 1.166.2.9  25-Jan-2007  yamt schedcpu: fix a deadlock. (don't call psignal with p_smutex held.)
 1.166.2.8  11-Jan-2007  ad Checkpoint work in progress.
 1.166.2.7  29-Dec-2006  ad Checkpoint work in progress.
 1.166.2.6  18-Nov-2006  ad Sync with head.
 1.166.2.5  17-Nov-2006  ad Fix an obvious sleep/wakeup bug introduced in previous.
 1.166.2.4  17-Nov-2006  ad Checkpoint work in progress.
 1.166.2.3  24-Oct-2006  ad - Redo LWP locking slightly and fix some races.
- Fix some locking botches.
- Make signal mask / stack per-proc for SA processes.
- Add _lwp_kill().
 1.166.2.2  21-Oct-2006  ad Checkpoint work in progress on locking and per-LWP signals. Very much a
a work in progress and there is still a lot to do.
 1.166.2.1  11-Sep-2006  ad Add mtsleep(). Currently only useful if the wakeup comes from process
context. A fix for this is pending.
 1.167.2.2  10-Dec-2006  yamt sync with head.
 1.167.2.1  22-Oct-2006  yamt sync with head
 1.173.4.2  04-Jan-2008  skrll s/proc/LWP/ in a couple of comments
 1.173.4.1  17-May-2007  wrstuden Adjust how we handle allocating an SA Event structure used to inform
the application that we are blocking.

Since we only generate BLOCKED upcalls when the thread "on" the
vp blocks and further since the first thing we do after this is
run the upcall delivery code (we switch to the lwp lying around for
this very purpose), we will never generate more BLOCKED upcalls than
we have vps. We have an invariant, let's use it.

So we add a pointer to the vp structure, and store an upcall event
data structure in it. When we block in sa_switch(), we use that
structure. When we deliver an upcall into userland, if our vp is
lacking an upcall event structure, we don't free the upcall
event structure for the event we just delivered, we instead hang it
off of the vp.

We thus no longer need to allocate memory when blocking.

This version of the code contains a panic() in the case that we do
not have an upcall event structure available in sa_switch(). Keep this in
mind before putting this version into a production environment.

It has been suggested that it would be easier to embed an upcall event
structure in the virtual processor structure, given the invariant above.
The problem is that while we only can generate one BLOCKED per vp, we
do not necessarily deliver that upcall immediately to userland. If there
were pending upcalls, we deliver them first. As such, we can have code
running on the vp before the BLOCKED event is delivered. If _that_ code
blocks, we have nowhere to store that information. Thus we limit ourselves
to holding a pointer to separately-allocated memory.
 1.173.2.1  16-Sep-2008  bouyer Sync with the following revisions (requested by skrll in ticket #1196):
gnu/dist/gdb removed
gnu/usr.bin/gdb53 removed
distrib/cats/instkernel/Makefile 1.14.6.1
gnu/dist/gdb6/bfd/config.bfd 1.3.6.1
gnu/dist/gdb6/bfd/elfxx-sparc.c 1.1.1.2.6.1
gnu/dist/gdb6/bfd/elfxx-sparc.h 1.1.1.2.6.1
gnu/dist/gdb6/gdb/Makefile.in 1.2.2.1.2.2
gnu/dist/gdb6/gdb/alpha-tdep.c 1.1.1.2.6.1
gnu/dist/gdb6/gdb/alpha-tdep.h 1.1.1.2.6.1
gnu/dist/gdb6/gdb/alphabsd-nat.c 1.1.1.2.6.2
gnu/dist/gdb6/gdb/alphabsd-nat.h 1.1.2.1
gnu/dist/gdb6/gdb/alphabsd-tdep.c 1.1.1.2.6.1
gnu/dist/gdb6/gdb/alphabsd-tdep.h 1.1.1.2.6.1
gnu/dist/gdb6/gdb/alphanbsd-nat.c 1.1.2.1
gnu/dist/gdb6/gdb/alphanbsd-tdep.c 1.1.1.2.6.1
gnu/dist/gdb6/gdb/amd64-nat.c 1.1.1.2.6.1
gnu/dist/gdb6/gdb/amd64bsd-nat.c 1.1.1.2.6.1
gnu/dist/gdb6/gdb/amd64nbsd-nat.c 1.1.1.2.6.3
gnu/dist/gdb6/gdb/amd64nbsd-tdep.c 1.1.1.2.6.1
gnu/dist/gdb6/gdb/arm-tdep.h 1.1.1.2.6.1
gnu/dist/gdb6/gdb/armbsd-tdep.c 1.1.2.1
gnu/dist/gdb6/gdb/armnbsd-nat.c 1.1.1.2.6.2
gnu/dist/gdb6/gdb/armnbsd-tdep.c 1.1.1.2.6.1
gnu/dist/gdb6/gdb/configure 1.1.1.2.6.1
gnu/dist/gdb6/gdb/configure.ac 1.1.1.2.6.1
gnu/dist/gdb6/gdb/i386bsd-nat.c 1.1.1.2.6.1
gnu/dist/gdb6/gdb/i386nbsd-tdep.c 1.1.1.2.6.1
gnu/dist/gdb6/gdb/m68kbsd-nat.c 1.1.1.2.6.2
gnu/dist/gdb6/gdb/mipsnbsd-nat.c 1.1.1.2.6.2
gnu/dist/gdb6/gdb/nbsd-thread.c 1.1.2.3
gnu/dist/gdb6/gdb/ppcnbsd-nat.c 1.1.1.2.6.2
gnu/dist/gdb6/gdb/ppcnbsd-tdep.c 1.3.6.1
gnu/dist/gdb6/gdb/sh-tdep.c 1.1.1.2.6.1
gnu/dist/gdb6/gdb/shnbsd-nat.c 1.1.1.2.6.3
gnu/dist/gdb6/gdb/shnbsd-tdep.c 1.1.1.2.6.4
gnu/dist/gdb6/gdb/shnbsd-tdep.h 1.1.1.2.6.1
gnu/dist/gdb6/gdb/sparc-nat.c 1.1.1.2.6.1
gnu/dist/gdb6/gdb/sparc64nbsd-nat.c 1.1.1.2.6.2
gnu/dist/gdb6/gdb/sparcnbsd-nat.c 1.1.1.2.6.2
gnu/dist/gdb6/gdb/tramp-frame.h 1.1.1.2.6.1
gnu/dist/gdb6/gdb/vaxbsd-nat.c 1.1.1.2.6.2
gnu/dist/gdb6/gdb/config/alpha/nbsd.mh 1.1.1.2.6.1
gnu/dist/gdb6/gdb/config/arm/nbsd.mt 1.1.1.1.6.1
gnu/dist/gdb6/gdb/config/arm/nbsdelf.mh 1.1.1.1.6.1
gnu/dist/gdb6/gdb/config/i386/nbsd64.mh 1.1.1.1.6.1
gnu/dist/gdb6/gdb/config/m68k/nbsdelf.mh 1.1.1.1.6.1
gnu/dist/gdb6/gdb/config/mips/nbsd.mh 1.1.1.1.6.1
gnu/dist/gdb6/gdb/config/powerpc/nbsd.mh 1.1.1.2.6.1
gnu/dist/gdb6/gdb/config/sh/nbsd.mh 1.1.1.1.6.2
gnu/dist/gdb6/gdb/config/sh/tm-nbsd.h 1.1.1.1.6.1
gnu/dist/gdb6/gdb/config/sparc/nbsd64.mh 1.1.1.1.6.1
gnu/dist/gdb6/gdb/config/sparc/nbsdelf.mh 1.1.1.1.6.1
gnu/dist/gdb6/gdb/config/vax/nbsdelf.mh 1.1.1.1.6.1
gnu/dist/gdb6/opcodes/configure 1.1.1.2.6.1
gnu/dist/gdb6/opcodes/configure.in 1.1.1.2.6.1
gnu/usr.bin/Makefile 1.126.4.1
gnu/usr.bin/gdb6/arch/alpha/config.h 1.3.4.1
gnu/usr.bin/gdb6/arch/alpha/defs.mk 1.2.6.1
gnu/usr.bin/gdb6/arch/alpha/init.c 1.2.6.1
gnu/usr.bin/gdb6/arch/alpha/nm.h 1.2.6.1
gnu/usr.bin/gdb6/arch/arm/defs.mk 1.2.6.2
gnu/usr.bin/gdb6/arch/arm/init.c 1.1.6.1
gnu/usr.bin/gdb6/arch/armeb/config.h 1.1.6.2
gnu/usr.bin/gdb6/arch/armeb/defs.mk 1.1.6.3
gnu/usr.bin/gdb6/arch/armeb/init.c 1.1.6.2
gnu/usr.bin/gdb6/arch/armeb/tm.h 1.1.6.2
gnu/usr.bin/gdb6/arch/armeb/version.c 1.1.6.2
gnu/usr.bin/gdb6/arch/i386/defs.mk 1.4.4.1
gnu/usr.bin/gdb6/arch/i386/init.c 1.3.6.1
gnu/usr.bin/gdb6/arch/m68000/config.h 1.1.6.2
gnu/usr.bin/gdb6/arch/m68000/defs.mk 1.1.6.2
gnu/usr.bin/gdb6/arch/m68000/init.c 1.1.6.2
gnu/usr.bin/gdb6/arch/m68000/tm.h 1.1.6.2
gnu/usr.bin/gdb6/arch/m68000/version.c 1.1.6.2
gnu/usr.bin/gdb6/arch/m68k/defs.mk 1.1.4.1
gnu/usr.bin/gdb6/arch/m68k/init.c 1.1.4.1
gnu/usr.bin/gdb6/arch/mipseb/config.h 1.3.4.1
gnu/usr.bin/gdb6/arch/mipseb/defs.mk 1.2.6.2
gnu/usr.bin/gdb6/arch/mipseb/init.c 1.2.6.2
gnu/usr.bin/gdb6/arch/mipsel/config.h 1.2.6.3
gnu/usr.bin/gdb6/arch/mipsel/defs.mk 1.2.6.3
gnu/usr.bin/gdb6/arch/mipsel/init.c 1.2.6.3
gnu/usr.bin/gdb6/arch/mipsel/tm.h 1.2.6.2
gnu/usr.bin/gdb6/arch/mipsel/version.c 1.2.6.2
gnu/usr.bin/gdb6/arch/powerpc/defs.mk 1.3.6.1
gnu/usr.bin/gdb6/arch/powerpc/init.c 1.3.6.1
gnu/usr.bin/gdb6/arch/sh3eb/config.h 1.2.2.2
gnu/usr.bin/gdb6/arch/sh3eb/defs.mk 1.2.8.3
gnu/usr.bin/gdb6/arch/sh3eb/init.c 1.1.8.3
gnu/usr.bin/gdb6/arch/sh3eb/nm.h 1.1.8.2
gnu/usr.bin/gdb6/arch/sh3eb/tm.h 1.1.8.2
gnu/usr.bin/gdb6/arch/sh3eb/version.c 1.1.8.2
gnu/usr.bin/gdb6/arch/sh3el/config.h 1.2.2.2
gnu/usr.bin/gdb6/arch/sh3el/defs.mk 1.2.8.3
gnu/usr.bin/gdb6/arch/sh3el/init.c 1.1.8.3
gnu/usr.bin/gdb6/arch/sh3el/nm.h 1.1.8.2
gnu/usr.bin/gdb6/arch/sh3el/tm.h 1.1.8.2
gnu/usr.bin/gdb6/arch/sh3el/version.c 1.1.8.2
gnu/usr.bin/gdb6/arch/sparc/defs.mk 1.2.6.1
gnu/usr.bin/gdb6/arch/sparc/init.c 1.1.6.1
gnu/usr.bin/gdb6/arch/sparc64/defs.mk 1.2.6.1
gnu/usr.bin/gdb6/arch/sparc64/init.c 1.1.6.1
gnu/usr.bin/gdb6/arch/vax/config.h 1.1.6.2
gnu/usr.bin/gdb6/arch/vax/defs.mk 1.1.6.2
gnu/usr.bin/gdb6/arch/vax/init.c 1.1.6.2
gnu/usr.bin/gdb6/arch/vax/tm.h 1.1.6.2
gnu/usr.bin/gdb6/arch/vax/version.c 1.1.6.2
gnu/usr.bin/gdb6/arch/x86_64/defs.mk 1.2.6.1
gnu/usr.bin/gdb6/arch/x86_64/init.c 1.1.6.1
gnu/usr.bin/gdb6/bfd/arch/armeb/bfd.h 1.1.6.2
gnu/usr.bin/gdb6/bfd/arch/armeb/bfdver.h 1.1.6.2
gnu/usr.bin/gdb6/bfd/arch/armeb/config.h 1.1.6.2
gnu/usr.bin/gdb6/bfd/arch/armeb/defs.mk 1.1.6.2
gnu/usr.bin/gdb6/bfd/arch/m68000/bfd.h 1.1.6.2
gnu/usr.bin/gdb6/bfd/arch/m68000/bfdver.h 1.1.6.2
gnu/usr.bin/gdb6/bfd/arch/m68000/config.h 1.1.6.2
gnu/usr.bin/gdb6/bfd/arch/m68000/defs.mk 1.1.6.2
gnu/usr.bin/gdb6/bfd/arch/mipsel/bfd.h 1.1.6.2
gnu/usr.bin/gdb6/bfd/arch/mipsel/bfdver.h 1.1.6.2
gnu/usr.bin/gdb6/bfd/arch/mipsel/config.h 1.1.6.2
gnu/usr.bin/gdb6/bfd/arch/mipsel/defs.mk 1.1.6.2
gnu/usr.bin/gdb6/bfd/arch/sh3eb/bfd.h 1.1.8.3
gnu/usr.bin/gdb6/bfd/arch/sh3eb/bfdver.h 1.1.8.2
gnu/usr.bin/gdb6/bfd/arch/sh3eb/config.h 1.1.8.2
gnu/usr.bin/gdb6/bfd/arch/sh3eb/defs.mk 1.1.8.3
gnu/usr.bin/gdb6/bfd/arch/sh3el/bfd.h 1.1.8.3
gnu/usr.bin/gdb6/bfd/arch/sh3el/bfdver.h 1.1.8.2
gnu/usr.bin/gdb6/bfd/arch/sh3el/config.h 1.1.8.2
gnu/usr.bin/gdb6/bfd/arch/sh3el/defs.mk 1.1.8.3
gnu/usr.bin/gdb6/bfd/arch/vax/bfd.h 1.1.6.2
gnu/usr.bin/gdb6/bfd/arch/vax/bfdver.h 1.1.6.2
gnu/usr.bin/gdb6/bfd/arch/vax/config.h 1.1.6.2
gnu/usr.bin/gdb6/bfd/arch/vax/defs.mk 1.1.6.2
gnu/usr.bin/gdb6/gdb/Makefile 1.5.2.1.2.2
gnu/usr.bin/gdb6/gdbtui/Makefile 1.2.6.1
gnu/usr.bin/gdb6/libiberty/arch/armeb/config.h 1.1.6.2
gnu/usr.bin/gdb6/libiberty/arch/armeb/defs.mk 1.1.6.2
gnu/usr.bin/gdb6/libiberty/arch/m68000/config.h 1.1.6.2
gnu/usr.bin/gdb6/libiberty/arch/m68000/defs.mk 1.1.6.2
gnu/usr.bin/gdb6/libiberty/arch/mipsel/config.h 1.1.6.2
gnu/usr.bin/gdb6/libiberty/arch/mipsel/defs.mk 1.1.6.2
gnu/usr.bin/gdb6/libiberty/arch/sh3eb/config.h 1.1.8.2
gnu/usr.bin/gdb6/libiberty/arch/sh3eb/defs.mk 1.1.8.2
gnu/usr.bin/gdb6/libiberty/arch/sh3el/config.h 1.1.8.2
gnu/usr.bin/gdb6/libiberty/arch/sh3el/defs.mk 1.1.8.2
gnu/usr.bin/gdb6/libiberty/arch/vax/config.h 1.1.6.2
gnu/usr.bin/gdb6/libiberty/arch/vax/defs.mk 1.1.6.2
gnu/usr.bin/gdb6/opcodes/arch/armeb/config.h 1.1.6.2
gnu/usr.bin/gdb6/opcodes/arch/armeb/defs.mk 1.1.6.2
gnu/usr.bin/gdb6/opcodes/arch/m68000/config.h 1.1.6.2
gnu/usr.bin/gdb6/opcodes/arch/m68000/defs.mk 1.1.6.2
gnu/usr.bin/gdb6/opcodes/arch/mipsel/config.h 1.1.6.2
gnu/usr.bin/gdb6/opcodes/arch/mipsel/defs.mk 1.1.6.2
gnu/usr.bin/gdb6/opcodes/arch/sh3eb/config.h 1.1.8.2
gnu/usr.bin/gdb6/opcodes/arch/sh3eb/defs.mk 1.1.8.3
gnu/usr.bin/gdb6/opcodes/arch/sh3el/config.h 1.1.8.2
gnu/usr.bin/gdb6/opcodes/arch/sh3el/defs.mk 1.1.8.3
gnu/usr.bin/gdb6/opcodes/arch/vax/config.h 1.1.6.2
gnu/usr.bin/gdb6/opcodes/arch/vax/defs.mk 1.1.6.2
gnu/usr.bin/gdb6/readline/arch/armeb/config.h 1.1.6.2
gnu/usr.bin/gdb6/readline/arch/armeb/defs.mk 1.1.6.2
gnu/usr.bin/gdb6/readline/arch/m68000/config.h 1.1.6.2
gnu/usr.bin/gdb6/readline/arch/m68000/defs.mk 1.1.6.2
gnu/usr.bin/gdb6/readline/arch/mipsel/config.h 1.1.6.2
gnu/usr.bin/gdb6/readline/arch/mipsel/defs.mk 1.1.6.2
gnu/usr.bin/gdb6/readline/arch/sh3eb/config.h 1.1.8.2
gnu/usr.bin/gdb6/readline/arch/sh3eb/defs.mk 1.1.8.2
gnu/usr.bin/gdb6/readline/arch/sh3el/config.h 1.1.8.2
gnu/usr.bin/gdb6/readline/arch/sh3el/defs.mk 1.1.8.2
gnu/usr.bin/gdb6/readline/arch/vax/config.h 1.1.6.2
gnu/usr.bin/gdb6/readline/arch/vax/defs.mk 1.1.6.2
gnu/usr.bin/gdb6/sim/arch/mipseb/cconfig.h 1.1.2.1
gnu/usr.bin/gdb6/sim/arch/mipseb/config.h 1.1.2.1
gnu/usr.bin/gdb6/sim/arch/mipseb/defs.mk 1.1.2.1
gnu/usr.bin/gdb6/sim/arch/mipsel/cconfig.h 1.1.2.1
gnu/usr.bin/gdb6/sim/arch/mipsel/config.h 1.1.2.1
gnu/usr.bin/gdb6/sim/arch/mipsel/defs.mk 1.1.2.1
lib/libkvm/kvm_sparc64.c 1.10.18.2
lib/libpthread/pthread.c 1.48.6.4
lib/libpthread/pthread_barrier.c 1.6.18.1
lib/libpthread/pthread_cond.c 1.18.12.2
lib/libpthread/pthread_debug.h 1.8.18.1
lib/libpthread/pthread_int.h 1.34.4.5
lib/libpthread/pthread_lock.c 1.14.6.1
lib/libpthread/pthread_mutex.c 1.22.4.2
lib/libpthread/pthread_run.c 1.18.12.4
lib/libpthread/pthread_rwlock.c 1.13.6.2
lib/libpthread/pthread_sa.c 1.37.6.5
lib/libpthread/pthread_sig.c 1.47.4.8
lib/libpthread/pthread_sleep.c 1.7.6.2
lib/libpthread/sem.c 1.9.6.2
lib/libpthread/arch/sh3/pthread_md.h 1.3.6.1
regress/lib/libpthread/resolv/Makefile 1.1.12.1
regress/lib/libpthread/sigrunning/Makefile 1.1.2.1
regress/lib/libpthread/sigrunning/sigrunning.c 1.1.2.1
share/mk/bsd.own.mk 1.489.4.3
sys/arch/amd64/amd64/locore.S 1.18.14.1
sys/arch/amd64/amd64/machdep.c 1.44.2.3.2.1
sys/arch/amd64/conf/kern.ldscript 1.1.70.1
sys/arch/cats/conf/Makefile.cats.inc 1.17.30.1
sys/arch/shark/conf/Makefile.shark.inc 1.6.30.1
sys/arch/sparc64/conf/kern.ldscript 1.7.26.2
sys/arch/sparc64/conf/kern32.ldscript 1.6.26.2
sys/arch/sparc64/include/kcore.h 1.4.92.2
sys/arch/sparc64/sparc64/locore.s 1.232.4.4
sys/arch/sparc64/sparc64/machdep.c 1.193.4.3
sys/arch/sparc64/sparc64/pmap.c 1.184.2.1.2.4
sys/conf/newvers.sh 1.42.26.2
sys/kern/kern_sa.c 1.87.4.11
sys/kern/kern_synch.c 1.173.4.2
sys/sys/savar.h 1.20.10.2
tools/gdb/Makefile 1.9.4.1
tools/gdb/mknative-gdb 1.1.6.1

pullup the wrstuden-fixsa CVS branch to netbsd-4:
toolchain/35540 - GDB 6 support for pthreads.
port-sparc64/37534 - ktrace firefox gives
kernel trap 30: data access exception
GDB changes:
- delete gdb53
- enable gdb6 on all architectures
- add support for amd64 crash dumps
- add support for sparc64 crash dumps
- add support for /proc pid to executable filename for all archs
- enable thread support for all architectures
- add a note section to kernels to all platforms
- support detection/unwinding of signals for most architectures.
- Fix PTHREAD_UCONTEXT_TO_REG / PTHREAD_REG_TO_UCONTEXT on sh3.
- Apply fix from binutils-current so that sparc gdb can be cross built
on a 64bit host.
SA/pthread changes:
Pre-allocate memory needed for event delivery. Eliminates dropped
interrupts under load.
Deliver intra-process signals to running threads
Eliminate some deadlock scenarios
Fix intra-process signal delivery when delivering to a thread waiting
for signals. Makes afs work again!
 1.177.2.29  13-May-2007  ad Assign a per-CPU lock to LWPs as they transition into the ONPROC state.

http://mail-index.netbsd.org/tech-kern/2007/05/06/0003.html
 1.177.2.28  21-Apr-2007  ad Some changes mainly for top/ps:

- Add an optional name field to struct lwp.
- Count the total number of context switches + involuntary,
not voluntary + involuntary.
- Mark the idle threads as LSIDL when not running, otherwise
they show up funny in a top(1) that shows threads.
- Make pctcpu and cpticks per-LWP attributes.
- Add to kinfo_lwp: cpticks, pctcpu, pid, name.
 1.177.2.27  19-Apr-2007  ad Pull up a change from the vmlocking branch:

- Ensure that LWPs going to sleep are on the sleep queue before releasing
any interlocks. This is so that calls to turnstile_wakeup will have the
correct locks held when adjusting priority. Avoids another deadlock.
- Assume that LWPs blocked on a turnstile will never be swapped out.
- LWPs blocking on a turnstile must have kernel priority, as they
are consuming kernel resources.
 1.177.2.26  18-Apr-2007  ad Add back missing RCS ID.
 1.177.2.25  18-Apr-2007  yamt sched_pstats: fix p_pctcpu decay.
 1.177.2.24  16-Apr-2007  ad - Nuke the separate scheduler locking scheme for UP kernels - it has been
at the root of too many bugs.
- Add a LW_BOUND flag that indicates an LWP is bound to a specific CPU.
 1.177.2.23  04-Apr-2007  ad Add cpu_did_resched() and call with the per-CPU scheduler lock held.
Needed to interlock with cpu_need_resched() on other CPUs, as there
is a short window where the flag could be clobbered while a higher
priority LWP is waiting to run.
 1.177.2.22  03-Apr-2007  matt Nuke __HAVE_BIGENDIAN_BITOPS
 1.177.2.21  02-Apr-2007  rmind - Move the ccpu sysctl back to the scheduler-independent part.
- Move the scheduler-independent parts of 4BSD's schedcpu() to
kern_synch.c.
- Add scheduler-specific hook to satisfy individual scheduler's
needs.
- Remove autonice, which is archaic and not useful.

Patch provided by Daniel Sieger.
 1.177.2.20  24-Mar-2007  ad - Ensure that context switch always happens at least at IPL_SCHED, even
if no spin lock is held. Should fix the assertion failure seen on hppa.
- Reduce the amount of spl frobbing in mi_switch.
- Add some comments.

Reviewed by yamt@.
 1.177.2.19  24-Mar-2007  rmind sched_nextlwp: Remove struct lwp * argument, it is no longer needed.
Note by yamt@
 1.177.2.18  24-Mar-2007  rmind Checkpoint:
- Abstract for per-CPU locking of runqueues.
As a workaround for SCHED_4BSD global runqueue, covered by sched_mutex,
spc_mutex is a pointer for now. After making SCHED_4BSD runqueues
per-CPU, it will became a storage mutex.
- suspendsched: Locking is not necessary for cpu_need_resched().
- Remove mutex_spin_exit() prototype in patch.c and LOCK_ASSERT() check
in runqueue_nextlwp() in sched_4bsd.c to make them compile again.
 1.177.2.17  21-Mar-2007  ad Previously cpu_info::ci_curlwp was protected by the sched_mutex and this is
used in a few places to synchronise. Now the state of LWPs is protected
during switch by their current lock (which might be e.g. a sleep queue
lock). So ci_curlwp is unlocked, which is necessary to be able to do
preemption and to run interrupts as LWPs cheaply.

Add a (locked) flag to the lwp (LW_RUNNING) that indicates if it is on CPU
somewhere. More exactly, it means that the LWP's state is tied to a CPU, and
that the LWP has not yet switched away even if (l->l_cpu->ci_curlwp != l) or
(l->l_stat != LSONPROC).
 1.177.2.16  17-Mar-2007  rmind Do not do an implicit enqueue in sched_switch(), move enqueueing back to
the dispatcher. Rename sched_switch() back to sched_nextlwp(). Add for
sched_enqueue() new argument, which indicates the calling from mi_switch().

Requested by yamt@
 1.177.2.15  12-Mar-2007  rmind Sync with HEAD.
 1.177.2.14  09-Mar-2007  rmind Minor update on mi_switch commentary. From Daniel Sieger.
 1.177.2.13  09-Mar-2007  rmind Checkpoint:

- Addition of scheduler-specific pointers in the struct proc, lwp and
schedstate_percpu.
- Addition of sched_lwp_fork(), sched_lwp_exit() and sched_slept() hooks.
- mi_switch() now has only one argument.
- sched_nextlwp(void) becomes sched_switch(struct lwp *) and does an
enqueueing of LWP.
- Addition of general kern.sched sysctl node.
- Remove the twice-called uvmexp.swtch++, other cleanups.

Discussed on tech-kern@
 1.177.2.12  03-Mar-2007  yamt sched_switch_unlock: add an assertion.
 1.177.2.11  27-Feb-2007  yamt fix merge botches.
 1.177.2.10  27-Feb-2007  yamt - sync with head.
- move sched_changepri back to kern_synch.c as it doesn't know PPQ anymore.
 1.177.2.9  26-Feb-2007  yamt add an "immediate" flag for cpu_need_resched(). suggested by Andrew Doran.
 1.177.2.8  23-Feb-2007  yamt - introduce sys/cpu.h which has cpu_idle and cpu_need_resched.
- use it where appropriate.
- while i'm here, remove several unnecessary #include.
 1.177.2.7  23-Feb-2007  yamt mi_switch: update l_stat and l_cpu with sched_mutex held.
 1.177.2.6  21-Feb-2007  yamt remove some unnecessary #include.
 1.177.2.5  20-Feb-2007  rmind General Common Scheduler Framework (CSF) patch import. Huge thanks for
Daniel Sieger <dsieger at TechFak.Uni-Bielefeld de> for this work.

Short abstract: Split the dispatcher from the scheduler in order to
make the scheduler more modular. Introduce initial API for other
schedulers' implementations.

Discussed in tech-kern@
OK: yamt@, ad@

Note: further work will go soon.
 1.177.2.4  18-Feb-2007  yamt sched_switch_unlock: fix a typo which causes lockdebug panic.
 1.177.2.3  17-Feb-2007  yamt mi_switch: reduce the number of microtime calls.
 1.177.2.2  17-Feb-2007  yamt while switching, hold "switched from" lwp's l_mutex rather than sched_mutex.
 1.177.2.1  17-Feb-2007  yamt - separate context switching and thread scheduling.
- introduce idle lwp.
- change some related MD/MI interfaces and implement i386 version.
 1.186.2.21  01-Nov-2007  ad - Fix interactivity problems under high load. Because soft interrupts
are being stacked on top of regular LWPs, more often than not aston()
was being called on a soft interrupt thread instead of a user thread,
meaning that preemption was not happening on EOI.

- Don't use bool in a couple of data structures. Sub-word writes are not
always atomic and may clobber other fields in the containing word.

- For SCHED_4BSD, make p_estcpu per thread (l_estcpu). Rework how the
dynamic priority level is calculated - it's much better behaved now.

- Kill the l_usrpri/l_priority split now that priorities are no longer
directly assigned by tsleep(). There are three fields describing LWP
priority:

l_priority: Dynamic priority calculated by the scheduler.
This does not change for kernel/realtime threads,
and always stays within the correct band. Eg for
timeshared LWPs it never moves out of the user
priority range. This is basically what l_usrpri
was before.

l_inheritedprio: Lent to the LWP due to priority inheritance
(turnstiles).

l_kpriority: A boolean value set true the first time an LWP
sleeps within the kernel. This indicates that the LWP
should get a priority boost as compensation for blocking.
lwp_eprio() now does the equivalent of sched_kpri() if
the flag is set. The flag is cleared in userret().

- Keep track of scheduling class (OTHER, FIFO, RR) in struct lwp, and use
this to make decisions in a few places where we previously tested for a
kernel thread.

- Partially fix itimers and usr/sys/intr time accounting in the presence
of software interrupts.

- Use kthread_create() to create idle LWPs. Move priority definitions
from the various modules into sys/param.h.

- newlwp -> lwp_create
 1.186.2.20  23-Oct-2007  ad Sync with head.
 1.186.2.19  18-Oct-2007  ad Update for soft interrupt changes. See kern_softint.c 1.1.2.17 for details.
 1.186.2.18  10-Oct-2007  rmind Sync with HEAD.
 1.186.2.17  09-Oct-2007  ad Sync with head.
 1.186.2.16  01-Sep-2007  yamt make "softint block" evcnt per softint_t. ok'ed by Andrew Doran.
 1.186.2.15  31-Aug-2007  yamt remove a stale comment.
 1.186.2.14  31-Aug-2007  yamt updatertime: constify.
 1.186.2.13  21-Aug-2007  ad A few minor corrections around calls to cpu_need_resched().
 1.186.2.12  20-Aug-2007  ad Sync with HEAD.
 1.186.2.11  14-Jul-2007  ad Make it possible to track time spent by soft interrupts as is done for
normal LWPs, and provide a sysctl to switch it on/off. Not enabled by
default because microtime() is not free. XXX Not happy with this but
I want it get it out of my local tree for the time being.
 1.186.2.10  07-Jul-2007  ad - Remove the interrupt priority range and use 'kernel RT' instead,
since only soft interrupts are threaded.
- Rename l->l_pinned to l->l_switchto. It might be useful for (re-)
implementing SA or doors.
- Simplify soft interrupt dispatch so MD code is doing as little as
possible that is new.
 1.186.2.9  01-Jul-2007  ad - Adapt to callout API change.
- Add a counter to track how often soft interrupts sleep.
 1.186.2.8  17-Jun-2007  ad - Increase the number of thread priorities from 128 to 256. How the space
is set up is to be revisited.
- Implement soft interrupts as kernel threads. A generic implementation
is provided, with hooks for fast-path MD code that can run the interrupt
threads over the top of other threads executing in the kernel.
- Split vnode::v_flag into three fields, depending on how the flag is
locked (by the interlock, by the vnode lock, by the file system).
- Miscellaneous locking fixes and improvements.
 1.186.2.7  09-Jun-2007  ad Sync with head.
 1.186.2.6  08-Jun-2007  ad Sync with head.
 1.186.2.5  10-Apr-2007  ad - Ensure that LWPs going to sleep are on the sleep queue and so
have their syncobj pointer updated, so that calls to turnstile_wakeup
will have the correct locks held when adjusting the current LWP's
priority. Avoids another deadlock.
- Assume that LWPs blocked on a turnstile will never be swapped out.
- LWPs blocking on a turnstile must have kernel priority, as they
are consuming kernel resources.
 1.186.2.4  05-Apr-2007  ad - Make context switch counters 64-bit, and count the total number of
context switches + voluntary, instead of involuntary + voluntary.
- Add lwp::l_swaplock for uvm.
- PHOLD/PRELE are replaced.
 1.186.2.3  21-Mar-2007  ad - Put a lock around the proc's CWD info (work in progress).
- Replace some more simplelocks.
- Make lbolt a condvar.
 1.186.2.2  13-Mar-2007  ad Pull in the initial set of changes for the vmlocking branch.
 1.186.2.1  13-Mar-2007  ad Sync with head.
 1.187.2.1  11-Jul-2007  mjf Sync with head.
 1.190.2.1  15-Aug-2007  skrll Sync with HEAD.
 1.192.2.13  09-Dec-2007  jmcneill Sync with HEAD.
 1.192.2.12  03-Dec-2007  joerg Sync with HEAD.
 1.192.2.11  14-Nov-2007  joerg Sync with HEAD.
 1.192.2.10  11-Nov-2007  joerg Sync with HEAD.
 1.192.2.9  06-Nov-2007  joerg Sync with HEAD.
 1.192.2.8  06-Nov-2007  joerg Sync with HEAD.
 1.192.2.7  04-Nov-2007  jmcneill Sync with HEAD.
 1.192.2.6  31-Oct-2007  joerg Sync with HEAD.
 1.192.2.5  26-Oct-2007  joerg Sync with HEAD.

Follow the merge of pmap.c on i386 and amd64 and move
pmap_init_tmp_pgtbl into arch/x86/x86/pmap.c. Modify the ACPI wakeup
code to restore CR4 before jumping back into kernel space as the large
page option might cover that.
 1.192.2.4  04-Oct-2007  joerg Sync with HEAD.
 1.192.2.3  02-Oct-2007  joerg Sync with HEAD.
 1.192.2.2  09-Aug-2007  jmcneill Sync with HEAD.
 1.192.2.1  04-Aug-2007  jmcneill Sync with HEAD.
 1.194.6.2  06-Aug-2007  yamt suspendsched: reduce #ifdef.
 1.194.6.1  06-Aug-2007  yamt file kern_synch.c was added on branch matt-mips64 on 2007-08-06 11:48:24 +0000
 1.194.4.2  14-Oct-2007  yamt sync with head.
 1.194.4.1  06-Oct-2007  yamt sync with head.
 1.194.2.5  23-Mar-2008  matt sync with HEAD
 1.194.2.4  09-Jan-2008  matt sync with HEAD
 1.194.2.3  08-Nov-2007  matt sync with -HEAD
 1.194.2.2  06-Nov-2007  matt sync with HEAD
 1.194.2.1  28-Aug-2007  matt Pre-init the static structures (lwp0,proc0,session0,etc.) whenever possible.
Use curlwp_set()
 1.201.2.1  13-Nov-2007  bouyer Sync with HEAD
 1.203.2.4  18-Feb-2008  mjf Sync with HEAD.
 1.203.2.3  27-Dec-2007  mjf Sync with HEAD.
 1.203.2.2  08-Dec-2007  mjf Sync with HEAD.
 1.203.2.1  19-Nov-2007  mjf Sync with HEAD.
 1.211.6.3  19-Jan-2008  bouyer Sync with HEAD
 1.211.6.2  08-Jan-2008  bouyer Sync with HEAD
 1.211.6.1  02-Jan-2008  bouyer Sync with HEAD
 1.211.2.4  28-Dec-2007  ad Sync with head.
 1.211.2.3  26-Dec-2007  ad Sync with head.
 1.211.2.2  08-Dec-2007  ad Fix merge error.
 1.211.2.1  04-Dec-2007  ad Pull the vmlocking changes into a new branch.
 1.217.6.4  17-Jan-2009  mjf Sync with HEAD.
 1.217.6.3  28-Sep-2008  mjf Sync with HEAD.
 1.217.6.2  02-Jun-2008  mjf Sync with HEAD.
 1.217.6.1  03-Apr-2008  mjf Sync with HEAD.
 1.217.2.1  24-Mar-2008  keiichi sync with head.
 1.227.2.2  04-Jun-2008  yamt sync with head
 1.227.2.1  18-May-2008  yamt sync with head.
 1.230.2.6  11-Aug-2010  yamt sync with head.
 1.230.2.5  11-Mar-2010  yamt sync with head
 1.230.2.4  19-Aug-2009  yamt sync with head.
 1.230.2.3  18-Jul-2009  yamt sync with head.
 1.230.2.2  04-May-2009  yamt sync with head.
 1.230.2.1  16-May-2008  yamt sync with head.
 1.241.2.6  18-Sep-2008  wrstuden Sync with wrstuden-revivesa-base-2.
 1.241.2.5  21-Jul-2008  wrstuden Add support for compiling SA as an option. Implied by COMPAT_40.

i386 kernels both with COMPAT_40 and with no compat options (and thus
no SA) compile.

No functional changes intended.
 1.241.2.4  29-Jun-2008  wrstuden Move the call to sa_awaken() up some so we always call it. sa_yield
threads now are on a condvar, and otherwise won't get this routine. Since
this routine is specifically to kick sa_yield threads back to life, don't
do that.
 1.241.2.3  23-Jun-2008  wrstuden Sync w/ -current. 34 merge conflicts to follow.
 1.241.2.2  23-May-2008  wrstuden Re-add calls to sa_awaken() and sa_switch(). Also add
back sa_awaken() itself. While here, remove the "type"
arguement to sa_switch() - it is all about generating
blocked upcalls.
 1.241.2.1  10-May-2008  wrstuden Initial checkin of re-adding SA. Everything except kern_sa.c
compiles in GENERIC for i386. This is still a work-in-progress, but
this checkin covers most of the mechanical work (changing signalling
to be able to accomidate SA's process-wide signalling and re-adding
includes of sys/sa.h and savar.h). Subsequent changes will be much
more interesting.

Also, kern_sa.c has received partial cleanup. There's still more
to do, though.
 1.248.2.2  28-Jul-2008  simonb Sync with head.
 1.248.2.1  03-Jul-2008  simonb Sync with head.
 1.250.2.2  13-Dec-2008  haad Update haad-dm branch to haad-dm-base2.
 1.250.2.1  19-Oct-2008  haad Sync with HEAD.
 1.252.2.3  28-Apr-2009  skrll Sync with HEAD.
 1.252.2.2  03-Mar-2009  skrll Sync with HEAD.
 1.252.2.1  19-Jan-2009  skrll Sync with HEAD.
 1.254.2.7  07-Nov-2015  snj Pull up following revision(s) (requested by pgoyette in ticket #1979):
sys/kern/kern_synch.c: revision 1.309
sys/kern/kern_exit.c: revisions 1.246, 1.247
sys/kern/kern_exec.c: revision 1.419
In execve_runproc(), update the p_waited entry for the process being
moved to SSTOP state, not for its parent. (It is correct to update
the parent's p_nstopchild count.) If the value is not already zero,
it could prevent its parent from waiting for the process.
Fixes PR kern/50298
--
When clearing out the scheduler queues during system shutdown, we move
all processes to the SSTOP state. Make sure we update each process's
p_waited and the parents' p_nstopchild counters to maintain consistent
values. Should not make any real difference this late in the shutdown
process, but we should still be consistent just in case.
Fixes PR kern/50318
--
Currently, if a process is exiting and its parent has indicated no intent
of reaping the process (nor any other children), the process will get
reparented to init. Since the state of the exiting process at this point
is SDEAD, proc_reparent() will not update either the old or new parent's
p_nstopchild counters.
This change causes both old and new parents to be properly updated.
Fixes PR kern/50300
--
For processes marked with PS_STOPEXIT, update the process's p_waited
value, and update its parent's p_nstopchild value when marking the
process's p_stat to SSTOP. The process needed to be SACTIVE to get
here, so this transition represents an additional process for which
the parent needs to wait.
Fixes PR kern/50308
 1.254.2.6  23-Apr-2009  snj branches: 1.254.2.6.6; 1.254.2.6.10;
Pull up following revision(s) (requested by yamt in ticket #720):
sys/kern/kern_synch.c: revision 1.262
kpreempt: report a failure of cpu_kpreempt_enter. otherwise x86 trap()
loops infinitely. PR/41202.
 1.254.2.5  06-Feb-2009  snj Pull up following revision(s) (requested by ad in ticket #415):
sys/kern/kern_synch.c: revision 1.260
Warn once and no more about backwards monotonic clock.
 1.254.2.4  02-Feb-2009  snj Pull up following revision(s) (requested by rmind in ticket #372):
sys/kern/kern_synch.c: revision 1.259
sched_pstats: add few checks to catch the problem. OK by <ad>.
 1.254.2.3  02-Feb-2009  snj Pull up following revision(s) (requested by ad in ticket #360):
sys/kern/kern_synch.c: revision 1.258
Redo previous. Don't count deferrals due to raised IPL. It's not that
meaningful.
 1.254.2.2  02-Feb-2009  snj Pull up following revision(s) (requested by ad in ticket #360):
sys/kern/kern_synch.c: revision 1.257
Don't increment the 'kpreempt defer: IPL' counter if a preemption is pending
and we try to process it from interrupt context. We can't process it, and
will be handled at EOI anyway. Can happen when kernel_lock is released.
 1.254.2.1  02-Feb-2009  snj Pull up following revision(s) (requested by ad in ticket #353):
sys/kern/kern_sig.c: revision 1.293
sys/kern/kern_synch.c: revision 1.256
PR kern/36183 problem with ptrace and multithreaded processes
Fix the famous "gdb + threads = panic" problem.
Also, fix another revivesa merge botch.
 1.254.2.6.10.1  07-Nov-2015  snj Pull up following revision(s) (requested by pgoyette in ticket #1979):
sys/kern/kern_synch.c: revision 1.309
sys/kern/kern_exit.c: revisions 1.246, 1.247
sys/kern/kern_exec.c: revision 1.419
In execve_runproc(), update the p_waited entry for the process being
moved to SSTOP state, not for its parent. (It is correct to update
the parent's p_nstopchild count.) If the value is not already zero,
it could prevent its parent from waiting for the process.
Fixes PR kern/50298
--
When clearing out the scheduler queues during system shutdown, we move
all processes to the SSTOP state. Make sure we update each process's
p_waited and the parents' p_nstopchild counters to maintain consistent
values. Should not make any real difference this late in the shutdown
process, but we should still be consistent just in case.
Fixes PR kern/50318
--
Currently, if a process is exiting and its parent has indicated no intent
of reaping the process (nor any other children), the process will get
reparented to init. Since the state of the exiting process at this point
is SDEAD, proc_reparent() will not update either the old or new parent's
p_nstopchild counters.
This change causes both old and new parents to be properly updated.
Fixes PR kern/50300
--
For processes marked with PS_STOPEXIT, update the process's p_waited
value, and update its parent's p_nstopchild value when marking the
process's p_stat to SSTOP. The process needed to be SACTIVE to get
here, so this transition represents an additional process for which
the parent needs to wait.
Fixes PR kern/50308
 1.254.2.6.6.1  07-Nov-2015  snj Pull up following revision(s) (requested by pgoyette in ticket #1979):
sys/kern/kern_synch.c: revision 1.309
sys/kern/kern_exit.c: revisions 1.246, 1.247
sys/kern/kern_exec.c: revision 1.419
In execve_runproc(), update the p_waited entry for the process being
moved to SSTOP state, not for its parent. (It is correct to update
the parent's p_nstopchild count.) If the value is not already zero,
it could prevent its parent from waiting for the process.
Fixes PR kern/50298
--
When clearing out the scheduler queues during system shutdown, we move
all processes to the SSTOP state. Make sure we update each process's
p_waited and the parents' p_nstopchild counters to maintain consistent
values. Should not make any real difference this late in the shutdown
process, but we should still be consistent just in case.
Fixes PR kern/50318
--
Currently, if a process is exiting and its parent has indicated no intent
of reaping the process (nor any other children), the process will get
reparented to init. Since the state of the exiting process at this point
is SDEAD, proc_reparent() will not update either the old or new parent's
p_nstopchild counters.
This change causes both old and new parents to be properly updated.
Fixes PR kern/50300
--
For processes marked with PS_STOPEXIT, update the process's p_waited
value, and update its parent's p_nstopchild value when marking the
process's p_stat to SSTOP. The process needed to be SACTIVE to get
here, so this transition represents an additional process for which
the parent needs to wait.
Fixes PR kern/50308
 1.260.2.2  23-Jul-2009  jym Sync with HEAD.
 1.260.2.1  13-May-2009  jym Sync with HEAD.

Commit is split, to avoid a "too many arguments" protocol error.
 1.274.2.3  06-Nov-2010  uebayasi Sync with HEAD.
 1.274.2.2  17-Aug-2010  uebayasi Sync with HEAD.
 1.274.2.1  30-Apr-2010  uebayasi Sync with HEAD.
 1.280.2.4  31-May-2011  rmind sync with head
 1.280.2.3  21-Apr-2011  rmind sync with head
 1.280.2.2  05-Mar-2011  rmind sync with head
 1.280.2.1  30-May-2010  rmind sync with head
 1.286.2.1  06-Jun-2011  jruoho Sync with HEAD.
 1.295.2.5  22-May-2014  yamt sync with head.

for a reference, the tree before this commit was tagged
as yamt-pagecache-tag8.

this commit was split into small chunks to avoid
a limitation of cvs. ("Protocol error: too many arguments")
 1.295.2.4  30-Oct-2012  yamt sync with head
 1.295.2.3  23-May-2012  yamt sync with head.
 1.295.2.2  17-Apr-2012  yamt sync with head
 1.295.2.1  10-Nov-2011  yamt sync with head
 1.296.4.6  29-Apr-2012  mrg sync to latest -current.
 1.296.4.5  06-Mar-2012  mrg sync to -current
 1.296.4.4  06-Mar-2012  mrg sync to -current
 1.296.4.3  04-Mar-2012  mrg sync to latest -current.
 1.296.4.2  24-Feb-2012  mrg sync to -current.
 1.296.4.1  18-Feb-2012  mrg merge to -current.
 1.297.2.2  15-Nov-2015  bouyer Pull up following revision(s) (requested by pgoyette in ticket #1333):
sys/kern/kern_exec.c: revision 1.420
sys/kern/kern_synch.c: revision 1.309
sys/kern/kern_exit.c: revision 1.246
sys/kern/kern_exit.c: revision 1.247
sys/kern/kern_exec.c: revision 1.419
In execve_runproc(), update the p_waited entry for the process being
moved to SSTOP state, not for its parent. (It is correct to update
the parent's p_nstopchild count.) If the value is not already zero,
it could prevent its parent from waiting for the process.
Fixes PR kern/50298
Pullups will be requested for:
NetBSD-7, -6, -6-0, -6-1, -5, -5-0, -5-1, and -5-2
When clearing out the scheduler queues during system shutdown, we move
all processes to the SSTOP state. Make sure we update each process's
p_waited and the parents' p_nstopchild counters to maintain consistent
values. Should not make any real difference this late in the shutdown
process, but we should still be consistent just in case.
Fixes PR kern/50318
Pullups will be requested for:
NetBSD-7, -6, -6-0, -6-1, -5, -5-0, -5-1, and -5-2
Currently, if a process is exiting and its parent has indicated no intent
of reaping the process (nor any other children), the process will get
reparented to init. Since the state of the exiting process at this point
is SDEAD, proc_reparent() will not update either the old or new parent's
p_nstopchild counters.
This change causes both old and new parents to be properly updated.
Fixes PR kern/50300
Pullups will be requested for:
NetBSD-7, -6, -6-0, -6-1, -5, -5-0, -5-1, and -5-2
For processes marked with PS_STOPEXIT, update the process's p_waited
value, and update its parent's p_nstopchild value when marking the
process's p_stat to SSTOP. The process needed to be SACTIVE to get
here, so this transition represents an additional process for which
the parent needs to wait.
Fixes PR kern/50308
Pullups will be requested for:
NetBSD-7, -6, -6-0, -6-1, -5, -5-0, -5-1, and -5-2
In spawn_return() we temporarily move the process state to SSTOP, but
without updating its p_waited value or its parent's p_nstopchild
counter. Later, we restore the original state, again without any
adjustment of the related values. This leaves a relatively short
window when the values are inconsistent and could interfere with the
proper operation of sys_wait() for the parent (if it manages to be
scheduled; it's not totally clear what, if anything, prevents
scheduling/execution of the parent).
If during this window, any of the checks being made result in an
error, we call exit1() which will eventually migrate the process's
state to SDEAD (with an intermediate transition to SDYING). At
this point the other variables get updated, and we finally restore
a consistent state.
This change updates the p_waited and parent's p_nstopchild at each
step to eliminate any windows during which the values could lead to
incorrect decisions.
Fixes PR kern/50330
Pullups will be requested for NetBSD-7, -6, -6-0, and -6-1
 1.297.2.1  19-Aug-2012  riz branches: 1.297.2.1.4; 1.297.2.1.6;
Pull up following revision(s) (requested by christos in ticket #513):
sys/kern/kern_synch.c: revision 1.303
PR/46811: Tetsuya Isaki: Don't handle cpu limits when runtime is negative.
 1.297.2.1.6.1  15-Nov-2015  bouyer Pull up following revision(s) (requested by pgoyette in ticket #1333):
sys/kern/kern_exec.c: revision 1.420
sys/kern/kern_synch.c: revision 1.309
sys/kern/kern_exit.c: revision 1.246
sys/kern/kern_exit.c: revision 1.247
sys/kern/kern_exec.c: revision 1.419
In execve_runproc(), update the p_waited entry for the process being
moved to SSTOP state, not for its parent. (It is correct to update
the parent's p_nstopchild count.) If the value is not already zero,
it could prevent its parent from waiting for the process.
Fixes PR kern/50298
Pullups will be requested for:
NetBSD-7, -6, -6-0, -6-1, -5, -5-0, -5-1, and -5-2
When clearing out the scheduler queues during system shutdown, we move
all processes to the SSTOP state. Make sure we update each process's
p_waited and the parents' p_nstopchild counters to maintain consistent
values. Should not make any real difference this late in the shutdown
process, but we should still be consistent just in case.
Fixes PR kern/50318
Pullups will be requested for:
NetBSD-7, -6, -6-0, -6-1, -5, -5-0, -5-1, and -5-2
Currently, if a process is exiting and its parent has indicated no intent
of reaping the process (nor any other children), the process will get
reparented to init. Since the state of the exiting process at this point
is SDEAD, proc_reparent() will not update either the old or new parent's
p_nstopchild counters.
This change causes both old and new parents to be properly updated.
Fixes PR kern/50300
Pullups will be requested for:
NetBSD-7, -6, -6-0, -6-1, -5, -5-0, -5-1, and -5-2
For processes marked with PS_STOPEXIT, update the process's p_waited
value, and update its parent's p_nstopchild value when marking the
process's p_stat to SSTOP. The process needed to be SACTIVE to get
here, so this transition represents an additional process for which
the parent needs to wait.
Fixes PR kern/50308
Pullups will be requested for:
NetBSD-7, -6, -6-0, -6-1, -5, -5-0, -5-1, and -5-2
In spawn_return() we temporarily move the process state to SSTOP, but
without updating its p_waited value or its parent's p_nstopchild
counter. Later, we restore the original state, again without any
adjustment of the related values. This leaves a relatively short
window when the values are inconsistent and could interfere with the
proper operation of sys_wait() for the parent (if it manages to be
scheduled; it's not totally clear what, if anything, prevents
scheduling/execution of the parent).
If during this window, any of the checks being made result in an
error, we call exit1() which will eventually migrate the process's
state to SDEAD (with an intermediate transition to SDYING). At
this point the other variables get updated, and we finally restore
a consistent state.
This change updates the p_waited and parent's p_nstopchild at each
step to eliminate any windows during which the values could lead to
incorrect decisions.
Fixes PR kern/50330
Pullups will be requested for NetBSD-7, -6, -6-0, and -6-1
 1.297.2.1.4.1  15-Nov-2015  bouyer Pull up following revision(s) (requested by pgoyette in ticket #1333):
sys/kern/kern_exec.c: revision 1.420
sys/kern/kern_synch.c: revision 1.309
sys/kern/kern_exit.c: revision 1.246
sys/kern/kern_exit.c: revision 1.247
sys/kern/kern_exec.c: revision 1.419
In execve_runproc(), update the p_waited entry for the process being
moved to SSTOP state, not for its parent. (It is correct to update
the parent's p_nstopchild count.) If the value is not already zero,
it could prevent its parent from waiting for the process.
Fixes PR kern/50298
Pullups will be requested for:
NetBSD-7, -6, -6-0, -6-1, -5, -5-0, -5-1, and -5-2
When clearing out the scheduler queues during system shutdown, we move
all processes to the SSTOP state. Make sure we update each process's
p_waited and the parents' p_nstopchild counters to maintain consistent
values. Should not make any real difference this late in the shutdown
process, but we should still be consistent just in case.
Fixes PR kern/50318
Pullups will be requested for:
NetBSD-7, -6, -6-0, -6-1, -5, -5-0, -5-1, and -5-2
Currently, if a process is exiting and its parent has indicated no intent
of reaping the process (nor any other children), the process will get
reparented to init. Since the state of the exiting process at this point
is SDEAD, proc_reparent() will not update either the old or new parent's
p_nstopchild counters.
This change causes both old and new parents to be properly updated.
Fixes PR kern/50300
Pullups will be requested for:
NetBSD-7, -6, -6-0, -6-1, -5, -5-0, -5-1, and -5-2
For processes marked with PS_STOPEXIT, update the process's p_waited
value, and update its parent's p_nstopchild value when marking the
process's p_stat to SSTOP. The process needed to be SACTIVE to get
here, so this transition represents an additional process for which
the parent needs to wait.
Fixes PR kern/50308
Pullups will be requested for:
NetBSD-7, -6, -6-0, -6-1, -5, -5-0, -5-1, and -5-2
In spawn_return() we temporarily move the process state to SSTOP, but
without updating its p_waited value or its parent's p_nstopchild
counter. Later, we restore the original state, again without any
adjustment of the related values. This leaves a relatively short
window when the values are inconsistent and could interfere with the
proper operation of sys_wait() for the parent (if it manages to be
scheduled; it's not totally clear what, if anything, prevents
scheduling/execution of the parent).
If during this window, any of the checks being made result in an
error, we call exit1() which will eventually migrate the process's
state to SDEAD (with an intermediate transition to SDYING). At
this point the other variables get updated, and we finally restore
a consistent state.
This change updates the p_waited and parent's p_nstopchild at each
step to eliminate any windows during which the values could lead to
incorrect decisions.
Fixes PR kern/50330
Pullups will be requested for NetBSD-7, -6, -6-0, and -6-1
 1.305.4.1  18-May-2014  rmind sync with head
 1.305.2.2  03-Dec-2017  jdolecek update from HEAD
 1.305.2.1  20-Aug-2014  tls Rebase to HEAD as of a few days ago.
 1.308.8.1  05-Nov-2015  snj Pull up following revision(s) (requested by pgoyette in ticket #996):
sys/kern/kern_exec.c: revisions 1.419, 1.420
sys/kern/kern_exit.c: revisions 1.246, 1.247
sys/kern/kern_synch.c: revision 1.309
In execve_runproc(), update the p_waited entry for the process being
moved to SSTOP state, not for its parent. (It is correct to update
the parent's p_nstopchild count.) If the value is not already zero,
it could prevent its parent from waiting for the process.
Fixes PR kern/50298
--
When clearing out the scheduler queues during system shutdown, we move
all processes to the SSTOP state. Make sure we update each process's
p_waited and the parents' p_nstopchild counters to maintain consistent
values. Should not make any real difference this late in the shutdown
process, but we should still be consistent just in case.
Fixes PR kern/50318
--
Currently, if a process is exiting and its parent has indicated no intent
of reaping the process (nor any other children), the process will get
reparented to init. Since the state of the exiting process at this point
is SDEAD, proc_reparent() will not update either the old or new parent's
p_nstopchild counters.
This change causes both old and new parents to be properly updated.
Fixes PR kern/50300
--
For processes marked with PS_STOPEXIT, update the process's p_waited
value, and update its parent's p_nstopchild value when marking the
process's p_stat to SSTOP. The process needed to be SACTIVE to get
here, so this transition represents an additional process for which
the parent needs to wait.
Fixes PR kern/50308
--
In spawn_return() we temporarily move the process state to SSTOP, but
without updating its p_waited value or its parent's p_nstopchild
counter. Later, we restore the original state, again without any
adjustment of the related values. This leaves a relatively short
window when the values are inconsistent and could interfere with the
proper operation of sys_wait() for the parent (if it manages to be
scheduled; it's not totally clear what, if anything, prevents
scheduling/execution of the parent).
If during this window, any of the checks being made result in an
error, we call exit1() which will eventually migrate the process's
state to SDEAD (with an intermediate transition to SDYING). At
this point the other variables get updated, and we finally restore
a consistent state.
This change updates the p_waited and parent's p_nstopchild at each
step to eliminate any windows during which the values could lead to
incorrect decisions.
Fixes PR kern/50330
 1.308.6.4  28-Aug-2017  skrll Sync with HEAD
 1.308.6.3  09-Jul-2016  skrll Sync with HEAD
 1.308.6.2  22-Apr-2016  skrll Sync with HEAD
 1.308.6.1  27-Dec-2015  skrll Sync with HEAD (as of 26th Dec)
 1.308.4.1  05-Nov-2015  snj Pull up following revision(s) (requested by pgoyette in ticket #996):
sys/kern/kern_exec.c: revisions 1.419, 1.420
sys/kern/kern_exit.c: revisions 1.246, 1.247
sys/kern/kern_synch.c: revision 1.309
In execve_runproc(), update the p_waited entry for the process being
moved to SSTOP state, not for its parent. (It is correct to update
the parent's p_nstopchild count.) If the value is not already zero,
it could prevent its parent from waiting for the process.
Fixes PR kern/50298
--
When clearing out the scheduler queues during system shutdown, we move
all processes to the SSTOP state. Make sure we update each process's
p_waited and the parents' p_nstopchild counters to maintain consistent
values. Should not make any real difference this late in the shutdown
process, but we should still be consistent just in case.
Fixes PR kern/50318
--
Currently, if a process is exiting and its parent has indicated no intent
of reaping the process (nor any other children), the process will get
reparented to init. Since the state of the exiting process at this point
is SDEAD, proc_reparent() will not update either the old or new parent's
p_nstopchild counters.
This change causes both old and new parents to be properly updated.
Fixes PR kern/50300
--
For processes marked with PS_STOPEXIT, update the process's p_waited
value, and update its parent's p_nstopchild value when marking the
process's p_stat to SSTOP. The process needed to be SACTIVE to get
here, so this transition represents an additional process for which
the parent needs to wait.
Fixes PR kern/50308
--
In spawn_return() we temporarily move the process state to SSTOP, but
without updating its p_waited value or its parent's p_nstopchild
counter. Later, we restore the original state, again without any
adjustment of the related values. This leaves a relatively short
window when the values are inconsistent and could interfere with the
proper operation of sys_wait() for the parent (if it manages to be
scheduled; it's not totally clear what, if anything, prevents
scheduling/execution of the parent).
If during this window, any of the checks being made result in an
error, we call exit1() which will eventually migrate the process's
state to SDEAD (with an intermediate transition to SDYING). At
this point the other variables get updated, and we finally restore
a consistent state.
This change updates the p_waited and parent's p_nstopchild at each
step to eliminate any windows during which the values could lead to
incorrect decisions.
Fixes PR kern/50330
 1.311.10.2  23-Sep-2018  martin Pull up following revision(s) (requested by bouyer in ticket #1031):

sys/kern/kern_synch.c: revision 1.317

In mi_switch(), also call pserialize_switchpoint() if we're not switching
to another lwp, as proposed on
http://mail-index.netbsd.org/tech-kern/2018/07/20/msg023709.html

Without it, on a SMP machine with few processes running (e.g while
running sysinst), pserialize could hang for a long time until all
CPUs got a LWP to run (or, eventually, forever).

Tested on Xen domUs with 4 CPUs, and on a 64-threads AMD machine.
 1.311.10.1  26-Feb-2018  snj Pull up following revision(s) (requested by ozaki-r in ticket #573):
sys/kern/kern_synch.c: 1.314
Avoid a race condition between an LWP migration and curlwp_bind
curlwp_bind sets the LP_BOUND flag to l_pflags of the current LWP, which
prevents it from migrating to another CPU until curlwp_bindx is called.
Meanwhile, there are several ways that an LWP is migrated to another CPU and in
any cases the scheduler postpones a migration if a target LWP is running. One
example of LWP migrations is a load balancing; the scheduler periodically
explores CPU-hogging LWPs and schedule them to migrate (see sched_lwp_stats).
At that point the scheduler checks the LP_BOUND flag and if it's set to a LWP,
the scheduler doesn't schedule the LWP. A scheduled LWP is tried to be migrated
when it is leaving a running CPU, i.e., mi_switch. And mi_switch does NOT check
the LP_BOUND flag. So if an LWP is scheduled first and then it sets the
LP_BOUND flag, the LWP can be migrated regardless of the flag. To avoid this
race condition, we need to check the flag in mi_switch too.
For more details see
https://mail-index.netbsd.org/tech-kern/2018/02/13/msg023079.html
 1.314.2.4  26-Dec-2018  pgoyette Sync with HEAD, resolve a few conflicts
 1.314.2.3  06-Sep-2018  pgoyette Sync with HEAD

Resolve a couple of conflicts (result of the uimin/uimax changes)
 1.314.2.2  28-Jul-2018  pgoyette Sync with HEAD
 1.314.2.1  21-May-2018  pgoyette Sync with HEAD
 1.315.2.4  21-Apr-2020  martin Sync with HEAD
 1.315.2.3  13-Apr-2020  martin Mostly merge changes from HEAD upto 20200411
 1.315.2.2  08-Apr-2020  martin Merge changes from current as of 20200406
 1.315.2.1  10-Jun-2019  christos Sync with HEAD
 1.323.4.1  15-Oct-2019  martin Pull up following revision(s) (requested by kamil in ticket #320):

sys/kern/kern_synch.c: revision 1.324
sys/kern/kern_sig.c: revision 1.366
sys/kern/kern_exit.c: revision 1.277
sys/kern/kern_lwp.c: revision 1.204
sys/kern/sys_ptrace_common.c: revision 1.62

Separate flag for suspended by _lwp_suspend and suspended by a debugger

Once a thread was stopped with ptrace(2), userland process must not
be able to unstop it deliberately or by an accident.

This was a Windows-style behavior that makes threading tracing fragile.
 1.334.2.6  29-Feb-2020  ad Sync with head.
 1.334.2.5  25-Jan-2020  ad Sync with head.
 1.334.2.4  25-Jan-2020  ad Remove unintentional differences to base.
 1.334.2.3  23-Jan-2020  ad Back out previous.
 1.334.2.2  19-Jan-2020  ad Adaptive rwlocks proposed on tech-kern and working well on this branch
with vnode locks.
 1.334.2.1  17-Jan-2020  ad Sync with head.
 1.346.2.1  20-Apr-2020  bouyer Sync with HEAD
