History log of /src/sys/kern/subr_kcpuset.c |
Revision | | Date | Author | Comments |
1.20 |
| 23-Sep-2023 |
ad | Reapply this change with a couple of bugs fixed:
- Do away with separate pool_cache for some kernel objects that have no special requirements and use the general purpose allocator instead. On one of my test systems this makes for a small (~1%) but repeatable reduction in system time during builds presumably because it decreases the kernel's cache / memory bandwidth footprint a little.
- vfs_lockf: cache a pointer to the uidinfo and put mutex in the data segment.
|
1.19 |
| 12-Sep-2023 |
ad | Back out recent change to replace pool_cache with the general allocator. Will return to this when I have time again.
|
1.18 |
| 11-Sep-2023 |
martin | Add missing <sys/intr.h> include (previously indirectly hidden via pool.h)
|
1.17 |
| 10-Sep-2023 |
ad | - Do away with separate pool_cache for some kernel objects that have no special requirements and use the general purpose allocator instead. On one of my test systems this makes for a small (~1%) but repeatable reduction in system time during builds presumably because it decreases the kernel's cache / memory bandwidth footprint a little.
- vfs_lockf: cache a pointer to the uidinfo and put mutex in the data segment.
|
1.16 |
| 01-Sep-2023 |
skrll | Trailing whitespace.
|
1.15 |
| 09-Apr-2023 |
riastradh | kern: KASSERT(A && B) -> KASSERT(A); KASSERT(B)
|
1.14 |
| 09-Apr-2022 |
riastradh | sys: Use membar_release/acquire around reference drop.
This just goes through my recent reference count membar audit and changes membar_exit to membar_release and membar_enter to membar_acquire -- this should make everything cheaper on most CPUs without hurting correctness, because membar_acquire is generally cheaper than membar_enter.
|
1.13 |
| 12-Mar-2022 |
riastradh | sys: Membar audit around reference count releases.
If two threads are using an object that is freed when the reference count goes to zero, we need to ensure that all memory operations related to the object happen before freeing the object.
Using an atomic_dec_uint_nv(&refcnt) == 0 ensures that only one thread takes responsibility for freeing, but it's not enough to ensure that the other thread's memory operations happen before the freeing.
Consider:
    Thread A                        Thread B
    obj->foo = 42;                  obj->baz = 73;
    mumble(&obj->bar);              grumble(&obj->quux);
    /* membar_exit(); */            /* membar_exit(); */
    atomic_dec -- not last          atomic_dec -- last
                                    /* membar_enter(); */
                                    KASSERT(invariant(obj->foo,
                                        obj->bar));
                                    free_stuff(obj);
The memory barriers ensure that
    obj->foo = 42;
    mumble(&obj->bar);
in thread A happens before
    KASSERT(invariant(obj->foo, obj->bar));
    free_stuff(obj);
in thread B. Without them, this ordering is not guaranteed.
So in general it is necessary to do
    membar_exit();
    if (atomic_dec_uint_nv(&obj->refcnt) != 0)
            return;
    membar_enter();
to release a reference, for the `last one out hit the lights' style of reference counting. (This is in contrast to the style where one thread blocks new references and then waits under a lock for existing ones to drain with a condvar -- no membar needed thanks to mutex(9).)
I searched for atomic_dec to find all these. Obviously we ought to have a better abstraction for this because there's so much copypasta. This is a stop-gap measure to fix actual bugs until we have that. It would be nice if an abstraction could gracefully handle the different styles of reference counting in use -- some years ago I drafted an API for this, but making it cover everything got a little out of hand (particularly with struct vnode::v_usecount) and I ended up setting it aside to work on psref/localcount instead for better scalability.
I got bored of adding #ifdef __HAVE_ATOMIC_AS_MEMBAR everywhere, so I only put it on things that look performance-critical on 5sec review. We should really adopt membar_enter_preatomic/membar_exit_postatomic or something (except they are applicable only to atomic r/m/w, not to atomic_load/store_*, making the naming annoying) and get rid of all the ifdefs.
|
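The `last one out hit the lights' release pattern described in revision 1.13 can be sketched in userland C. This is only an illustration under stated assumptions: it uses C11 <stdatomic.h> fences as stand-ins for the kernel's membar_release/membar_acquire (the names adopted in revision 1.14), and struct obj, obj_create(), obj_hold() and obj_rele() are hypothetical names, not NetBSD interfaces.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stdlib.h>

/* Hypothetical reference-counted object; not a NetBSD structure. */
struct obj {
	atomic_uint refcnt;
	int foo;
};

/* Allocate an object holding one reference for the caller. */
static struct obj *
obj_create(void)
{
	struct obj *o = malloc(sizeof(*o));

	if (o == NULL)
		return NULL;
	atomic_init(&o->refcnt, 1);
	o->foo = 0;
	return o;
}

/* Take an extra reference; the caller must already hold one. */
static void
obj_hold(struct obj *o)
{
	atomic_fetch_add_explicit(&o->refcnt, 1, memory_order_relaxed);
}

/*
 * Drop a reference.  Returns true if this call dropped the last
 * reference and freed the object, false otherwise.
 */
static bool
obj_rele(struct obj *o)
{
	unsigned int old;

	/* Release fence: our earlier accesses to *o happen before the drop. */
	atomic_thread_fence(memory_order_release);
	old = atomic_fetch_sub_explicit(&o->refcnt, 1, memory_order_relaxed);
	assert(old != 0);	/* dropping a reference we never held */
	if (old != 1)
		return false;	/* not the last one out */
	/* Acquire fence: see every other thread's accesses before freeing. */
	atomic_thread_fence(memory_order_acquire);
	free(o);
	return true;
}
```

A caller that took two references would see obj_rele() return false on the first drop and true on the second. Only the thread that observes the count reaching zero pays for the acquire fence.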
1.12 |
| 26-Jul-2019 |
msaitoh | Set kcpuset's bit correctly to avoid undefined behavior. Found by KUBSan.
|
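The log entry for revision 1.12 does not say which expression KUBSan flagged. As a hedged illustration only, one common source of such reports in bitmask code is shifting a signed 1 into the sign bit; the sketch below avoids that by shifting an unsigned 32-bit constant. The names bitset_set() and bitset_isset() and the 32-bit field width are assumptions for this example, not the kcpuset(9) implementation.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical layout: bit i lives in 32-bit field i / 32, position i % 32. */

/* Set bit i.  UINT32_C(1) is unsigned, so a shift by 31 is well defined. */
static void
bitset_set(uint32_t *fields, unsigned int i)
{
	assert(fields != NULL);
	fields[i / 32u] |= UINT32_C(1) << (i % 32u);
}

/* Test bit i with the same unsigned shift. */
static bool
bitset_isset(const uint32_t *fields, unsigned int i)
{
	assert(fields != NULL);
	return (fields[i / 32u] & (UINT32_C(1) << (i % 32u))) != 0;
}
```

With a plain `1 << 31` on a platform with 32-bit int, the shift would overflow a signed type, which is undefined behavior in C; the unsigned constant sidesteps that.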
1.11 |
| 19-May-2014 |
rmind | branches: 1.11.28; Constify kcpuset_countset() and cpu_index() parameters.
|
1.10 |
| 25-Oct-2013 |
martin | branches: 1.10.2; Turn a few __unused into __diagused
|
1.9 |
| 17-Jul-2013 |
matt | Some constification. Add kcpuset_clone, kcpuset_insersection, kcpuset_remove, kcpuset_ffs, kcpuset_ffs_intersecting, kcpuset_atomicly_merge, kcpuset_atomicly_intersect, kcpuset_atomicly_remove
|
1.8 |
| 16-Sep-2012 |
rmind | branches: 1.8.2; 1.8.8; Rename kcpuset_copybits() to kcpuset_export_u32() and thus be more specific about the interface.
|
1.7 |
| 20-Aug-2012 |
rmind | branches: 1.7.2; kcpuset_copybits: fix potential endianness problem. Spotted by matt@.
|
1.6 |
| 06-Jun-2012 |
rmind | Few fixes for Xen:
- cpu_load_pmap: use atomic kcpuset(9) operations; fixes rare crashes.
- Add kcpuset_copybits(9) and replace xen_kcpuset2bits(). Avoids incorrect ncpu problem in early boot.
Also, micro-optimises xen_mcast_invlpg() and xen_mcast_tlbflush() routines.
Tested by chs@.
|
1.5 |
| 20-Apr-2012 |
rmind | - Convert x86 MD code, mainly pmap(9) e.g. TLB shootdown code, to use kcpuset(9) and thus replace hardcoded CPU bitmasks. This removes the limitation of maximum CPUs.
- Support up to 256 CPUs on amd64 architecture by default.
Bug fixes, improvements, completion of Xen part and testing on 64-core AMD Opteron(tm) Processor 6282 SE (also, as Xen HVM domU with 128 CPUs) by Manuel Bouyer.
|
1.4 |
| 29-Jan-2012 |
rmind | branches: 1.4.2;
- Add kcpuset_isotherset() and kcpuset_countset().
- Fix KC_NFIELDS_EARLY. Make kcpuset_isset() return bool.
|
1.3 |
| 07-Aug-2011 |
rmind | branches: 1.3.2; 1.3.6;
- Add an argument to kcpuset_create() for zeroing.
- Add kcpuset_atomic_set(), kcpuset_atomic_clear() and kcpuset_merge().
|
1.2 |
| 07-Aug-2011 |
rmind | Remove LW_AFFINITY flag and fix some bugs in affinity mask handling.
|
1.1 |
| 07-Aug-2011 |
rmind | Add kcpuset(9) - a reworked dynamic CPU set implementation for kernel. Suitable for use during the early boot. MD and other implementations should be replaced with this interface.
Discussed on: tech-kern@
|
1.3.6.2 |
| 29-Apr-2012 |
mrg | sync to latest -current.
|
1.3.6.1 |
| 18-Feb-2012 |
mrg | merge to -current.
|
1.3.2.4 |
| 22-May-2014 |
yamt | sync with head.
for reference, the tree before this commit was tagged as yamt-pagecache-tag8.
this commit was split into small chunks to avoid a limitation of cvs. ("Protocol error: too many arguments")
|
1.3.2.3 |
| 30-Oct-2012 |
yamt | sync with head
|
1.3.2.2 |
| 23-May-2012 |
yamt | sync with head.
|
1.3.2.1 |
| 17-Apr-2012 |
yamt | sync with head
|
1.4.2.2 |
| 12-Jun-2012 |
riz | Pull up following revision(s) (requested by rmind in ticket #314):
sys/arch/xen/x86/cpu.c: revision 1.92
sys/kern/subr_kcpuset.c: revision 1.6
sys/sys/kcpuset.h: revision 1.6
sys/arch/xen/x86/x86_xpmap.c: revision 1.44
Few fixes for Xen:
- cpu_load_pmap: use atomic kcpuset(9) operations; fixes rare crashes.
- Add kcpuset_copybits(9) and replace xen_kcpuset2bits(). Avoids incorrect ncpu problem in early boot.
Also, micro-optimises xen_mcast_invlpg() and xen_mcast_tlbflush() routines.
Tested by chs@.
|
1.4.2.1 |
| 09-May-2012 |
riz | Pull up following revision(s) (requested by rmind in ticket #202):
sys/arch/x86/include/cpuvar.h: revision 1.46
sys/arch/xen/include/xenpmap.h: revision 1.34
sys/arch/i386/include/param.h: revision 1.77
sys/arch/x86/x86/pmap_tlb.c: revision 1.5
sys/arch/x86/x86/pmap_tlb.c: revision 1.6
sys/arch/i386/i386/genassym.cf: revision 1.92
sys/arch/xen/x86/cpu.c: revision 1.91
sys/arch/x86/x86/pmap.c: revision 1.177
sys/arch/xen/x86/xen_pmap.c: revision 1.21
sys/arch/x86/acpi/acpi_wakeup.c: revision 1.31
sys/kern/subr_kcpuset.c: revision 1.5
sys/arch/amd64/include/param.h: revision 1.18
sys/sys/kcpuset.h: revision 1.5
sys/arch/x86/x86/mtrr_i686.c: revision 1.26
sys/arch/x86/x86/mtrr_i686.c: revision 1.27
sys/arch/xen/x86/x86_xpmap.c: revision 1.43
sys/arch/x86/x86/cpu.c: revision 1.98
sys/arch/amd64/amd64/mptramp.S: revision 1.14
sys/kern/sys_sched.c: revision 1.42
sys/arch/amd64/amd64/genassym.cf: revision 1.50
sys/arch/i386/i386/mptramp.S: revision 1.24
sys/arch/x86/include/pmap.h: revision 1.52
sys/arch/x86/include/cpu.h: revision 1.50
- Convert x86 MD code, mainly pmap(9) e.g. TLB shootdown code, to use kcpuset(9) and thus replace hardcoded CPU bitmasks. This removes the limitation of maximum CPUs.
- Support up to 256 CPUs on amd64 architecture by default.
Bug fixes, improvements, completion of Xen part and testing on 64-core AMD Opteron(tm) Processor 6282 SE (also, as Xen HVM domU with 128 CPUs) by Manuel Bouyer.
- pmap_tlb_shootdown: do not overwrite tp_cpumask with pm_cpus, but merge like pm_kernel_cpus. Remove unnecessary intersection with kcpuset_running. Do not reset tp_userpmap if pmap_kernel().
- Remove pmap_tlb_mailbox_t wrapping, which is pointless after recent changes.
- pmap_tlb_invalidate, pmap_tlb_intr: constify for packet structure.
i686_mtrr_init_first: handle the case when there are no variable-size MTRR registers available (i686_mtrr_vcnt == 0).
|
1.7.2.2 |
| 20-Aug-2014 |
tls | Rebase to HEAD as of a few days ago.
|
1.7.2.1 |
| 20-Nov-2012 |
tls | Resync to 2012-11-19 00:00:00 UTC
|
1.8.8.1 |
| 23-Jul-2013 |
riastradh | sync with HEAD
|
1.8.2.2 |
| 18-May-2014 |
rmind | sync with head
|
1.8.2.1 |
| 28-Aug-2013 |
rmind | sync with head
|
1.10.2.1 |
| 10-Aug-2014 |
tls | Rebase.
|
1.11.28.1 |
| 13-Apr-2020 |
martin | Mostly merge changes from HEAD up to 20200411
|