History log of /src/sys/kern/subr_pool.c
Revision | Date | Author | Comments
 1.295  26-May-2025  bouyer Never call pr_drain_hook from pool_allocator_alloc().
In the PR_WAITOK case it's called from pool_reclaim().
In the !PR_WAITOK case we're holding the pool lock, and if the drain hook
wants kernel_lock we may deadlock with another thread holding
kernel_lock and calling pool_get().
Fixes PR kern/59411
 1.294  16-May-2025  bouyer Revert previous, requested by riastradh@
One possible fix for kern/59411 makes PR_GROWINGNOWAIT useful again.
 1.293  09-May-2025  bouyer pool_grow(): The thread setting PR_GROWINGNOWAIT holds the pr_lock and
should not release it before clearing PR_GROWINGNOWAIT because it's called
with !PR_WAITOK. No other thread should see PR_GROWINGNOWAIT while holding
pr_lock, so PR_GROWINGNOWAIT looks useless and can probably be removed.
For now, only KASSERT that PR_GROWINGNOWAIT is never seen, to make sure.
Note that in the PR_GROWINGNOWAIT case we would exit/reenter pr_lock
while we don't have PR_WAITOK, which is probably wrong too.
 1.292  07-Dec-2024  chs pool: fix pool_sethiwat() to actually do something

The change that I made to the pool code back in April 2020
("slightly change and fix the semantics of pool_set*wat()" ...)
accidentally broke pool_sethiwat() by making it have no effect.

This was discovered after the crash reported in PR 58666 was fixed.
The same machine (32-bit, with 10GB RAM) would hang due to the buffer
cache causing the system to run out of kernel virtual space. The
buffer cache uses a separate pool for buffer data for each power of 2
between DEV_BSIZE and MAXBSIZE, and if the usage pattern of buffer
sizes changes then memory has to be moved between the different pools
in order to create buffers of the new size. The buffer cache handles
this by using pool_sethiwat() to cause memory freed from the buffer
cache back to the pools to not be cached in the buffer cache pools but
instead be freed back to the pools' back-end allocator (which
allocates from the low-level kva allocator) as soon as possible. But
since pool_sethiwat() wasn't doing anything, memory would stay cached
in some buffer cache pools and starve other buffer cache pools (and a
few other pools that do not use the kmem layer for memory allocation).

Fix pool_sethiwat() to do what it is supposed to do again.
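As a hedged illustration of the restored behaviour (the pool name and water mark are hypothetical, not the actual buffer cache code), a consumer asks that free items above the high-water mark be returned to the back-end allocator instead of staying cached:

    #include <sys/param.h>
    #include <sys/pool.h>

    static struct pool example_pool;        /* hypothetical per-size buffer pool */

    void
    example_pool_init(void)
    {
            /* A NULL allocator selects the default back-end allocator. */
            pool_init(&example_pool, 2048, 0, 0, 0, "example2k", NULL, IPL_NONE);

            /*
             * Cache at most this many free items; items freed beyond the
             * mark go straight back to the back-end allocator, which is
             * the effect pool_sethiwat() is meant to (and again does) have.
             */
            pool_sethiwat(&example_pool, 16);
    }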
 1.291  07-Dec-2024  chs pool: use "big" (ie. > PAGE_SIZE) default allocators for more cases

When I added the default "big" pool allocators back in 2017,
I added them only for pool_caches and not plain pools, and only for
IPL_NONE pool_caches at that. But these allocators work fine
for all pool caches and plain pools as well, so use them automatically
by default when needed for all of those cases.
 1.290  09-Apr-2023  riastradh pool(9): Tweak branch prediction in pool_cache_get_paddr assertion.

No functional change intended.
 1.289  09-Apr-2023  riastradh pool(9): Simplify assertion in pool_update_curpage.

Add message while here.
 1.288  09-Apr-2023  riastradh kern: KASSERT(A && B) -> KASSERT(A); KASSERT(B)
 1.287  24-Feb-2023  riastradh kern: Eliminate most __HAVE_ATOMIC_AS_MEMBAR conditionals.

I'm leaving in the conditional around the legacy membar_enters
(store-before-load, store-before-store) in kern_mutex.c and in
kern_lock.c because they may still matter: store-before-load barriers
tend to be the most expensive kind, so eliding them is probably
worthwhile on x86. (It also may not matter; I just don't care to do
measurements right now, and it's a single valid and potentially
justifiable use case in the whole tree.)

However, membar_release/acquire can be mere instruction barriers on
all TSO platforms including x86, so there's no need to go out of our
way with a bad API to conditionalize them. If the procedure call
overhead is measurable we could just change them to be macros on x86
that expand into __insn_barrier.

Discussed on tech-kern:
https://mail-index.netbsd.org/tech-kern/2023/02/23/msg028729.html
 1.286  17-Feb-2023  skrll Avoid undefined behaviour.
 1.285  16-Jul-2022  simonb branches: 1.285.4;
Use 64-bit math to calculate pool sizes. Fixes overflow errors for
pools larger than 4GB and gives the correct output for kernel pool pages
in "vmstat -s" output.
 1.284  29-May-2022  andvar fix various typos in comments and log messages.
 1.283  24-May-2022  andvar fix various typos in comments, docs and log messages.
 1.282  09-Apr-2022  riastradh pool(9): Convert membar_exit to membar_release.
 1.281  27-Feb-2022  riastradh pool(9): Membar audit.

- Use atomic_store_release and atomic_load_consume for associating a
freshly constructed pool_cache with its underlying pool. The pool
gets published in various ways before the pool cache is fully
constructed.

=> Nix membar_sync -- no store-before-load is needed here.

- Take pool_head_lock around sysctl kern.pool TAILQ_FOREACH. Then take
a reference count, and drop the lock, around copyout.

=> Otherwise, pools could be partially initialized or freed while
we're still trying to read from them -- and in the worst case,
we might see a corrupted view of the tailq.

=> If we kept the lock around copyout, this could deadlock in memory
allocation.

=> If we didn't take a reference count while releasing the lock, the
pool could be destroyed while we're trying to traverse the list,
sending us into oblivion instead of the next element.
 1.280  24-Dec-2021  riastradh pool(9): Fix default PR_NOALIGN for large pool caches.

Was broken in recent change to separate some pool cache flags from
pool flags.

Fixes crash in zfs.
 1.279  22-Dec-2021  thorpej Do the last change differently:

Instead of having a pre-destruct hook, put knowledge of passive
serialization into the pool allocator directly, enabled by PR_PSERIALIZE
when the pool / pool_cache is initialized. This will guarantee that
a passive serialization barrier will be performed before the object's
destructor is called, or before the page containing the object is freed
back to the system (in the case of no destructor). Note that the internal
allocator overhead is different when PR_PSERIALIZE is used (it implies
PR_NOTOUCH, because the objects must remain in a valid state).

In the DRM Linux API shim, this allows us to remove the custom page
allocator for SLAB_TYPESAFE_BY_RCU.
 1.278  21-Dec-2021  thorpej Add pool_cache_setpredestruct(), which allows a pool cache to specify
a function to be called before the destructor for a batch of one or more
objects is called. This can be used as a synchronization point by
subsystems that rely on the type-stable nature of pool cache objects or
subsystems that use other forms of passive serialization.
 1.277  25-Jul-2021  simonb Add accessor functions to get the number of gets and puts on pools and
pool caches.
 1.276  24-Feb-2021  mrg branches: 1.276.4;
skip the redzone on pools where the allocation (including all overhead)
is greater than half the pool pagesize.

this stops 4KiB being used per allocation from the kmem-02048 pool,
and 64KiB per allocation from the buf32k pool.

we're still wasting 1/4 of space for overhead on eg, the buf1k or
kmem-01024 pools. however, including overhead costs, the amount of
useless space (not used by consumer or overhead) reduces from 47%
to 18%, so this is far less bad overall.


there are a couple of ideas for solving this in a less ugly way:

- pool redzones are enabled with DIAGNOSTIC kernels, which is
defined as being "fast, cheap". this is not cheap (though it
is relatively fast if you don't run out of memory) so it does
not really belong here as is, but DEBUG or a special option
would work for it.

- if we increase the "pool page" size for these pools, such that
the overhead over pool page is reduced to 5% or less, we can
have redzones for more allocations without using more space.


also, see this thread:

https://mail-index.netbsd.org/tech-kern/2021/02/23/msg027130.html
 1.275  19-Dec-2020  mrg ddb: add two new modifiers to "show pool" and "show all pools"

- /s shows a short single-line per pool list (the normal output
is about 10 lines per.)
- /S skips pools with zero allocations.
 1.274  05-Sep-2020  riastradh branches: 1.274.2;
Suppress pool redzone message unless booted with debug.
 1.273  19-Jun-2020  jdolecek bump the limit on max item size for pool_init()/pool_cache_init() up
to 1 << 24, so that the pools can be used for ZFS block allocations, which
are up to SPA_MAXBLOCKSHIFT (1 << 24)

part of PR kern/55397 by Frank Kardel
 1.272  14-Jun-2020  ad Arithmetic error in previous.
 1.271  14-Jun-2020  ad pool_cache:

- make all counters per-CPU and make cache layer do its work with atomic ops.
- conserve memory by caching empty groups globally.
 1.270  07-Jun-2020  maxv Add fault(4).
 1.269  06-Jun-2020  maxv kMSan: re-set the orig after pool_cache_get_slow(), using the address of
the caller of pool_cache_get_paddr().

Otherwise the orig is just pool_cache_get_paddr(), and that's not really
useful for debugging.
 1.268  15-Apr-2020  maxv Introduce POOL_NOCACHE, simple option to cancel pool_caches and go directly
to the pool layer. It is taken out of POOL_QUARANTINE.

Advertise POOL_NOCACHE for kMSan rather than POOL_QUARANTINE. With kMSan
we are only interested in the no-caching effect, not the quarantine. This
reduces memory pressure on kMSan kernels.
 1.267  13-Apr-2020  chs slightly change and fix the semantics of pool_set*wat(), pool_sethardlimit()
and pool_prime() (and their pool_cache_* counterparts):

- the pool_set*wat() APIs are supposed to specify thresholds for the count of
free items in the pool before pool pages are automatically allocated or freed
during pool_get() / pool_put(), whereas pool_sethardlimit() and pool_prime()
are supposed to specify minimum and maximum numbers of total items
in the pool (both free and allocated). these were somewhat conflated
in the existing code, so separate them as they were intended.

- change pool_prime() to take an absolute number of items to preallocate
rather than an increment over whatever was done before, and wait for
any memory allocations to succeed. since pool_prime() can no longer fail
after this, change its return value to void and adjust all callers.

- pool_setlowat() is documented as not immediately attempting to allocate
any memory, but it was changed some time ago to immediately try to allocate
up to the lowat level, so just fix the manpage to describe the current
behaviour.

- add a pool_cache_prime() to complete the API set.
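A hedged sketch of the separation described above (the pool name and numbers are made up; the function names are the pool(9)/pool_cache(9) entry points being adjusted):

    #include <sys/pool.h>

    static struct pool frob_pool;           /* hypothetical */

    void
    frob_pool_tune(void)
    {
            /* Thresholds on the number of *free* items kept in the pool. */
            pool_setlowat(&frob_pool, 16);
            pool_sethiwat(&frob_pool, 256);

            /* Limits on the *total* number of items, free and allocated. */
            pool_prime(&frob_pool, 64);     /* absolute count; returns void now */
            pool_sethardlimit(&frob_pool, 1024, "frob_pool limit reached", 60);
    }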
 1.266  08-Feb-2020  maxv branches: 1.266.4;
Retire KLEAK.

KLEAK was a nice feature and served its purpose; it allowed us to detect
dozens of info leaks on the kernel->userland boundary, and thanks to it we
tackled a good part of the infoleak problem 1.5 years ago.

Nowadays however, we have kMSan, which can detect uninitialized memory in
the kernel. kMSan supersedes KLEAK: it can detect what KLEAK was able to
detect, but in addition, (1) it operates in all of the kernel and not just
the kernel->userland boundary, (2) it requires no user interaction, and (3)
it is deterministic and not statistical.

That makes kMSan the feature of choice to detect info leaks nowadays;
people interested in detecting info leaks should boot a kMSan kernel and
just wait for the magic to happen.

KLEAK was a good ride, and a fun project, but now is time for it to go.

Discussed with several people, including Thomas Barabosch.
 1.265  19-Jan-2020  chs fix assertions about when it is ok for pool_get() to return NULL.
 1.264  27-Dec-2019  maxv branches: 1.264.2;
Switch to panic, and make the message more useful.
 1.263  03-Dec-2019  riastradh Use __insn_barrier to enforce ordering in l_ncsw loops.

(Only need ordering observable by interruption, not by other CPUs.)
 1.262  14-Nov-2019  maxv Add support for Kernel Memory Sanitizer (kMSan). It detects uninitialized
memory used by the kernel at run time, and just like kASan and kCSan, it
is an excellent feature. It has already detected 38 uninitialized variables
in the kernel during my testing, which I have since discreetly fixed.

We use two shadows:
- "shad", to track uninitialized memory with a bit granularity (1:1).
Each bit set to 1 in the shad corresponds to one uninitialized bit of
real kernel memory.
- "orig", to track the origin of the memory with a 4-byte granularity
(1:1). Each uint32_t cell in the orig indicates the origin of the
associated uint32_t of real kernel memory.

The memory consumption of these shadows is substantial, so at least 4GB of
RAM is recommended to run kMSan.

The compiler inserts calls to specific __msan_* functions on each memory
access, to manage both the shad and the orig and detect uninitialized
memory accesses that change the execution flow (like an "if" on an
uninitialized variable).

We mark as uninit several types of memory buffers (stack, pools, kmem,
malloc, uvm_km), and check each buffer passed to copyout, copyoutstr,
bwrite, if_transmit_lock and DMA operations, to detect uninitialized memory
that leaves the system. This allows us to detect kernel info leaks in a way
that is more efficient and also more user-friendly than KLEAK.

Unlike kASan, kMSan requires comprehensive coverage, i.e. we cannot
tolerate having one non-instrumented function, because this could cause
false positives. kMSan cannot instrument ASM functions, so I converted
most of them to __asm__ inlines, which kMSan is able to instrument. Those
that remain receive special treatment.

Again unlike kASan, kMSan uses a TLS, so we must context-switch this
TLS during interrupts. We use different contexts depending on the interrupt
level.

The orig tracks precisely the origin of a buffer. We use a special encoding
for the orig values, and pack together in each uint32_t cell of the orig:
- a code designating the type of memory (Stack, Pool, etc), and
- a compressed pointer, which points either (1) to a string containing
the name of the variable associated with the cell, or (2) to an area
in the kernel .text section which we resolve to a symbol name + offset.

This encoding allows us not to consume extra memory for associating
information with each cell, and produces a precise output, that can tell
for example the name of an uninitialized variable on the stack, the
function in which it was pushed on the stack, and the function where we
accessed this uninitialized variable.

kMSan is available with LLVM, but not with GCC.

The code is organized in a way that is similar to kASan and kCSan, so it
means that other architectures than amd64 can be supported.
 1.261  16-Oct-2019  christos Add and use __FPTRCAST, requested by uwe@
 1.260  16-Oct-2019  christos Add void * function pointer casts. There are different ways to "fix" those
warnings:
1. this one: add a void * cast (which I think is the least intrusive)
2. add pragmas to elide the warning
3. add intermediate inline conversion functions
4. change the called function prototypes, adding unused arguments and
converting some of the pointer arguments to void *.
5. make the functions variadic (which defeats the purpose of checking)
6. pass command line flags to elide the warning
I did try 3 and 4 and I was not pleased with the result (sys_ptrace_common.c):
(3) added too much code and defines, and (4) made the regular use clumsy.
 1.259  23-Sep-2019  skrll Enable POOL_REDZONE with DIAGNOSTIC.

The bug in the arm pmap was fixed long ago.
 1.258  06-Sep-2019  maxv Reorder for clarity, and localify pool_allocator_big[], should not be used
outside.
 1.257  26-Aug-2019  maxv Revert r1.254, put back || for KASAN, some destructors like lwp_dtor()
caused false positives. Needs more work.
 1.256  17-Aug-2019  maxv Kernel Heap Hardening: use bitmaps on all off-page pools. This migrates 29
MI pools on amd64 from linked lists to bitmaps, which have higher security
properties.

Then, change the computation of the size of the PH pools: take into account
the bitmap area available by default in the ph_u2 union, and don't go with
&phpool[>0] if &phpool[0] already has enough space to embed a bitmap.

The pools that are migrated in this change all use bitmaps small enough to
fit in &phpool[0], therefore there is no increase in memory consumption.
 1.255  16-Aug-2019  maxv Initialize pp->pr_redzone to false. For some reason with KUBSAN GCC does
not eliminate the unused branch in pr_item_linkedlist_put(), and this
leads to an uninitialized access in that unused branch, which triggers KUBSAN messages.
 1.254  03-Aug-2019  maxv Replace || by && in KASAN, to increase the pool coverage.

Strictly speaking, what we want to avoid is poisoning buffers that were
referenced in a global list as part of the ctor. But, if a buffer indeed
got referenced as part of the ctor, it necessarily has to be unreferenced
in the dtor; which implies it has to have a dtor. So we want both a ctor
and a dtor, and not just one of them.

Note that POOL_QUARANTINE already implicitly provides this increased
coverage.
 1.253  02-Aug-2019  maxv Kernel Heap Hardening: perform certain sanity checks on the pool caches
directly, to immediately detect certain bugs that would otherwise have
been detected only later on the pool layer, if the buffer ever reached
the pool layer.
 1.252  29-Jun-2019  maxv branches: 1.252.2;
The big pool allocators use pool_page_alloc(), which allocates page-aligned
storage. So if we switch to a big pool, set PR_NOALIGN, because the address
of the storage is not aligned to the item size.

Should fix PR/54319.
 1.251  13-Jun-2019  christos make pool assertion messages consistent.
 1.250  09-May-2019  skrll Avoid KASSERT(!cpu_intr_p()) when breaking into ddb and issuing

show uvmexp
 1.249  13-Apr-2019  maxv Introduce POOL_QUARANTINE, a feature that creates a window during which a
freed buffer cannot be reallocated. This greatly helps detecting
use-after-frees, because they are not short-lived anymore.

We maintain a per-pool fifo of 128 buffers. On each pool_put, we do a real
free of the oldest buffer, and insert the new buffer. Before insertion, we
mark the buffer as invalid with KASAN. On each pool_cache_put, we destruct
the object, so it lands in pool_put, and the quarantine is handled there.

POOL_QUARANTINE can be used in conjunction with KASAN to detect more
use-after-free bugs.
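Illustrative pseudocode of the per-pool FIFO described above (not the actual subr_pool.c implementation; the KASAN marking of the incoming buffer is left out):

    #define POOL_QUARANTINE_DEPTH   128     /* 128 quarantined buffers per pool */

    struct pool_quarantine {
            uintptr_t pq_fifo[POOL_QUARANTINE_DEPTH];
            size_t    pq_head;              /* oldest slot, recycled next */
    };

    /*
     * On each pool_put(): park the newly freed buffer in the FIFO and
     * return the oldest quarantined buffer, which is the one that really
     * gets freed.  Returns 0 while the FIFO is still filling up.
     */
    static uintptr_t
    pool_quarantine_rotate(struct pool_quarantine *pq, uintptr_t freed)
    {
            uintptr_t oldest = pq->pq_fifo[pq->pq_head];

            pq->pq_fifo[pq->pq_head] = freed;
            pq->pq_head = (pq->pq_head + 1) % POOL_QUARANTINE_DEPTH;
            return oldest;
    }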
 1.248  07-Apr-2019  maxv Provide a code argument in kasan_mark(), and give a code to each caller.
Five codes used: GenericRedZone, MallocRedZone, KmemRedZone, PoolRedZone,
and PoolUseAfterFree.

This can greatly help debugging complex memory corruptions.
 1.247  07-Apr-2019  maxv Fix tiny race in pool+KASAN, that resulted in occasional false positives.

We were uselessly marking already valid areas as valid. When doing that,
our KASAN code emits two calls to kasan_markmem, and there is a very small
window where the area becomes invalid. So, if the area happens to be
already globally referenced, and if another thread happens to read the
buffer via this reference, we get a false positive.

This happens only with pool_caches that have a pc_ctor that creates a
global reference to the buffer, and there is one single pool_cache that
does that: 'file_cache'.

So now, two changes:

- In pool_cache_get_slow(), the pool_get() has already redzoned the
object, so no need to call pool_redzone_fill().

- In pool_cache_destruct_object1(), don't re-mark the object. If there is
no ctor pool_put is fine with already-invalid objects, if there is a
ctor the object was not marked as invalid in the first place; so in
either case, the re-marking is not needed.

Fixes PR/53674. Although very rare and difficult to reproduce, a local
quarantine patch of mine made the false positives recurrent.
 1.246  28-Mar-2019  maxv Move pnbuf_cache into vfs_init.c, where it belongs.
 1.245  27-Mar-2019  maxv Kernel Heap Hardening: detect frees-in-wrong-pool on on-page pools. The
detection is already implicitly done for off-page pools.

We recycle pr_slack (unused) in struct pool, and make ph_node a union in
order to recycle an unsigned int in struct pool_item_header. Each time a
pool is created we atomically increase a global counter, and register the
current value in pp. We then propagate this value in each ph, and ensure
they match in pool_put.

This can catch several classes of kernel bugs and basically makes them
unexploitable. It comes with no increase in memory usage and no measurable
increase in CPU cost (nonexistent cost actually, just one check predicted
false).
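A hedged sketch of the resulting check (the pr_poolid/ph_poolid field names are assumptions for illustration; only the mechanism comes from the description above):

    #include <sys/atomic.h>

    static unsigned int pool_serial;        /* bumped once per pool created */

    /*
     * pool_init() records pp->pr_poolid = atomic_inc_uint_nv(&pool_serial),
     * and every page header inherits it: ph->ph_poolid = pp->pr_poolid.
     * pool_put() then needs just one predicted-false comparison:
     */
    static void
    pool_put_check_owner(const struct pool *pp, const struct pool_item_header *ph)
    {
            if (__predict_false(ph->ph_poolid != pp->pr_poolid))
                    panic("%s: [%s] item freed to the wrong pool",
                        __func__, pp->pr_wchan);
    }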
 1.244  26-Mar-2019  maxv Remove POOL_SUBPAGE, it is unused, undocumented, and adds confusion.
 1.243  18-Mar-2019  maxv Kernel Heap Hardening: manage freed items with bitmaps rather than linked
lists when we're on-page and the page header is naturally big enough to
contain a bitmap.

This comes with no increase in memory consumption, and similar CPU cost
(maybe it's a little faster actually).

We want to favor bitmaps over linked lists, because linked lists install
kernel pointers inside the items, and this can be too easily exploitable
in use-after-free or double-free conditions, or in item buffer overflows
occurring within a pool page.
 1.242  17-Mar-2019  maxv Introduce a new flag, PR_USEBMAP, that indicates whether the pool uses a
bitmap to manage freed items. It dissociates PR_NOTOUCH from bitmaps, but
for now is set only when PR_NOTOUCH is set, which reproduces the current
behavior. Therefore, no functional change. Also clarify the code.
 1.241  17-Mar-2019  maxv Kernel Heap Hardening: put the pool header at the beginning of the backing
page, not at the end of it.

This makes it harder to exploit buffer overflows, because it eliminates the
certainty that sensitive kernel data is located after the item space and is
therefore overwritable.

The pr_itemoffset field is recycled, and holds the (aligned) offset of the
item space. The pr_phoffset field becomes unused. We align 'itemspace' for
clarity, but it's not strictly necessary.

This comes with no performance cost or increase in memory usage, in
particular the potential padding consumed by roundup(PHSIZE, align) was
already implicitly consumed before, because of the (necessary) truncations
in the divisions. Now it's just more explicit, but not bigger.
 1.240  17-Mar-2019  maxv Move some code into a separate function, and explain a bit. Also define
PHSIZE. No functional change.
 1.239  17-Mar-2019  maxv cosmetic
 1.238  17-Mar-2019  maxv Prepare the removal of the 'ioff' argument: add a KASSERT to ensure it is
zero, and remove the internal logic. The pool code is simpler now.
 1.237  16-Mar-2019  maxv Misc changes:

- Turn two KASSERTs to real panics, they are useful and not expensive.
- Rename a few variables for clarity.
- Add a new panic, to make sure a freed item is in the item space.
 1.236  13-Mar-2019  maxv style
 1.235  11-Mar-2019  maxv Add sanity check: make sure we retrieve a valid item header, by checking
its page address against the one we computed. If there's a mismatch it
means the buffer does not belong to the pool, and we panic.
 1.234  11-Mar-2019  maxv Rename pr_item_notouch_* to pr_item_bitmap_*, and move some code into new
pr_item_linkedlist_* functions. This makes it easier to see that we have
two ways of handling freed items.

No functional change.
 1.233  11-Feb-2019  maxv Fix previous, pr_size includes the KASAN redzone. Repurpose pr_reqsize and
use it for PR_ZERO, it holds the size requested by the user with no padding
or redzone added, and only these bytes should be zeroed.
 1.232  10-Feb-2019  christos Introduce PR_ZERO to avoid open-coding memset()s everywhere. OK riastradh@.
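For example (pool and variable names hypothetical), the flag replaces the open-coded pattern:

    /* Before: zero by hand after pool_get(). */
    obj = pool_get(&frob_pool, PR_WAITOK);
    memset(obj, 0, sizeof(*obj));

    /* After: the pool zeroes the item (only the requested size, see 1.233). */
    obj = pool_get(&frob_pool, PR_WAITOK | PR_ZERO);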
 1.231  23-Dec-2018  maxv Simplify the KASAN API, use only kasan_mark() and explain briefly. The
alloc/free naming was too confusing.
 1.230  23-Dec-2018  maxv Remove useless debugging code, the area is completely filled but it's not
checked afterwards, only pi_magic is.
 1.229  16-Dec-2018  maxv Add support for detecting use-after-frees in KASAN. We poison each freed
buffer, any subsequent read or write will be detected as illegal.

* Add POOL_CHECK_MAGIC, which is disabled under KASAN, because the same
detection is done in a better way.

* Register the size+redzone in the pool structure, to reduce the overhead.

* Fix the CTOR/DTOR check in KLEAK, the fields are never NULL.
 1.228  02-Dec-2018  maxv Introduce KLEAK, a new feature that can detect kernel information leaks.

It works by tainting memory sources with marker values, letting the data
travel through the kernel, and scanning the kernel<->user frontier for
these marker values. Combined with compiler instrumentation and rotation
of the markers, it is able to yield relevant results with little effort.

We taint the pools and the stack, and scan copyout/copyoutstr. KLEAK is
supported on amd64 only for now, but it is not complicated to add more
architectures (just a matter of having the address of .text, and a stack
unwinder).

A userland tool is provided that allows executing a command in rounds
and monitoring the leaks generated all the while.

KLEAK already detected directly 12 kernel info leaks, and prompted changes
that in total fixed 25+ leaks.

Based on an idea developed jointly with Thomas Barabosch (of Fraunhofer
FKIE).
 1.227  10-Sep-2018  maxv Correctly align the size+redzone for KASAN, on amd64 it happens to be
always 8byte-aligned but on other architectures it may not be.
 1.226  25-Aug-2018  maxv Disable POOL_REDZONE until we figure out what's wrong. There must be a dumb
problem, that is not triggerable on amd64.
 1.225  24-Aug-2018  maxv Use __predict_false to optimize, and also replace panic->printf.
 1.224  23-Aug-2018  maxv Add kASan redzones on pools and pool_caches. Also enable POOL_REDZONE
on DIAGNOSTIC.
 1.223  04-Jul-2018  kamil Avoid undefined behavior in pr_item_notouch_put()

Do not left-shift a signed integer into its sign bit.

sys/kern/subr_pool.c:251:30, left shift of 1 by 31 places cannot be represented in type 'int'

Detected with Kernel Undefined Behavior Sanitizer.

Reported by <Harry Pantazis>
 1.222  04-Jul-2018  kamil Avoid Undefined Behavior in pr_item_notouch_get()

Change the type of left shifted integer from signed to unsigned.

sys/kern/subr_pool.c:274:13, left shift of 1 by 31 places cannot be represented in type 'int'

Detected with Kernel Undefined Behavior Sanitizer.

Reported by <Harry Pantazis>
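In both of the shift fixes above the pattern is simply to make the shifted operand unsigned, along these lines (illustrative, not the exact diff):

    /* Undefined: left shift of 1 by 31 cannot be represented in type 'int'. */
    bitmap[idx / 32] |= 1 << (idx % 32);

    /* Defined: the shifted operand is unsigned. */
    bitmap[idx / 32] |= 1U << (idx % 32);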
 1.221  12-Jan-2018  para branches: 1.221.2; 1.221.4;
fix comment

pool stats are listed by 'vmstat -m', not 'vmstat -i'
 1.220  29-Dec-2017  christos Don't release the lock in the PR_NOWAIT allocation. Move the flags setting
to after acquiring the mutex. (from Tobias Nygren)
 1.219  16-Dec-2017  mrg hopefully work around the irregular "fork fails in init" problem.

if a pool is growing, and the grower is PR_NOWAIT, mark this.
if another caller wants to grow the pool and is also PR_NOWAIT,
busy-wait for the original caller, which should either succeed
or hard-fail fairly quickly.

implement the busy-wait by unlocking and relocking this pools
mutex and returning ERESTART. other methods (such as having
the caller do this) were significantly more code and this hack
is fairly localised.

ok chs@ riastradh@
 1.218  04-Dec-2017  mrg properly account PR_RECURSIVE pools like vmstat does.
 1.217  02-Dec-2017  mrg add two new members to uvmexp_sysctl{}: bootpages and poolpages.
bootpages is set to the pages allocated via uvm_pageboot_alloc().
poolpages is calculated from the nr_pages members of the pools on the list.

this brings us closer to having a valid total of pages known by
the system, vs actual pages originally managed.

XXX: poolpages needs some handling for PR_RECURSIVE pools still.
 1.216  14-Nov-2017  christos - fix an assert; we can reach there if we are nowait or limitfail.
- when priming the pool and failing with ERESTART, don't decrement the number
of pages; this avoids the issue of returning an ERESTART when we get to 0,
and is more correct.
- simplify the pool_grow code, and don't wakeup things if we ENOMEM.
 1.215  09-Nov-2017  christos Add assertions that either PR_WAITOK or PR_NOWAIT are set.
 1.214  09-Nov-2017  christos Handle the ERESTART case from pool_grow()
 1.213  09-Nov-2017  christos make the KASSERTMSG/panic strings consistent as '%s: [%s], __func__, wchan'
 1.212  09-Nov-2017  christos Since pr_lock is now used to wait for two things (PR_GROWING and
PR_WANTED) we need to loop for the condition we wanted.
 1.211  06-Nov-2017  riastradh Assert that pool_get failure happens only with PR_NOWAIT.

This would have caught the mistake I made last week leading to null
pointer dereferences all over the place, a mistake which I evidently
poorly scheduled alongside maxv's change to the panic message on x86
for null pointer dereferences.
 1.210  05-Nov-2017  mlelstv pool_grow can now fail even when sleeping is ok. Catch this case in pool_get
and retry.
 1.209  28-Oct-2017  riastradh Allow only one pending call to a pool's backing allocator at a time.

Candidate fix for problems with hanging after kva fragmentation related
to PR kern/45718.

Proposed on tech-kern:

https://mail-index.NetBSD.org/tech-kern/2017/10/23/msg022472.html

Tested by bouyer@ on i386.

This makes one small change to the semantics of pool_prime and
pool_setlowat: they may fail with EWOULDBLOCK instead of ENOMEM, if
there is a pending call to the backing allocator in another thread but
we are not actually out of memory. That is unlikely because nearly
always these are used during initialization, when the pool is not in
use.

XXX pullup-8
XXX pullup-7
XXX pullup-6 (requires tweaking the patch)
XXX pullup-5...
 1.208  08-Jun-2017  chs add some pool_allocators for pool item sizes larger than PAGE_SIZE.
needed by dtrace.
 1.207  14-Mar-2017  riastradh branches: 1.207.6;
#if DIAGNOSTIC panic ---> KASSERT

- Omit mutex_exit before panic. No need.
- Sprinkle some more information into a few messages.
- Prefer __diagused over #if DIAGNOSTIC for declarations,
to reduce conditionals.

ok mrg@
 1.206  05-Feb-2016  knakahara branches: 1.206.2; 1.206.4;
fix: "vmstat -C" CpuLayer showed only the last cpu values.
 1.205  24-Aug-2015  pooka to garnish, dust with _KERNEL_OPT
 1.204  28-Jul-2015  maxv Introduce POOL_REDZONE.
 1.203  13-Jun-2014  joerg branches: 1.203.2; 1.203.4;
Add kern.pool for memory pool stats.
 1.202  26-Apr-2014  abs Ensure pool_head is non-static - for "vmstat -i"
 1.201  17-Feb-2014  para branches: 1.201.2;
replace vmem(9) custom boundary tag allocation with a pool(9)
 1.200  11-Mar-2013  pooka branches: 1.200.6;
In pool_cache_put_slow(), pool_get() can block (it does mutex_enter()),
so we need to retry if curlwp took a context switch during the call.
Otherwise, CPU-local invariants can get screwed up:

panic: kernel diagnostic assertion "cur->pcg_avail == cur->pcg_size" failed

This is (was) very easy to reproduce by just running:

while : ; do RUMP_NCPU=32 ./a.out ; done

where a.out only calls rump_init(). But any situation where there's contention
and a pool doesn't have emptygroups would do.
 1.199  09-Feb-2013  christos printflike maintenance.
 1.198  28-Aug-2012  christos branches: 1.198.2;
proper locking for DEBUG
 1.197  05-Jun-2012  jym Now that pool_cache_invalidate() is synchronous and can handle per-CPU
caches, merge together pool_drain_start() and pool_drain_end() into

bool pool_drain(struct pool **ppp);

"bool" value indicates whether reclaiming was fully done (true) or not (false)
"ppp" will contain a pointer to the pool that was drained (optional).

See http://mail-index.netbsd.org/tech-kern/2012/06/04/msg013287.html
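A hedged usage sketch of the merged interface (the caller shown is hypothetical; the prototype is the one given above):

    struct pool *pp;
    bool done;

    /* Drain one pool; *pp reports which pool was drained (optional). */
    done = pool_drain(&pp);
    if (!done) {
            /* Reclaiming was not complete; the caller may retry later. */
    }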
 1.196  05-Jun-2012  jym As pool reclaiming is unlikely to happen in interrupt or softint
context, re-enable the portion of code that allows invalidation of CPU-bound
pool caches.

Two reasons:
- CPU cached objects being invalidated, the probability of fetching an
obsolete object from the pool_cache(9) is greatly reduced. This speeds up
pool_cache_get() quite a bit as it does not have to keep destroying objects
until it finds an updated one when an invalidation is in progress.

- for situations where we have to ensure that no obsolete object remains
after a state transition (canonical example: pmap mappings between Xen VM
restoration), invalidating all pool_cache(9) is the safest way to go.

As it uses xcall(9) to broadcast the execution of pool_cache_transfer(),
pool_cache_invalidate() cannot be called from interrupt or softint context
(scheduling a xcall(9) can put a LWP to sleep).

pool_cache_xcall() => pool_cache_transfer() to reflect its use.

Invalidation being a costly process (1000s of objects may be destroyed),
all places where pool_cache_invalidate() may be called from
interrupt/softint context will now get caught by the proper KASSERT(), and
fixed. Ping me when you see one.

Tested under i386 and amd64 by running ATF suite within 64MiB HVM
domains (tried triggering pgdaemon a few times).

No objection on tech-kern@.

XXX a similar fix has to be pulled up to NetBSD-6, but with a more
conservative approach.

See http://mail-index.netbsd.org/tech-kern/2012/05/29/msg013245.html
 1.195  05-May-2012  rmind G/C POOL_DIAGNOSTIC option. No objection on tech-kern@.
 1.194  04-Feb-2012  para branches: 1.194.2;
make acorn26 compile by fixing up subpage pool allocations

ok: riz@
 1.193  29-Jan-2012  he Use the same style for initialization of pool_allocator_kmem under
POOL_SUBPAGE as all the other pool_allocator structs. Fixes build
problem for acorn26.
 1.192  28-Jan-2012  rmind pool_page_alloc, pool_page_alloc_meta: avoid extra compare, use const.
ffs_mountfs,sys_swapctl: replace memset with kmem_zalloc.
sys_swapctl: move kmem_free outside the lock path.
uvm_init: fix comment, remove pointless numeration of steps.
uvm_map_enter: remove meflagval variable.
Fix some indentation.
 1.191  27-Jan-2012  para extending vmem(9) to be able to allocate resources for its own needs.
simplifying uvm_map handling (no special kernel entries anymore no relocking)
make malloc(9) a thin wrapper around kmem(9)
(with private interface for interrupt safety reasons)

releng@ acknowledged
 1.190  27-Sep-2011  jym branches: 1.190.2; 1.190.6;
Modify *ASSERTMSG() so they are now used as variadic macros. The main goal
is to provide routines that do as KASSERT(9) says: append a message
to the panic format string when the assertion triggers, with optional
arguments.

Fix call sites to reflect the new definition.

Discussed on tech-kern@. See
http://mail-index.netbsd.org/tech-kern/2011/09/07/msg011427.html
 1.189  22-Mar-2011  pooka pnbuf_cache is used all over the place outside of vfs, so put it
in one place to avoid many definitions.
 1.188  17-Jan-2011  uebayasi Fix a conditional include.
 1.187  17-Jan-2011  uebayasi Include internal definitions (uvm/uvm.h) only where necessary.
 1.186  03-Jun-2010  pooka branches: 1.186.2;
Report result of pool_reclaim() from pool_drain_end().
 1.185  12-May-2010  rmind pool_{cache_}get: improve previous diagnostic by checking for panicstr,
so it won't trigger the assert while trying to dump core on crash.
 1.184  12-May-2010  rmind - Sprinkle asserts to catch calls from interrupt context on IPL_NONE pools.
- Add diagnostic drain attempt.
 1.183  25-Apr-2010  ad MAXCPUS -> __arraycount
 1.182  20-Jan-2010  rmind branches: 1.182.2; 1.182.4;
pool_cache_invalidate: comment out invalidation of per-CPU caches (nobody depends
on it, at the moment) until we decide how to fix it (xcall(9) cannot be used from
interrupt context). XXX: Perhaps implement XC_HIGHPRI.
 1.181  03-Jan-2010  mlelstv drop __predict micro optimization in pool_init for cleaner code.
 1.180  03-Jan-2010  mlelstv Pools are created way before the pool subsystem mutexes are
initialized.

Ignore also pool_allocator_lock while the system is in cold state.

When the system has left cold state, uvm_init() should have
also initialized the pool subsystem and the mutexes are
ready to use.
 1.179  02-Jan-2010  mlelstv Move initialization of pool_allocator_lock before its first use.
This failed on archs where a mutex isn't initialized to a zero
value.

Defer allocation of pool log to the logging action, if allocation
fails, it will be retried the next time something is logged.

Clear pool log on allocation so that ddb doesn't crash when showing
so far unused log entries.
 1.178  30-Dec-2009  elad Turn PA_INITIALIZED to a reference count for the pool allocator, and once
it drops to zero destroy the mutex we initialize. This fixes the problem
mentioned in

http://mail-index.netbsd.org/tech-kern/2009/12/28/msg006727.html

Also remove pa_flags now that it's no longer needed.

Idea from matt@, okay matt@.
 1.177  20-Oct-2009  jym Fix a bug where on MP systems, pool_cache_invalidate(9) could be called
early during boot, just after CPUs are attached but before they are marked
as running.

This will result in a list of CPUs without the SPCF_RUNNING flag set, and
will trigger the 'KASSERT(xc_tailp < xc_headp)' in xc_lowpri() as no cross
call is issued.

Bug reported and patch tested by tron@.

See also http://mail-index.netbsd.org/tech-kern/2009/10/19/msg006293.html
 1.176  15-Oct-2009  thorpej - pool_cache_invalidate(): broadcast a cross-call to drain the per-CPU
caches before draining the global cache.
- pool_cache_invalidate_local(): remove.
 1.175  08-Oct-2009  jym Add pool_cache_invalidate_local() to the pool_cache(9) API, to permit
per-CPU objects invalidation when cached in the pool cache.

See http://mail-index.netbsd.org/tech-kern/2009/10/05/msg006206.html .

Reviewed by bouyer@. Thanks!
 1.174  13-Sep-2009  pooka Wipe out the last vestiges of POOL_INIT with one swift stroke. In
most cases, use a proper constructor. For proplib, give a local
equivalent of POOL_INIT for the kernel object implementation. This
way the code structure can be preserved, and a local link set is
not hazardous anyway (unless proplib is split to several modules,
but that'll be the day).

tested by booting a kernel in qemu and compile-testing i386/ALL
 1.173  29-Aug-2009  rmind Make pool_head static.
 1.172  15-Apr-2009  yamt pool_cache_put_paddr: add an assertion.
 1.171  11-Nov-2008  ad branches: 1.171.4;
Avoid recursive mutex_enter() when the system is low on KVA.
Should fix crash reported by riz on current-users.
 1.170  15-Oct-2008  ad branches: 1.170.2; 1.170.4;
- Rename cpu_lookup_byindex() to cpu_lookup(). The hardware ID isn't of
interest to MI code. No functional change.
- Change /dev/cpu to operate on cpu index, not hardware ID. Now cpuctl
shouldn't print confused output.
 1.169  11-Aug-2008  yamt make pcg_dummy const to catch bugs earlier.
 1.168  11-Aug-2008  yamt add some KASSERTs.
 1.167  08-Aug-2008  skrll Comment whitespace.
 1.166  09-Jul-2008  yamt pool_do_put: fix a pool corruption bug discovered by
the recent exec_pool changes.
 1.165  07-Jul-2008  yamt branches: 1.165.2;
fix pool corruption bugs in subr_pool.c 1.162.
 1.164  04-Jul-2008  ad Move an assignment later.
 1.163  04-Jul-2008  ad - Keep cache locked while allocating a cache group - later we might want
to automatically tune the group sizes at run time.
- Fix broken assertion.
- Avoid another test+branch.
 1.162  04-Jul-2008  ad Remove a bunch of conditional branches from the pool_cache fast path.
 1.161  31-May-2008  ad branches: 1.161.2;
Use __noinline.
 1.160  28-Apr-2008  martin branches: 1.160.2;
Remove clause 3 and 4 from TNF licenses
 1.159  28-Apr-2008  ad Add MI code to support in-kernel preemption. Preemption is deferred by
one of the following:

- Holding kernel_lock (indicating that the code is not MT safe).
- Bracketing critical sections with kpreempt_disable/kpreempt_enable.
- Holding the interrupt priority level above IPL_NONE.

Statistics on kernel preemption are reported via event counters, and
where preemption is deferred for some reason, it's also reported via
lockstat. The LWP priority at which preemption is triggered is tuneable
via sysctl.
 1.158  27-Apr-2008  ad branches: 1.158.2;
- Rename crit_enter/crit_exit to kpreempt_disable/kpreempt_enable.
DragonflyBSD uses the crit names for something quite different.
- Add a kpreempt_disabled function for diagnostic assertions.
- Add inline versions of kpreempt_enable/kpreempt_disable for primitives.
- Make some more changes for preemption safety to the x86 pmap.
 1.157  24-Apr-2008  ad Merge the socket locking patch:

- Socket layer becomes MP safe.
- Unix protocols become MP safe.
- Allows protocol processing interrupts to safely block on locks.
- Fixes a number of race conditions.

With much feedback from matt@ and plunky@.
 1.156  27-Mar-2008  ad branches: 1.156.2;
Replace use of CACHE_LINE_SIZE in some obvious places.
 1.155  17-Mar-2008  ad Make them compile again.
 1.154  17-Mar-2008  yamt - simplify ASSERT_SLEEPABLE.
- move it from proc.h to systm.h.
- add some more checks.
- make it a little more lkm friendly.
 1.153  10-Mar-2008  martin Use cpu index instead of the machine-dependent, not very expressive
cpuid when naming user-visible kernel entities.
 1.152  02-Mar-2008  yamt pool_do_put: remove pa_starved_p check for now as it seems to cause
more problems than it solves. PR/37993 from Greg A. Woods.
 1.151  14-Feb-2008  yamt branches: 1.151.2; 1.151.6;
use time_uptime instead of getmicrotime() for ph_time.
 1.150  05-Feb-2008  skrll Revert previous as requested by yamt.
 1.149  02-Feb-2008  skrll Check alignment against pp->pr_align not pp->pr_alloc->pa_pagesz.

DIAGNOSTIC kernels on hppa boot again.

OK'd by ad.
 1.148  28-Jan-2008  yamt pool_cache_get_paddr: don't bother to clear pcgo_va unless DIAGNOSTIC.
 1.147  04-Jan-2008  ad Start detangling lock.h from intr.h. This is likely to cause short term
breakage, but the mess of dependencies has been regularly breaking the
build recently anyhow.
 1.146  02-Jan-2008  ad Merge vmlocking2 to head.
 1.145  26-Dec-2007  ad Merge more changes from vmlocking2, mainly:

- Locking improvements.
- Use pool_cache for more items.
 1.144  22-Dec-2007  yamt pool_in_cg: don't bother to check slots past pcg_avail.
 1.143  22-Dec-2007  yamt pool_whatis: print cached items as well.
 1.142  20-Dec-2007  ad - Support two different sizes of pool_cache group. The default has 14 or 15
items, and the new large groups (for busy caches) have 62 or 63 items.
- Add PR_LARGECACHE flag as a hint that a pool_cache should use large groups.
This should eventually be tuned at runtime.
- Report group size for vmstat -C.
 1.141  13-Dec-2007  yamt add ddb "whatis" command. inspired from solaris ::whatis dcmd.
 1.140  13-Dec-2007  yamt don't forget to initialize ph_off for PR_NOTOUCH.
 1.139  11-Dec-2007  ad Change the ncpu test to work when a pool_cache or softint is initialized
between mi_cpu_attach() and attachment of the boot CPU. Suggested by mrg@.
 1.138  05-Dec-2007  ad branches: 1.138.2; 1.138.4;
pool_init, pool_cache_init: hack around IP input processing which can
not yet safely block without severely confusing soo_write() and friends.
If the pool's IPL is IPL_SOFTNET, initialize the mutex at IPL_VM so that
it's a spinlock. To be dealt with correctly in the near future.
 1.137  18-Nov-2007  ad branches: 1.137.2;
Work around issues with pool_cache on sparc.
 1.136  14-Nov-2007  yamt fix freecheck.
 1.135  10-Nov-2007  yamt for PR_NOTOUCH pool_item_header, use a bitmap rather than a freelist.
it saves some space and allows more items per page.
 1.134  07-Nov-2007  ad Merge from vmlocking:

- pool_cache changes.
- Debugger/procfs locking fixes.
- Other minor changes.
 1.133  11-Oct-2007  ad branches: 1.133.2; 1.133.4;
Remove LOCK_ASSERT(!simple_lock_held(&foo));
 1.132  11-Oct-2007  ad Merge from vmlocking:

- G/C spinlockmgr() and simple_lock debugging.
- Always include the kernel_lock functions, for LKMs.
- Slightly improved subr_lockdebug code.
- Keep sizeof(struct lock) the same if LOCKDEBUG.
 1.131  18-Aug-2007  ad branches: 1.131.2; 1.131.4;
pool_drain: add a comment.
 1.130  18-Aug-2007  ad pool_do_cache_invalidate_grouplist: drop locks while calling the destructor.
XXX Expensive - to be revisited.
 1.129  12-Mar-2007  ad branches: 1.129.8; 1.129.12;
Pass an ipl argument to pool_init/POOL_INIT to be used when initializing
the pool's lock.
 1.128  04-Mar-2007  christos branches: 1.128.2;
Kill caddr_t; there will be some MI fallout, but it will be fixed shortly.
 1.127  22-Feb-2007  thorpej TRUE -> true, FALSE -> false
 1.126  21-Feb-2007  thorpej Replace the Mach-derived boolean_t type with the C99 bool type. A
future commit will replace use of TRUE and FALSE with true and false.
 1.125  09-Feb-2007  ad branches: 1.125.2;
Merge newlock2 to head.
 1.124  01-Nov-2006  yamt remove some __unused from function parameters.
 1.123  12-Oct-2006  christos - sprinkle __unused on function decls.
- fix a couple of unused bugs
- no more -Wno-unused for i386
 1.122  03-Sep-2006  christos branches: 1.122.2; 1.122.4;
avoid empty else statement
 1.121  20-Aug-2006  yamt implement PR_NOALIGN. (allow unaligned pages)
to be used by vmem quantum cache.
 1.120  19-Aug-2006  yamt pool_init: in the case of PR_NOTOUCH, don't bump item size to
sizeof(struct pool_item).
 1.119  21-Jul-2006  yamt use ASSERT_SLEEPABLE where appropriate.
 1.118  07-Jun-2006  kardel merge FreeBSD timecounters from branch simonb-timecounters
- struct timeval time is gone
time.tv_sec -> time_second
- struct timeval mono_time is gone
mono_time.tv_sec -> time_uptime
- access to time via
{get,}{micro,nano,bin}time()
get* versions are fast but less precise
- support NTP nanokernel implementation (NTP API 4)
- further reading:
Timecounter Paper: http://phk.freebsd.dk/pubs/timecounter.pdf
NTP Nanokernel: http://www.eecis.udel.edu/~mills/ntp/html/kern.html
 1.117  25-May-2006  yamt move wait points for kva from upper layers to vm_map. PR/33185 #1.

XXX there is a concern about interaction with kva fragmentation.
see: http://mail-index.NetBSD.org/tech-kern/2006/05/11/0000.html
 1.116  15-Apr-2006  simonb branches: 1.116.2;
Add a DEBUG check that panics if pool_init() is called more than
once on the same pool.

As discussed on tech-kern a few months ago.
 1.115  15-Apr-2006  christos Coverity CID 760: Protect against NULL deref.
 1.114  02-Apr-2006  yamt pool_grow: don't increase pr_minpages. (fix a mistake in 1.113)
 1.113  17-Mar-2006  yamt make duplicated code fragments into a function, pool_grow.
 1.112  24-Feb-2006  bjh21 branches: 1.112.2; 1.112.4; 1.112.6;
Medium-sized overhaul of POOL_SUBPAGE support so that:
1: I can understand it, and
2: It works.
Notable externally-visible changes are that POOL_SUBPAGE now has to be a
compile-time constant, and that trying to initialise a pool whose objects are
larger than POOL_SUBPAGE automatically generates a pool that doesn't use
subpages.

NetBSD/acorn26 now boots multi-user again.
 1.111  26-Jan-2006  christos branches: 1.111.2; 1.111.4;
PR/32631: Yves-Emmanuel JUTARD: Fix DIAGNOSTIC panic in the pool code. At
the time pool_get() calls pool_catchup(), pp has been free'd but it is still
in the "entered" state. The chain pool_catchup() -> pool_allocator_alloc()
-> pool_reclaim() on pp fails because pp is still in the "entered" state.
Call pr_leave() before calling pool_catchup() to avoid this.

Thanks for the excellent analysis!
 1.110  24-Dec-2005  perry branches: 1.110.2;
Remove leading __ from __(const|inline|signed|volatile) -- it is obsolete.
 1.109  20-Dec-2005  christos Commit temporary fix against kva starvation from yamt:

- pool_allocator_alloc: drain ourselves as well,
so that pool_cache on us is drained as well.
- pool_cache_put_paddr: destruct objects if underlying pool is starved.
- pool_get: on kva starvation, wake up once a second and try again.

Fixes:
PR/32287: Processes hang in "mclpl"
PR/32330: shark kernel hangs under memory load.
 1.108  01-Dec-2005  yamt add "show all pools" command for ddb.
 1.107  02-Nov-2005  yamt pool_printit: don't keep a lock when printing info.
we can't clean it up if the ddb pager is quit.
 1.106  16-Oct-2005  christos Make the grouplist invalidate function take a grouplist instead of a group.
Suggested by yamt.
 1.105  16-Oct-2005  christos This is why I hate gotos: My previous change had different semantics than
the original code since if fullgroups was empty and partgroups wasn't, we
would not clean up partgroups (pointed out by yamt). Well, this one has
different semantics from the original, but they are the correct ones, I think.
 1.104  16-Oct-2005  christos avoid a goto.
 1.103  15-Oct-2005  chs in pool_do_cache_invalidate(), make sure to process both full and partial
group lists even if the first one we look at is empty. fix ddb print routine.
 1.102  02-Oct-2005  chs optimize pool_caches similarly to how I optimized pools before:
split the single list of pool cache groups into three lists:
completely full, partially full, and completely empty.
use LIST instead of TAILQ where appropriate.
 1.101  18-Jun-2005  thorpej branches: 1.101.2;
Fix some locking issues:
- Make the locking rules for pr_rmpage() sane, and don't modify fields
protected by the pool lock without actually holding it.
- Always defer freeing the pool page to the back-end allocator, to avoid
invoking the pool_allocator with the pool locked (which would violate
the pool_allocator -> pool locking order).
- Fix pool_reclaim() to not violate the pool_cache -> pool locking order
by using a trylock.

Reviewed by Chuq Silvers.
 1.100  01-Apr-2005  yamt merge yamt-km branch.
- don't use managed mappings/backing objects for wired memory allocations.
save some resources like pv_entry. also fix (most of) PR/27030.
- simplify kernel memory management API.
- simplify pmap bootstrap of some ports.
- some related cleanups.
 1.99  01-Jan-2005  yamt branches: 1.99.2; 1.99.4; 1.99.8;
PR_NOTOUCH:
- use uint8_t instead of uint16_t for freelist index.
- set ph_off only if PR_NOTOUCH.
- comment.
 1.98  01-Jan-2005  yamt in the case of !PMAP_MAP_POOLPAGE, gather pool backend allocations to
large chunks for kernel_map and kmem_map to ease kva fragmentation.
 1.97  01-Jan-2005  yamt introduce a new flag for pool_init, PR_NOTOUCH.
if it's specified, don't use free items as storage for internal state.
so that we can use pools for non memory backed objects.
inspired from solaris's KMC_NOTOUCH.
 1.96  20-Jun-2004  thorpej Remove PR_IMMEDRELEASE, since setting the high water mark will achieve
the same thing.

Pointed out back in January by YAMAMOTO Takashi.
 1.95  20-May-2004  atatat Add a DIAGNOSTIC check to detect un-initialized pools.
 1.94  25-Apr-2004  simonb Initialise (most) pools from a link set instead of explicit calls
to pool_init. Untouched pools are ones that are either in arch-specific
code, or aren't initialised during initial system startup.

Convert struct session, ucred and lockf to pools.
 1.93  08-Mar-2004  dbj branches: 1.93.2;
add splvm() around a few pa_slock and psppool calls since they
may be shared with pools that can be used in interrupt context.
 1.92  22-Feb-2004  enami Modify pool page header allocation strategy as follows:
In addition to the current one (i.e., don't waste so large a part of the page),
- if the header fits in the page without wasting any items, put it there.
- don't put the header in the page if it may consume rather big item.

For example, on i386, header is now allocated in the page for the pools
like fdescpl or sigapl, and allocated off the page for the pools like
buf1k or buf2k.
 1.91  16-Jan-2004  yamt - fix locking order problem. (pa_slock -> pr_slock)
- protect pr_phtree with pr_slock.
- add some LOCK_ASSERTs.
 1.90  09-Jan-2004  thorpej Add a new pool initialization flag, PR_IMMEDRELEASE. This flag causes
idle pool pages to be returned to the system immediately upon becoming
de-fragmented.

Also, in pool_do_put(), don't free back an idle page unless we are over
our minimum page claim.
 1.89  29-Dec-2003  yamt pool_prime_page: initialize ph_time to mono_time instead of zero
as it's a mono_time relative value.
 1.88  13-Nov-2003  chs two changes to improve scalability:

(1) split the single list of pages allocated to a pool into three lists:
completely full, partially full, and completely empty.
there is no longer any need to traverse any list looking for a
certain type of page.

(2) replace the 8-element hash table for out-of-page page headers
with a splay tree.

these two changes (together with the recent enhancements to the wait code)
give us linear scaling for a fork+exit microbenchmark.
 1.87  09-Apr-2003  thorpej branches: 1.87.2;
Add the ability for pool caches to cache the physical address of
objects. Clients of the pool_cache API must consistently use
the "paddr" variants or not, otherwise behavior is undefined.

Enable this on Alpha, ARM, MIPS, and x86. Other platforms must
define POOL_VTOPHYS() in the appropriate manner in order to enable
the feature.

Part 1 of a series of simple patches contributed by Wasabi Systems
to improve network performance.
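Usage sketch of the paddr variants, written with the modern pool_cache_t spelling (the cache itself is hypothetical); as noted above, a given cache must use either the paddr variants or the plain ones, never a mix:

    pool_cache_t cache;     /* hypothetical, created elsewhere */
    paddr_t pa;
    void *obj;

    obj = pool_cache_get_paddr(cache, PR_NOWAIT, &pa);
    /* ... use obj; pa holds its physical address ... */
    pool_cache_put_paddr(cache, obj, pa);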
 1.86  16-Mar-2003  matt Only define POOL_LOGSIZE/pool_size if POOL_DIAGNOSTIC is defined.
 1.85  23-Feb-2003  pk Use splvm() instead of splhigh() when accessing the internal page header pool.
 1.84  18-Jan-2003  thorpej Merge the nathanw_sa branch.
 1.83  24-Nov-2002  scw Quell uninitialised variable warnings.
 1.82  09-Nov-2002  thorpej Fix signed/unsigned comparison warnings.
 1.81  08-Nov-2002  enami Parse the modifier of ddb command as documented.
 1.80  27-Sep-2002  provos remove trailing \n in panic(). approved perry.
 1.79  25-Aug-2002  thorpej Fix signed/unsigned comparison warnings from GCC 3.3.
 1.78  30-Jul-2002  thorpej Bring down a fix from the "newlock" branch, slightly modified:
* In pool_prime_page(), assert that the object being placed onto the
free list meets the alignment constraints (that "ioff" within the
object is aligned to "align").
* In pool_init(), round up the object size to the alignment value (or
ALIGN(1), if no special alignment is needed) so that the above invariant
holds true.
 1.77  11-Jul-2002  matt Add wchan to a panic (must have NOWAIT).
 1.76  13-Mar-2002  simonb branches: 1.76.4; 1.76.6;
Move 'struct pool_cache_group' definition into <sys/pool.h>
 1.75  13-Mar-2002  simonb Remove two instances of an "error" variable that is only ever assigned to
but not used.
 1.74  09-Mar-2002  thorpej branches: 1.74.2;
Put back pool_prime(); the i386 mp pmap uses it.
 1.73  09-Mar-2002  thorpej Fix a couple of typos in simple_{,un}lock()'s.
 1.72  09-Mar-2002  thorpej Remove pool_prime(). Nothing uses it, and how it should be used is not
really well-defined in the absence of PR_STATIC.
 1.71  09-Mar-2002  thorpej If, when a page becomes idle, the backend allocator is waiting for
resources, release the page immediately, rather than letting it sit
around cached.

From art@openbsd.org.
 1.70  09-Mar-2002  thorpej Remove PR_MALLOCOK and PR_STATIC. The former wasn't actually used,
and the latter, while there was some code tested the bit, was woefully
incomplete and also unused by anything. Besides, PR_STATIC functionality
could be better handled by backend allocators anyhow.

From art@openbsd.org
 1.69  08-Mar-2002  thorpej Add a missing simple_unlock.
 1.68  08-Mar-2002  thorpej Add an optional "drain" client callback, which can be set by the new
pool_set_drain_hook(). This hook is called in three cases:
* When a pool has hit the hard limit, just before either erroring
out or sleeping.
* When a backend allocator fails to allocate memory.
* Just before trying to reclaim pages in pool_reclaim().

This hook requests the client to try and free some items back to
the pool.

From art@openbsd.org.
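Usage sketch (the subsystem and callback names are hypothetical):

    static void
    frob_drain(void *arg, int flags)
    {
            /*
             * Called in the three cases listed above; try to release some
             * cached items so pool_get() can make progress.
             */
    }

    /* Registered once, typically right after pool_init(): */
    pool_set_drain_hook(&frob_pool, frob_drain, NULL);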
 1.67  08-Mar-2002  thorpej Remove PR_FREEHEADER; nothing uses it anymore.

From art@openbsd.org.
 1.66  08-Mar-2002  thorpej Pool deals fairly well with physical memory shortage, but it doesn't
deal with shortages of the VM maps where the backing pages are mapped
(usually kmem_map). Try to deal with this:

* Group all information about the backend allocator for a pool in a
separate structure. The pool references this structure, rather than
the individual fields.
* Change the pool_init() API accordingly, and adjust all callers.
* Link all pools using the same backend allocator on a list.
* The backend allocator is responsible for waiting for physical memory
to become available, but will still fail if it cannot allocate KVA
space for the pages. If this happens, carefully drain all pools using
the same backend allocator, so that some KVA space can be freed.
* Change pool_reclaim() to indicate if it actually succeeded in freeing
some pages, and use that information to make draining easier and more
efficient.
* Get rid of PR_URGENT. There was only one use of it, and it could be
dealt with by the caller.

From art@openbsd.org.
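A hedged sketch of the resulting interface, written against the modern pool_init() prototype (the IPL argument was added later, in rev. 1.129 below); the allocator and its functions are hypothetical:

    static void *frob_page_alloc(struct pool *, int);
    static void  frob_page_free(struct pool *, void *);

    static struct pool_allocator frob_allocator = {
            .pa_alloc  = frob_page_alloc,
            .pa_free   = frob_page_free,
            .pa_pagesz = 0,                 /* 0 selects the default page size */
    };

    static struct pool frob_pool;

    void
    frob_init(void)
    {
            pool_init(&frob_pool, sizeof(struct frob), 0, 0, 0, "frobpl",
                &frob_allocator, IPL_NONE);
    }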
 1.65  20-Nov-2001  enami Call pr_log(PRLOG_GET) when POOL_DIAGNOSTIC is defined instead of DIAGNOSTIC
for consistency.
 1.64  12-Nov-2001  lukem add RCSIDs
 1.63  21-Oct-2001  chs branches: 1.63.2;
in pool_drain(), call pool_reclaim() while we still have interrupts blocked
since the pool in question might be one used in interrupt context.
 1.62  07-Oct-2001  bjh21 Add support for allocating pool memory in units smaller than a whole page.
This is activated by defining POOL_SUBPAGE to the size of the new allocation
unit, and makes pools much more efficient on machines with obscenely large
pages. It might even make four-megabyte arm26 systems usable.
 1.61  26-Sep-2001  chs jump through hoops to avoid calling uvm_km_free_poolpage() while holding
spinlocks, since that function can sleep. (note that there's still one
instance remaining to be fixed.) use TAILQ_FOREACH where appropriate.
 1.60  01-Jul-2001  thorpej branches: 1.60.2; 1.60.4;
Protect the `pool cache group' pool with splvm(), so that pool caches
can be used by code that runs in interrupt context.
 1.59  05-Jun-2001  thorpej Do the reentrancy checking if POOL_DIAGNOSTIC, not DIAGNOSTIC. Prevents
ABI change for diagnostic vs. non-diagnostic kernels.
 1.58  05-Jun-2001  thorpej Assert that no locks are held if we're called with PR_WAITOK.
From Bill Sommerfeld.
 1.57  13-May-2001  sommerfeld Make this build again ifdef DIAGNOSTIC (oops)
 1.56  13-May-2001  sommerfeld Remove pool reentrancy testing overhead unless DIAGNOSTIC is defined.
Previously, we passed __FILE__ and __LINE__ on all pool_get/pool_set calls.

This change results in a measured 1.2% performance improvement in
ping-flood packets-per-second as reported by ping(8).
 1.55  10-May-2001  thorpej Rearrange the code that adds pages of objects to the pool; require
that the caller allocate the pool_item_header when it allocates the
pool page, so we can avoid a locking pitfall (sleeping with a simple
lock held).

Also revive pool_prime(), as there are some legitimate uses of it,
but in doing so, eliminate some of the bogosities of the old version
(i.e. don't do an implicit "setlowat", just prime the pool, and incr
the minpages for each additional page we add, and compute the number
of pages to prime in a way that callers would expect).
 1.54  10-May-2001  thorpej Use POOL_NEEDS_CATCHUP() in one more place.
 1.53  10-May-2001  thorpej Encapsulate the test for a pool needing a pool_catchup() in a macro.
 1.52  09-May-2001  thorpej Remove pool_create() and pool_prime(). Nothing except pool_create()
used pool_prime(), and no one uses pool_create() anymore.

This makes it easier to fix a locking pitfall.
 1.51  04-May-2001  thorpej Add pool_cache_destruct_object(), used to force destruction of
an object and release back into the pool.
 1.50  29-Jan-2001  enami branches: 1.50.2;
Don't use PR_URGENT to allocate the page header. We don't want to just panic
on memory shortage. Instead, use the same wait/nowait condition as the
item requested, and just clean up and return failure if we can't allocate
a page header while we aren't allowed to wait.
 1.49  14-Jan-2001  thorpej Change some low-hanging splimp() calls to splvm().
 1.48  11-Dec-2000  thorpej Add some basic statistics to pool_cache.
 1.47  10-Dec-2000  thorpej Don't hold a pool cache lock across any call to pool_get() or pool_put().
This allows us to change a try-lock into a normal lock in the reclaim
case.
 1.46  07-Dec-2000  thorpej ...and when freeing cache groups, clear `freeto' if that's the one
we're freeing.
 1.45  07-Dec-2000  thorpej When we invalidate a pool cache, make sure to clear `allocfrom' if
we empty out that cache group.
 1.44  07-Dec-2000  thorpej Add a /c modifier to "show pool" to display pool caches.
 1.43  07-Dec-2000  thorpej This is a first-cut implementation of support for caching of
constructed objects in the pool allocator, similar to caching
of constructed objects in the Solaris SLAB allocator.

This implementation is a separate API (pool_cache_*()) layered
on top of pools to keep the caching complexity out of the way
of pools that won't benefit from it.

While we're here, allow pool items to be as large as the pool
page size.
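
A hedged sketch of the usage pattern this introduced, written against the
present-day pool_cache(9) interface (which hands back a dynamically allocated
cache); the widget type, constructor, and destructor are hypothetical:

    #include <sys/param.h>
    #include <sys/pool.h>

    struct widget { int w_state; };             /* hypothetical object type */
    static pool_cache_t widget_cache;

    static int
    widget_ctor(void *arg, void *obj, int flags)
    {
            struct widget *w = obj;

            w->w_state = 0;     /* expensive construction, done once per object */
            return 0;
    }

    static void
    widget_dtor(void *arg, void *obj)
    {
            /* undo whatever widget_ctor() set up */
    }

    void
    widget_init(void)
    {
            widget_cache = pool_cache_init(sizeof(struct widget), 0, 0, 0,
                "widgetpl", NULL, IPL_NONE, widget_ctor, widget_dtor, NULL);
    }

    void
    widget_example(void)
    {
            struct widget *w = pool_cache_get(widget_cache, PR_WAITOK);
            /* ... use w; it comes back already constructed ... */
            pool_cache_put(widget_cache, w);    /* stays constructed in the cache */
    }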
 1.42  06-Dec-2000  thorpej ANSI'ify.
 1.41  19-Nov-2000  sommerfeld In pool_setlowat(), only call pool_catchup() if the pool is under the
low water mark. (Avoids annoying warning when you setlowat a static
pool).
 1.40  12-Aug-2000  sommerfeld Use ltsleep instead of simple_unlock/tsleep/simple_lock
 1.39  27-Jun-2000  mrg remove include of <vm/vm.h>
 1.38  26-Jun-2000  mrg remove/move more mach vm header files:

<vm/pglist.h> -> <uvm/uvm_pglist.h>
<vm/vm_inherit.h> -> <uvm/uvm_inherit.h>
<vm/vm_kern.h> -> into <uvm/uvm_extern.h>
<vm/vm_object.h> -> nothing
<vm/vm_pager.h> -> into <uvm/uvm_pager.h>

also includes a bunch of <vm/vm_page.h> include removals (due to redundancy
with <vm/vm.h>), and a scattering of other similar headers.
 1.37  10-Jun-2000  sommerfeld Fix assorted bugs around shutdown/reboot/panic time.
- add a new global variable, doing_shutdown, which is nonzero if
vfs_shutdown() or panic() have been called.
- in panic, set RB_NOSYNC if doing_shutdown is already set on entry
so we don't reenter vfs_shutdown if we panic'ed there.
- in vfs_shutdown, don't use proc0's process for sys_sync unless
curproc is NULL.
- in lockmgr, attribute successful locks to proc0 if doing_shutdown
&& curproc==NULL, and panic if we can't get the lock right away; avoids the
spurious lockmgr DIAGNOSTIC panic from the ddb reboot command.
- in subr_pool, deal with curproc==NULL in the doing_shutdown case.
- in mfs_strategy, bitbucket writes if doing_shutdown, so we don't
wedge waiting for the mfs process.
- in ltsleep, treat ((curproc == NULL) && doing_shutdown) like the
panicstr case.

Appears to fix: kern/9239, kern/10187, kern/9367.
May also fix kern/10122.
 1.36  31-May-2000  pk Allow a pool's pagesz to be larger than the VM page size.
Enforce the required page alignment restriction in pool_prime_page().
 1.35  31-May-2000  pk Assert that the pool item size does not exceed the page size.
 1.34  08-May-2000  thorpej branches: 1.34.2;
__predict_false() the DIAGNOSTIC and other error condition checks.
 1.33  13-Apr-2000  chs always define PI_MAGIC so this compiles in all cases.
 1.32  10-Apr-2000  chs in pool_put(), fill the entire object with PI_MAGIC instead of just the
first element.
 1.31  14-Feb-2000  thorpej Use ratecheck().
 1.30  29-Aug-1999  thorpej branches: 1.30.2;
In _pool_put(), panic if we're put'ing with nout == 0. This will help us
detect a little earlier if we've dup-put'd. Otherwise, underflow occurs,
and subsequent allocations simply hang or fail (it thinks the hardlimit
has been reached).
 1.29  05-Aug-1999  sommerfeld Create new pool flag PR_LIMITFAIL, indicating that even PR_WAIT
allocations should fail if the pool is at its hard limit.
Document flag in pool(9).
Use it in mbuf.h for the first allocate call for M_GET, M_GETHDR, and
MCLGET, so that m_reclaim gets called even for blocking allocations.
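
The two-pass pattern this enables, sketched with hypothetical names (the real
consumers are the mbuf macros mentioned above):

    #include <sys/pool.h>

    extern struct pool foo_pool;        /* hypothetical pool */
    void foo_reclaim(void);             /* hypothetical reclaim step */

    static struct foo *
    foo_get_blocking(void)
    {
            struct foo *f;

            /*
             * First pass: willing to sleep for memory, but PR_LIMITFAIL makes
             * pool_get() return NULL instead of sleeping when the pool is at
             * its hard limit, so the caller gets a chance to reclaim first.
             */
            f = pool_get(&foo_pool, PR_WAITOK | PR_LIMITFAIL);
            if (f == NULL) {
                    foo_reclaim();
                    f = pool_get(&foo_pool, PR_WAITOK);  /* now block for real */
            }
            return f;
    }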
 1.28  27-Jul-1999  thorpej In _pool_put(), call simple_lock_freecheck() if we're LOCKDEBUG before
we put the item on the free list.
 1.27  06-Jun-1999  pk Guard our global resource `phpool' against all interrupts.
 1.26  10-May-1999  thorpej Make sure page allocations are counted everywhere that they need to be.
 1.25  10-May-1999  thorpej Improve the pool allocator's diagnostic helpers, adding the ability to
log on a per-pool basis, reentrancy checking, and dumping various pool
information from DDB.
 1.24  29-Apr-1999  scottr Pull in opt_poollog.h for POOL_LOGSIZE.
 1.23  06-Apr-1999  thorpej More locking protocol fixes. Protect pool_head with a spin lock (statically
initialized). This lock also protects the "next drain candidate" pointer.

XXX There is still one locking protocol problem, which should not be
a problem in practice, but is still marked as an issue in the code anyhow.
 1.22  04-Apr-1999  chs Undo the part of the last revision about pr_rmpage() referencing
a data structure after it was freed. This wasn't actually a problem,
and the change caused the wrong pool_item_header to be freed
in the non-PR_PHINPAGE case.
 1.21  31-Mar-1999  thorpej branches: 1.21.2;
Yet more fixes to the pool allocator:

- Protect userspace from unnecessary header inclusions (as noted on
current-users).

- Some const poisoning.

- GREATLY simplify the locking protocol, and fix potential deadlock
scenarios. In particular, assume that the back-end page allocator
provides its own locking mechanism (this is currently true for all
such allocators in the NetBSD kernel). Doing so allows us to simply
use one spin lock for serialized access to all r/w members of the pool
descriptor. The spin lock is released before calling the back-end
allocator, and re-acquired upon return from it.

- Fix a problem in pr_rmpage() where a data structure was referenced
after it was freed.

- Minor tweak to page management. Migrate both idle and empty pages
to the end of the page list. As soon as a page becomes un-empty
(by a pool_put()), place it at the head of the page list, and set
curpage to point to it. This reduces fragmentation as well as the
time required to find a non-empty page as soon as curpage becomes
empty again.

- Use mono_time throughout, and protect access to it w/ splclock().

- In pool_reclaim(), if freeing an idle page would reduce the number
of allocatable items to below the low water mark, don't.
 1.20  31-Mar-1999  thorpej Fix several bugs/deficiencies in the pool allocator:

- Add support for hard limits, with optional rate-limited logging of
a warning message when the pool limit is reached. (This will be used
to fix a bug in mbuf cluster allocation on the MIPS and Alpha ports.)

- Fix some locking protocol errors. This required splitting pr_flags
into pr_flags (which is protected by the spin lock) and pr_roflags (which
are `read only' flags, set when the pool is initialized, and never changed
again; these do not need to be protected by a mutex).

- Make the low water support actually mean something. When a low water
mark is set, add free items to the pool until the low water mark is
reached. When an item allocation causes the number of free items to
drop below the low water mark, make the pool catch up to it. This can
make the pool allocator more useful for several applications (e.g.
pmap `pv entry' management) and more robust for others (for e.g. mbuf
and mbuf cluster allocation, so that the pagedaemon can use NFS to clean
pages on diskless systems without completely running dry on buffers to
receive packets in during extreme memory shortages).

- Add a comment where we sleep waiting for more pages for the back-end
page allocator. Specifically, instead of sleeping potentially forever,
perhaps we should just wake up once a second to try allocating a page
again. XXX Revisit this soon.
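
Sketched against the documented pool(9) calls this revision added or made
meaningful; the pool and the numbers are hypothetical:

    #include <sys/pool.h>

    extern struct pool pv_pool;         /* hypothetical pool */

    void
    pv_pool_tune(void)
    {
            /* Keep at least 64 items ready; the pool "catches up" whenever
             * an allocation drops it below this mark. */
            pool_setlowat(&pv_pool, 64);

            /* Refuse to grow past 4096 items, warning at most once a minute. */
            pool_sethardlimit(&pv_pool, 4096,
                "WARNING: pv_pool limit reached", 60);
    }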
 1.19  24-Mar-1999  mrg completely remove Mach VM support. All that is left is the
header files, as UVM still uses (most of) these.
 1.18  23-Mar-1999  thorpej Fix the order of arguments to roundup().
 1.17  27-Dec-1998  thorpej Make this compile with POOL_DIAGNOSTIC, and add a POOL_LOGSIZE option.
Defopt these.
 1.16  16-Dec-1998  briggs Prototype pool_print() and pool_chk() if DEBUG.
Initialize pool hash table with PR_HASHTABSIZE (i.e., 8) LIST_INIT()s
instead of one memset().
Only check for page != ph->ph_page if PR_PHINPAGE is set (in pool_chk()).
Print pool base pointer when reporting page inconsistency in pool_chk().
 1.15  29-Sep-1998  pk In addition to the spinlock, use the lockmgr() to serialize access to
the back-end page allocator. This allows the back-end to sleep since we
now relinquish the spin lock after acquiring the long-term lock.
 1.14  22-Sep-1998  thorpej Make sure the size is large enough to hold a pool_item.
 1.13  12-Sep-1998  christos Make copyrights consistent; fix weird/trailing spaces add missing (c) etc.
 1.12  28-Aug-1998  thorpej Add an alternate pool page allocator that can be used if the pool is
never accessed in interrupt context. In the UVM case, this uses the
kernel_map, to reduce usage of the previous kmem_map resource.
 1.11  28-Aug-1998  thorpej Add a waitok boolean argument to the VM system's pool page allocator backend.
 1.10  13-Aug-1998  eeh Merge paddr_t changes into the main branch.
 1.9  04-Aug-1998  perry Abolition of bcopy, ovbcopy, bcmp, and bzero, phase one.
bcopy(x, y, z) -> memcpy(y, x, z)
ovbcopy(x, y, z) -> memmove(y, x, z)
bcmp(x, y, z) -> memcmp(x, y, z)
bzero(x, y) -> memset(x, 0, y)
 1.8  02-Aug-1998  thorpej Make sure we initialize pr_nidle.
 1.7  02-Aug-1998  thorpej Fix a braino in the idle page instrumentation.
 1.6  01-Aug-1998  thorpej Instrument "idle pages" (i.e. pages which have no items allocated from
them, and could thus be freed back to the system).
 1.5  31-Jul-1998  thorpej Un-static pool_head; vmstat wants to find it.
 1.4  24-Jul-1998  thorpej branches: 1.4.2;
A few small changes to how pool pages are allocated/freed:
- If either an alloc or release function is provided, make sure both are
provided, otherwise panic, as this is a fatal error.
- If using the default allocator, default the pool pagesz to PAGE_SIZE,
since that is the granularity of the default allocator's mechanism.
- In the default allocator, use new functions:
uvm_km_alloc_poolpage()/uvm_km_free_poolpage(), or
kmem_alloc_poolpage()/kmem_free_poolpage()
rather than doing it here. These functions may use pmap hooks to
provide alternate methods of mapping pool pages.
 1.3  23-Jul-1998  pk Re-vamped pool manager.
* support for customized memory supplier
* automatic page reclaim by VM system
* time-based hysteresis
* cache coloring (after Bonwick's "slabs")
 1.2  19-Feb-1998  pk Add option to use "static" storage provided by the caller.
From Matthias Drochner.
 1.1  15-Dec-1997  pk Memory pool resource utility.
 1.4.2.2  08-Aug-1998  eeh Revert cdevsw mmap routines to return int.
 1.4.2.1  30-Jul-1998  eeh Split vm_offset_t and vm_size_t into paddr_t, psize_t, vaddr_t, and vsize_t.
 1.21.2.4  25-Jun-1999  perry somehow, the last commit was botched. fix it
 1.21.2.3  24-Jun-1999  perry pullup 1.26->1.27 (pk): deal with missing "raise interrupt level" code
 1.21.2.2  07-Apr-1999  thorpej branches: 1.21.2.2.2; 1.21.2.2.4;
Pull up 1.22 -> 1.23.
 1.21.2.1  04-Apr-1999  chs pull up rev 1.22. approved by perry.
 1.21.2.2.4.1  30-Nov-1999  itojun bring in latest KAME (as of 19991130, KAME/NetBSD141) into kame branch
just for reference purposes.
This commit includes 1.4 -> 1.4.1 sync for kame branch.

The branch does not compile at all (due to the lack of ALTQ and some other
source code). Please do not try to modify the branch, this is just for
reference purposes.

synchronization to latest KAME will take place on HEAD branch soon.
 1.21.2.2.2.3  02-Aug-1999  thorpej Update from trunk.
 1.21.2.2.2.2  04-Jul-1999  chs in pool_put(), fill the item with a distinctive pattern ifdef DEBUG.
 1.21.2.2.2.1  21-Jun-1999  thorpej Sync w/ -current.
 1.30.2.6  11-Feb-2001  bouyer Sync with HEAD.
 1.30.2.5  18-Jan-2001  bouyer Sync with head (for UBC+NFS fixes, mostly).
 1.30.2.4  13-Dec-2000  bouyer Sync with HEAD (for UBC fixes).
 1.30.2.3  08-Dec-2000  bouyer Sync with HEAD.
 1.30.2.2  22-Nov-2000  bouyer Sync with HEAD.
 1.30.2.1  20-Nov-2000  bouyer Update thorpej_scsipi to -current as of a month ago
 1.34.2.1  22-Jun-2000  minoura Sync w/ netbsd-1-5-base.
 1.50.2.13  11-Dec-2002  thorpej Sync with HEAD.
 1.50.2.12  11-Nov-2002  nathanw Catch up to -current
 1.50.2.11  18-Oct-2002  nathanw Catch up to -current.
 1.50.2.10  27-Aug-2002  nathanw Catch up to -current.
 1.50.2.9  01-Aug-2002  nathanw Catch up to -current.
 1.50.2.8  24-Jun-2002  nathanw Curproc->curlwp renaming.

Change uses of "curproc->l_proc" back to "curproc", which is more like the
original use. Bare uses of "curproc" are now "curlwp".

"curproc" is now #defined in proc.h as ((curlwp) ? (curlwp)->l_proc) : NULL)
so that it is always safe to reference curproc (*de*referencing curproc
is another story, but that's always been true).
 1.50.2.7  01-Apr-2002  nathanw Catch up to -current.
(CVS: It's not just a program. It's an adventure!)
 1.50.2.6  08-Jan-2002  nathanw Catch up to -current.
 1.50.2.5  14-Nov-2001  nathanw Catch up to -current.
 1.50.2.4  22-Oct-2001  nathanw Catch up to -current.
 1.50.2.3  26-Sep-2001  nathanw Catch up to -current.
Again.
 1.50.2.2  24-Aug-2001  nathanw Catch up with -current.
 1.50.2.1  21-Jun-2001  nathanw Catch up to -current.
 1.60.4.2  11-Oct-2001  fvdl Catch up with -current. Fix some bogons in the sparc64 kbd/ms
attach code. cd18xx conversion provided by mrg.
 1.60.4.1  01-Oct-2001  fvdl Catch up with -current.
 1.60.2.4  10-Oct-2002  jdolecek sync kqueue with -current; this includes merge of gehenna-devsw branch,
merge of i386 MP branch, and part of autoconf rototil work
 1.60.2.3  06-Sep-2002  jdolecek sync kqueue branch with HEAD
 1.60.2.2  16-Mar-2002  jdolecek Catch up with -current.
 1.60.2.1  10-Jan-2002  thorpej Sync kqueue branch with -current.
 1.63.2.1  12-Nov-2001  thorpej Sync the thorpej-mips-cache branch with -current.
 1.74.2.2  12-Mar-2002  thorpej Do the previous differently; instead, pad the size of the structure
to the specified alignment, the way we pad to the system's natural
alignment.
 1.74.2.1  12-Mar-2002  thorpej Sprinkle some assertions around that ensures that the returned
object is aligned as requested.

Bug fix: in pool_prime_page(), make sure to account for alignment when
advancing the pointer through the page.
 1.76.6.1  11-Nov-2002  he Pull up revision 1.78 (requested by thorpej in ticket #582):
Bring down a fix from the "newlock" branch, slightly modified:
o In pool_prime_page(), assert that the object being placed
onto the free list meets the alignment constraints (that
"ioff" within the object is aligned to "align").
o In pool_init(), round up the object size to the alignment
value (or ALIGN(1), if no special alignment is needed) so
that the above invariant holds true.
 1.76.4.2  29-Aug-2002  gehenna catch up with -current.
 1.76.4.1  15-Jul-2002  gehenna catch up with -current.
 1.87.2.7  11-Dec-2005  christos Sync with head.
 1.87.2.6  10-Nov-2005  skrll Sync with HEAD. Here we go again...
 1.87.2.5  01-Apr-2005  skrll Sync with HEAD.
 1.87.2.4  17-Jan-2005  skrll Sync with HEAD.
 1.87.2.3  21-Sep-2004  skrll Fix the sync with head I botched.
 1.87.2.2  18-Sep-2004  skrll Sync with HEAD.
 1.87.2.1  03-Aug-2004  skrll Sync with HEAD
 1.93.2.1  22-Jun-2004  tron Pull up revision 1.96 (requested by thorpej in ticket #522):
Remove PR_IMMEDRELEASE, since setting the high water mark will achieve
the same thing.
Pointed out back in January by YAMAMOTO Takashi.
 1.99.8.2  10-Mar-2006  tron Pull up following revision(s) (requested by bjh21 in ticket #1192):
sys/sys/pool.h: revision 1.48
sys/kern/subr_pool.c: revision 1.112
Medium-sized overhaul of POOL_SUBPAGE support so that:
1: I can understand it, and
2: It works.
Notable externally-visible changes are that POOL_SUBPAGE now has to be a
compile-time constant, and that trying to initialise a pool whose objects are
larger than POOL_SUBPAGE automatically generates a pool that doesn't use
subpages.
NetBSD/acorn26 now boots multi-user again.
 1.99.8.1  18-Jun-2005  tron branches: 1.99.8.1.2;
Pull up revision 1.101 (requested by thorpej in ticket #474):
Fix some locking issues:
- Make the locking rules for pr_rmpage() sane, and don't modify fields
protected by the pool lock without actually holding it.
- Always defer freeing the pool page to the back-end allocator, to avoid
invoking the pool_allocator with the pool locked (which would violate
the pool_allocator -> pool locking order).
- Fix pool_reclaim() to not violate the pool_cache -> pool locking order
by using a trylock.
Reviewed by Chuq Silvers.
 1.99.8.1.2.1  10-Mar-2006  tron Pull up following revision(s) (requested by bjh21 in ticket #1192):
sys/sys/pool.h: revision 1.48
sys/kern/subr_pool.c: revision 1.112
Medium-sized overhaul of POOL_SUBPAGE support so that:
1: I can understand it, and
2: It works.
Notable externally-visible changes are that POOL_SUBPAGE now has to be a
compile-time constant, and that trying to initialise a pool whose objects are
larger than POOL_SUBPAGE automatically generates a pool that doesn't use
subpages.
NetBSD/acorn26 now boots multi-user again.
 1.99.4.1  25-Jan-2005  yamt convert to new apis.
 1.99.2.1  29-Apr-2005  kent sync with -current
 1.101.2.13  24-Mar-2008  yamt sync with head.
 1.101.2.12  17-Mar-2008  yamt sync with head.
 1.101.2.11  27-Feb-2008  yamt sync with head.
 1.101.2.10  11-Feb-2008  yamt sync with head.
 1.101.2.9  04-Feb-2008  yamt sync with head.
 1.101.2.8  21-Jan-2008  yamt sync with head
 1.101.2.7  07-Dec-2007  yamt sync with head
 1.101.2.6  15-Nov-2007  yamt sync with head.
 1.101.2.5  27-Oct-2007  yamt sync with head.
 1.101.2.4  03-Sep-2007  yamt sync with head.
 1.101.2.3  26-Feb-2007  yamt sync with head.
 1.101.2.2  30-Dec-2006  yamt sync with head.
 1.101.2.1  21-Jun-2006  yamt sync with head.
 1.110.2.2  01-Mar-2006  yamt sync with head.
 1.110.2.1  01-Feb-2006  yamt sync with head.
 1.111.4.3  01-Jun-2006  kardel Sync with head.
 1.111.4.2  22-Apr-2006  simonb Sync with head.
 1.111.4.1  04-Feb-2006  simonb Adapt for timecounters: mostly use get*time() and use "time_second"
instead of "time.tv_sec".
 1.111.2.1  09-Sep-2006  rpaulo sync with head
 1.112.6.2  24-May-2006  tron Merge 2006-05-24 NetBSD-current into the "peter-altq" branch.
 1.112.6.1  28-Mar-2006  tron Merge 2006-03-28 NetBSD-current into the "peter-altq" branch.
 1.112.4.1  19-Apr-2006  elad sync with head.
 1.112.2.6  03-Sep-2006  yamt sync with head.
 1.112.2.5  11-Aug-2006  yamt sync with head
 1.112.2.4  26-Jun-2006  yamt sync with head.
 1.112.2.3  24-May-2006  yamt sync with head.
 1.112.2.2  11-Apr-2006  yamt sync with head
 1.112.2.1  01-Apr-2006  yamt sync with head.
 1.116.2.1  19-Jun-2006  chap Sync with head.
 1.122.4.2  10-Dec-2006  yamt sync with head.
 1.122.4.1  22-Oct-2006  yamt sync with head
 1.122.2.3  19-Jan-2007  ad Add some DEBUG code to check that items being freed were previously
allocated from the same source. Needs to be enabled via DDB.
 1.122.2.2  20-Oct-2006  ad Remove sched_lock assertion.
 1.122.2.1  11-Sep-2006  ad From the newlock branch: add some KASSERT() verifying correct alignment.
 1.125.2.3  24-Mar-2007  yamt sync with head.
 1.125.2.2  12-Mar-2007  rmind Sync with HEAD.
 1.125.2.1  27-Feb-2007  yamt - sync with head.
- move sched_changepri back to kern_synch.c as it doesn't know PPQ anymore.
 1.128.2.13  01-Nov-2007  ad pool_reclaim: acquire kernel_lock if the pool is at IPL_SOFTCLOCK,
SOFTNET or SOFTSERIAL, as mutexes at these levels must still be
spinlocks. It's not yet safe for e.g. ip_intr() to block as this
upsets code calling up from the socket layer. It can find pcbs
sitting half baked.

pool_cache_xcall: go to splvm to prevent kernel_lock from being
taken, for the reason listed above.

Pointed out by yamt@.
 1.128.2.12  29-Oct-2007  ad pool_drain_start: tweak assertions/comments.
 1.128.2.11  26-Oct-2007  ad - Use a cross call to drain the per-CPU component of pool caches.
- When draining, skip over pools that are completely inactive.
 1.128.2.10  25-Sep-2007  ad If no constructor/destructor are provided for a pool_cache, use nullop.
Remove the tests for pc_ctor/pc_dtor != NULL.
 1.128.2.9  10-Sep-2007  ad Fix a deadlock.
 1.128.2.8  09-Sep-2007  ad - Re-enable pool_cache, since it works on i386 again after today's pmap
change. pool_cache_invalidate() no longer invalidates objects stored
in the per-CPU caches. This needs some thought.
- Remove pcg_get, pcg_put since they are only called from one place each.
- Remove cc_busy assertions, since they don't work correctly. Pointed out
by yamt@.
- Add some more assertions and simplify.
 1.128.2.7  01-Sep-2007  ad - Add a CPU layer to pool caches. In combination with vmem/kmem this
provides CPU-local slab/object and general purpose allocators. The
strategy used is as described in Jeff Bonwick's USENIX paper, except in
at least one place where the described allocation strategy doesn't make
sense. For exclusive access to the CPU layer the IPL is raised or kernel
preemption disabled. Where the interrupt priority levels are software
emulated this is much cheaper than taking a lock, and I think that
writing to a local %pil register is likely to have a similar penalty to
taking a lock.

No tuning of the group sizes is currently done - all groups have 15
items each, but this should be fairly easy to implement. Also, the
reclamation mechanism should probably use a cross-call to drain the
CPU-level caches on remote CPUs.

Currently this causes kernel memory corruption on i386, yet works without
a problem on amd64. The cache layer is disabled for the time being until I
can find the bug.

- Change the pool_cache API so that the caches are themselves dynamically
allocated, and that each cache is tied to a single pool only. Add some
stubs to change pool_cache parameters that call directly through to the
pool layer (e.g. pool_cache_sethiwat). The idea here is that pool_cache
should become the default object allocator (and so LKM friendly), and
that the pool allocator should be for kernel-internal use only. This will
be posted to tech-kern@ for review.
 1.128.2.6  20-Aug-2007  ad Sync with HEAD.
 1.128.2.5  29-Jul-2007  ad Trap free() of areas that contain undestroyed locks. Not a major problem
but it helps to catch bugs.
 1.128.2.4  22-Mar-2007  ad - Remove debugging crud.
- wakeup -> cv_broadcast.
 1.128.2.3  21-Mar-2007  ad GC the simplelock/spinlock debugging stuff.
 1.128.2.2  13-Mar-2007  ad Pull in the initial set of changes for the vmlocking branch.
 1.128.2.1  13-Mar-2007  ad Sync with head.
 1.129.12.6  09-Dec-2007  jmcneill Sync with HEAD.
 1.129.12.5  21-Nov-2007  joerg Sync with HEAD.
 1.129.12.4  14-Nov-2007  joerg Sync with HEAD.
 1.129.12.3  11-Nov-2007  joerg Sync with HEAD.
 1.129.12.2  26-Oct-2007  joerg Sync with HEAD.

Follow the merge of pmap.c on i386 and amd64 and move
pmap_init_tmp_pgtbl into arch/x86/x86/pmap.c. Modify the ACPI wakeup
code to restore CR4 before jumping back into kernel space as the large
page option might cover that.
 1.129.12.1  03-Sep-2007  jmcneill Sync with HEAD.
 1.129.8.1  03-Sep-2007  skrll Sync with HEAD.
 1.131.4.1  14-Oct-2007  yamt sync with head.
 1.131.2.4  23-Mar-2008  matt sync with HEAD
 1.131.2.3  09-Jan-2008  matt sync with HEAD
 1.131.2.2  08-Nov-2007  matt sync with -HEAD
 1.131.2.1  06-Nov-2007  matt sync with HEAD
 1.133.4.4  18-Feb-2008  mjf Sync with HEAD.
 1.133.4.3  27-Dec-2007  mjf Sync with HEAD.
 1.133.4.2  08-Dec-2007  mjf Sync with HEAD.
 1.133.4.1  19-Nov-2007  mjf Sync with HEAD.
 1.133.2.2  18-Nov-2007  bouyer Sync with HEAD
 1.133.2.1  13-Nov-2007  bouyer Sync with HEAD
 1.137.2.7  31-Dec-2007  ad Make pool_cache_disable work again.
 1.137.2.6  28-Dec-2007  ad pool_cache_put_slow: fill cc_previous if empty. Pointed out by yamt@.
 1.137.2.5  26-Dec-2007  ad Sync with head.
 1.137.2.4  26-Dec-2007  ad Need sys/atomic.h here.
 1.137.2.3  15-Dec-2007  ad Sort list of pools/caches to make them easier to find.
 1.137.2.2  12-Dec-2007  ad Add a global 'pool_cache_disable', to be set from the debugger. Helpful
when tracking down leaks.
 1.137.2.1  08-Dec-2007  ad Sync with head.
 1.138.4.3  08-Jan-2008  bouyer Sync with HEAD
 1.138.4.2  02-Jan-2008  bouyer Sync with HEAD
 1.138.4.1  13-Dec-2007  bouyer Sync with HEAD
 1.138.2.3  13-Dec-2007  yamt sync with head.
 1.138.2.2  10-Dec-2007  yamt - separate kernel va allocation (kernel_va_arena) from
in-kernel fault handling (kernel_map).
- add vmem bootstrap code. vmem doesn't rely on malloc anymore.
- make kmem_alloc interrupt-safe.
- kill kmem_map. make malloc a wrapper of kmem_alloc.
 1.138.2.1  10-Dec-2007  yamt add pool_cache_bootstrap_destroy. will be used by vmem.
 1.151.6.4  17-Jan-2009  mjf Sync with HEAD.
 1.151.6.3  28-Sep-2008  mjf Sync with HEAD.
 1.151.6.2  02-Jun-2008  mjf Sync with HEAD.
 1.151.6.1  03-Apr-2008  mjf Sync with HEAD.
 1.151.2.1  24-Mar-2008  keiichi sync with head.
 1.156.2.2  04-Jun-2008  yamt sync with head
 1.156.2.1  18-May-2008  yamt sync with head.
 1.158.2.5  11-Aug-2010  yamt sync with head.
 1.158.2.4  11-Mar-2010  yamt sync with head
 1.158.2.3  16-Sep-2009  yamt sync with head
 1.158.2.2  04-May-2009  yamt sync with head.
 1.158.2.1  16-May-2008  yamt sync with head.
 1.160.2.2  18-Sep-2008  wrstuden Sync with wrstuden-revivesa-base-2.
 1.160.2.1  23-Jun-2008  wrstuden Sync w/ -current. 34 merge conflicts to follow.
 1.161.2.1  18-Jul-2008  simonb Sync with head.
 1.165.2.3  13-Dec-2008  haad Update haad-dm branch to haad-dm-base2.
 1.165.2.2  19-Oct-2008  haad Sync with HEAD.
 1.165.2.1  07-Jul-2008  haad file subr_pool.c was added on branch haad-dm on 2008-10-19 22:17:28 +0000
 1.170.4.1  17-Nov-2008  snj Pull up following revision(s) (requested by ad in ticket #72):
sys/kern/subr_pool.c: revision 1.171
Avoid recursive mutex_enter() when the system is low on KVA.
Should fix crash reported by riz on current-users.
 1.170.2.2  28-Apr-2009  skrll Sync with HEAD.
 1.170.2.1  19-Jan-2009  skrll Sync with HEAD.
 1.171.4.1  13-May-2009  jym Sync with HEAD.

Commit is split, to avoid a "too many arguments" protocol error.
 1.182.4.4  21-Apr-2011  rmind sync with head
 1.182.4.3  05-Mar-2011  rmind sync with head
 1.182.4.2  03-Jul-2010  rmind sync with head
 1.182.4.1  30-May-2010  rmind sync with head
 1.182.2.2  17-Aug-2010  uebayasi Sync with HEAD.
 1.182.2.1  30-Apr-2010  uebayasi Sync with HEAD.
 1.186.2.1  06-Jun-2011  jruoho Sync with HEAD.
 1.190.6.2  02-Jun-2012  mrg sync to latest -current.
 1.190.6.1  18-Feb-2012  mrg merge to -current.
 1.190.2.4  22-May-2014  yamt sync with head.

for a reference, the tree before this commit was tagged
as yamt-pagecache-tag8.

this commit was split into small chunks to avoid
a limitation of cvs. ("Protocol error: too many arguments")
 1.190.2.3  30-Oct-2012  yamt sync with head
 1.190.2.2  23-May-2012  yamt sync with head.
 1.190.2.1  17-Apr-2012  yamt sync with head
 1.194.2.2  21-May-2014  bouyer Pull up following revision(s) (requested by abs in ticket #1054):
sys/kern/subr_pool.c: revision 1.202
Ensure pool_head is non static - for "vmstat -i"
 1.194.2.1  02-Jul-2012  jdc Pull up revisions:
src/sys/kern/subr_pool.c revision 1.196
src/share/man/man9/pool_cache.9 patch
(requested by jym in ticket #366).

As pool reclaiming is unlikely to happen at interrupt or softint
context, re-enable the portion of code that allows invalidation of
CPU-bound pool caches.

Two reasons:
- CPU cached objects being invalidated, the probability of fetching an
obsolete object from the pool_cache(9) is greatly reduced. This speeds
up pool_cache_get() quite a bit as it does not have to keep destroying
objects until it finds an updated one when an invalidation is in progress.

- for situations where we have to ensure that no obsolete object remains
after a state transition (canonical example: pmap mappings between Xen
VM restoration), invalidating all pool_cache(9) is the safest way to go.

As it uses xcall(9) to broadcast the execution of pool_cache_transfer(),
pool_cache_invalidate() cannot be called from interrupt or softint
context (scheduling a xcall(9) can put a LWP to sleep).

pool_cache_xcall() => pool_cache_transfer() to reflect its use.

Invalidation being a costly process (1000s objects may be destroyed),
all places where pool_cache_invalidate() may be called from
interrupt/softint context will now get caught by the proper KASSERT(),
and fixed. Ping me when you see one.

Tested under i386 and amd64 by running ATF suite within 64MiB HVM
domains (tried triggering pgdaemon a few times).

No objection on tech-kern@.

XXX a similar fix has to be pulled up to NetBSD-6, but with a more
conservative approach.

See http://mail-index.netbsd.org/tech-kern/2012/05/29/msg013245.html
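
A hedged sketch of the calling convention this change settles on; widget_cache
is the hypothetical cache from the earlier sketch and the wrapper is
illustrative:

    #include <sys/systm.h>
    #include <sys/cpu.h>
    #include <sys/pool.h>

    extern pool_cache_t widget_cache;   /* hypothetical cache */

    void
    widget_flush_stale(void)
    {
            /*
             * pool_cache_invalidate() broadcasts pool_cache_transfer() with
             * xcall(9), which may sleep, so it must not be called from
             * interrupt or soft-interrupt context.
             */
            KASSERT(!cpu_intr_p() && !cpu_softintr_p());
            pool_cache_invalidate(widget_cache);
    }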
 1.198.2.4  03-Dec-2017  jdolecek update from HEAD
 1.198.2.3  20-Aug-2014  tls Rebase to HEAD as of a few days ago.
 1.198.2.2  23-Jun-2013  tls resync from head
 1.198.2.1  25-Feb-2013  tls resync with head
 1.200.6.1  18-May-2014  rmind sync with head
 1.201.2.1  10-Aug-2014  tls Rebase.
 1.203.4.3  28-Aug-2017  skrll Sync with HEAD
 1.203.4.2  19-Mar-2016  skrll Sync with HEAD
 1.203.4.1  22-Sep-2015  skrll Sync with HEAD
 1.203.2.1  06-Mar-2016  martin Pull up following revision(s) (requested by knakahara in ticket #1103):
sys/kern/subr_pool.c: revision 1.206
fix: "vmstat -C" CpuLayer showed only the last cpu values.
 1.206.4.1  21-Apr-2017  bouyer Sync with HEAD
 1.206.2.1  20-Mar-2017  pgoyette Sync with HEAD
 1.207.6.1  27-Feb-2018  martin Pull up following revision(s) (requested by mrg in ticket #593):
sys/dev/marvell/mvxpsec.c: revision 1.2
sys/arch/m68k/m68k/pmap_motorola.c: revision 1.70
sys/opencrypto/crypto.c: revision 1.102
sys/arch/sparc64/sparc64/pmap.c: revision 1.308
sys/ufs/chfs/chfs_malloc.c: revision 1.5
sys/arch/powerpc/oea/pmap.c: revision 1.95
sys/sys/pool.h: revision 1.80,1.82
sys/kern/subr_pool.c: revision 1.209-1.216,1.219-1.220
sys/arch/alpha/alpha/pmap.c: revision 1.262
sys/kern/uipc_mbuf.c: revision 1.173
sys/uvm/uvm_fault.c: revision 1.202
sys/sys/mbuf.h: revision 1.172
sys/kern/subr_extent.c: revision 1.86
sys/arch/x86/x86/pmap.c: revision 1.266 (via patch)
sys/dev/dtv/dtv_scatter.c: revision 1.4

Allow only one pending call to a pool's backing allocator at a time.
Candidate fix for problems with hanging after kva fragmentation related
to PR kern/45718.

Proposed on tech-kern:
https://mail-index.NetBSD.org/tech-kern/2017/10/23/msg022472.html
Tested by bouyer@ on i386.

This makes one small change to the semantics of pool_prime and
pool_setlowat: they may fail with EWOULDBLOCK instead of ENOMEM, if
there is a pending call to the backing allocator in another thread but
we are not actually out of memory. That is unlikely because nearly
always these are used during initialization, when the pool is not in
use.

Define the new flag too for previous commit.

pool_grow can now fail even when sleeping is ok. Catch this case in pool_get
and retry.

Assert that pool_get failure happens only with PR_NOWAIT.
This would have caught the mistake I made last week leading to null
pointer dereferences all over the place, a mistake which I evidently
poorly scheduled alongside maxv's change to the panic message on x86
for null pointer dereferences.

Since pr_lock is now used to wait for two things (PR_GROWING and
PR_WANTED), we need to loop for the condition we wanted.
make the KASSERTMSG/panic strings consistent as '%s: [%s], __func__, wchan'
Handle the ERESTART case from pool_grow()

don't pass 0 to the pool flags
Guess pool_cache_get(pc, 0) means PR_WAITOK here.
Earlier on in the same context we use kmem_alloc(sz, KM_SLEEP).

use PR_WAITOK everywhere.
use PR_NOWAIT.

Don't use 0 for PR_NOWAIT

use PR_NOWAIT instead of 0

panic ex nihilo -- PR_NOWAITing for zerot

Add assertions that either PR_WAITOK or PR_NOWAIT are set.
- fix an assert; we can reach there if we are nowait or limitfail.
- when priming the pool and failing with ERESTART, don't decrement the number
of pages; this avoids the issue of returning an ERESTART when we get to 0,
and is more correct.
- simplify the pool_grow code, and don't wakeup things if we ENOMEM.

In pmap_enter_ma(), only try to allocate pves if we might need them,
and even if that fails, only fail the operation if we later discover
that we really do need them. This implements the requirement that
pmap_enter(PMAP_CANFAIL) must not fail when replacing an existing
mapping with the first mapping of a new page, which is an unintended
consequence of the changes from the rmind-uvmplock branch in 2011.

The problem arises when pmap_enter(PMAP_CANFAIL) is used to replace an existing
pmap mapping with a mapping of a different page (eg. to resolve a copy-on-write).
If that fails and leaves the old pmap entry in place, then UVM won't hold
the right locks when it eventually retries. This entanglement of the UVM and
pmap locking was done in rmind-uvmplock in order to improve performance,
but it also means that the UVM state and pmap state need to be kept in sync
more than they did before. It would be possible to handle this in the UVM code
instead of in the pmap code, but these pmap changes improve the handling of
low memory situations in general, and handling this in UVM would be clunky,
so this seemed like the better way to go.

This somewhat indirectly fixes PR 52706, as well as the failing assertion
about "uvm_page_locked_p(old_pg)". (but only on x86, various other platforms
will need their own changes to handle this issue.)
In uvm_fault_upper_enter(), if pmap_enter(PMAP_CANFAIL) fails, assert that
the pmap did not leave around a now-stale pmap mapping for an old page.
If such a pmap mapping still existed after we unlocked the vm_map,
the UVM code would not know later that it would need to lock the
lower layer object while calling the pmap to remove or replace that
stale pmap mapping. See PR 52706 for further details.
hopefully work around the irregular "fork fails in init" problem.
if a pool is growing, and the grower is PR_NOWAIT, mark this.
if another caller wants to grow the pool and is also PR_NOWAIT,
busy-wait for the original caller, which should either succeed
or hard-fail fairly quickly.

implement the busy-wait by unlocking and relocking this pool's
mutex and returning ERESTART. other methods (such as having
the caller do this) were significantly more code and this hack
is fairly localised.
ok chs@ riastradh@

Don't release the lock in the PR_NOWAIT allocation. Move flags setting
after the acquiring the mutex. (from Tobias Nygren)
apply the change from arch/x86/x86/pmap.c rev. 1.266 commitid vZRjvmxG7YTHLOfA:

In pmap_enter_ma(), only try to allocate pves if we might need them,
and even if that fails, only fail the operation if we later discover
that we really do need them. If we are replacing an existing mapping,
reuse the pv structure where possible.

This implements the requirement that pmap_enter(PMAP_CANFAIL) must not fail
when replacing an existing mapping with the first mapping of a new page,
which is an unintended consequence of the changes from the rmind-uvmplock
branch in 2011.

The problem arises when pmap_enter(PMAP_CANFAIL) is used to replace an existing
pmap mapping with a mapping of a different page (eg. to resolve a copy-on-write).
If that fails and leaves the old pmap entry in place, then UVM won't hold
the right locks when it eventually retries. This entanglement of the UVM and
pmap locking was done in rmind-uvmplock in order to improve performance,
but it also means that the UVM state and pmap state need to be kept in sync
more than they did before. It would be possible to handle this in the UVM code
instead of in the pmap code, but these pmap changes improve the handling of
low memory situations in general, and handling this in UVM would be clunky,
so this seemed like the better way to go.

This somewhat indirectly fixes PR 52706 on the remaining platforms where
this problem existed.
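
In caller terms, the flag hygiene introduced by this pull-up amounts to the
following; foo_pool and the wrappers are hypothetical:

    #include <sys/param.h>
    #include <sys/errno.h>
    #include <sys/pool.h>

    struct foo;                         /* hypothetical object type */
    extern struct pool foo_pool;        /* hypothetical pool */

    static int
    foo_try_get(struct foo **fp)
    {
            /* Exactly one of PR_WAITOK or PR_NOWAIT must be passed; a bare 0
             * now trips an assertion. Only PR_NOWAIT (or PR_LIMITFAIL) may fail. */
            *fp = pool_get(&foo_pool, PR_NOWAIT);
            return (*fp == NULL) ? ENOMEM : 0;
    }

    static struct foo *
    foo_get(void)
    {
            /* PR_WAITOK without PR_LIMITFAIL never returns NULL; internally
             * pool_get() retries when pool_grow() reports ERESTART. */
            return pool_get(&foo_pool, PR_WAITOK);
    }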
 1.221.4.3  21-Apr-2020  martin Sync with HEAD
 1.221.4.2  13-Apr-2020  martin Mostly merge changes from HEAD upto 20200411
 1.221.4.1  10-Jun-2019  christos Sync with HEAD
 1.221.2.4  26-Dec-2018  pgoyette Sync with HEAD, resolve a few conflicts
 1.221.2.3  30-Sep-2018  pgoyette Sync with HEAD
 1.221.2.2  06-Sep-2018  pgoyette Sync with HEAD

Resolve a couple of conflicts (result of the uimin/uimax changes)
 1.221.2.1  28-Jul-2018  pgoyette Sync with HEAD
 1.252.2.5  29-May-2025  martin Pull up following revision(s) (requested by bouyer in ticket #1956):

sys/kern/subr_pool.c: revision 1.295

Never call pr_drain_hook from pool_allocator_alloc().

In the PR_WAITOK case it's called from pool_reclaim

In the !PR_WAITOK case we're holding the pool lock and if the drain hook
wants kernel_lock we may deadlock with another thread holding
kernel_lock and calling pool_get().

Fixes PR kern/59411
 1.252.2.4  17-Jul-2022  martin Pull up following revision(s) (requested by simonb in ticket #1479):

sys/kern/subr_pool.c: revision 1.285

Use 64-bit math to calculate pool sizes. Fixes overflow errors for
pools larger than 4GB and gives the correct output for kernel pool pages
in "vmstat -s" output.
 1.252.2.3  08-Mar-2020  martin Pull up following revision(s) (requested by chs in ticket #766):

sys/kern/subr_pool.c: revision 1.265

fix assertions about when it is ok for pool_get() to return NULL.
 1.252.2.2  01-Sep-2019  martin Pull up following revision(s) (requested by maxv in ticket #129):

sys/kern/subr_pool.c: revision 1.256
sys/kern/subr_pool.c: revision 1.257

Kernel Heap Hardening: use bitmaps on all off-page pools. This migrates 29
MI pools on amd64 from linked lists to bitmaps, which have higher security
properties.

Then, change the computation of the size of the PH pools: take into account
the bitmap area available by default in the ph_u2 union, and don't go with
&phpool[>0] if &phpool[0] already has enough space to embed a bitmap.

The pools that are migrated in this change all use bitmaps small enough to
fit in &phpool[0], therefore there is no increase in memory consumption.

-

Revert r1.254, put back || for KASAN, some destructors like lwp_dtor()
caused false positives. Needs more work.
 1.252.2.1  18-Aug-2019  martin Pull up following revision(s) (requested by maxv in ticket #81):

sys/kern/subr_pool.c: revision 1.253
sys/kern/subr_pool.c: revision 1.254
sys/kern/subr_pool.c: revision 1.255

Kernel Heap Hardening: perform certain sanity checks on the pool caches
directly, to immediately detect certain bugs that would otherwise have
been detected only later on the pool layer, if the buffer ever reached
the pool layer.

-

Replace || by && in KASAN, to increase the pool coverage.
Strictly speaking, what we want to avoid is poisoning buffers that were
referenced in a global list as part of the ctor. But, if a buffer indeed
got referenced as part of the ctor, it necessarily has to be unreferenced
in the dtor; which implies it has to have a dtor. So we want both a ctor
and a dtor, and not just one of them.

Note that POOL_QUARANTINE already implicitly provides this increased
coverage.

-

Initialize pp->pr_redzone to false. For some reason with KUBSAN GCC does
not eliminate the unused branch in pr_item_linkedlist_put(), and this
leads to an unused uninitialized access which triggers KUBSAN messages.
 1.264.2.2  29-Feb-2020  ad Sync with head.
 1.264.2.1  25-Jan-2020  ad Sync with head.
 1.266.4.1  20-Apr-2020  bouyer Sync with HEAD
 1.274.2.2  03-Apr-2021  thorpej Sync with HEAD.
 1.274.2.1  03-Jan-2021  thorpej Sync w/ HEAD.
 1.276.4.1  01-Aug-2021  thorpej Sync with HEAD.
 1.285.4.3  29-May-2025  martin Pull up following revision(s) (requested by bouyer in ticket #1122):

sys/kern/subr_pool.c: revision 1.295

Never call pr_drain_hook from pool_allocator_alloc().

In the PR_WAITOK case it's called from pool_reclaim

In the !PR_WAITOK case we're holding the pool lock and if the drain hook
wants kernel_lock we may deadlock with another thread holding
kernel_lock and calling pool_get().

Fixes PR kern/59411
 1.285.4.2  15-Dec-2024  martin Pull up following revision(s) (requested by chs in ticket #1028):

sys/kern/subr_pool.c: revision 1.292

pool: fix pool_sethiwat() to actually do something

The change that I made to the pool code back in April 2020
("slightly change and fix the semantics of pool_set*wat()" ...)
accidentally broke pool_sethiwat() by making it have no effect.

This was discovered after the crash reported in PR 58666 was fixed.

The same machine (32-bit, with 10GB RAM) would hang due to the buffer
cache causing the system to run out of kernel virtual space. The
buffer cache uses a separate pool for buffer data for each power of 2
between DEV_BSIZE and MAXBSIZE, and if the usage pattern of buffer
sizes changes then memory has to be moved between the different pools
in order to create buffers of the new size. The buffer cache handles
this by using pool_sethiwat() to cause memory freed from the buffer
cache back to the pools to not be cached in the buffer cache pools but
instead be freed back to the pools' back-end allocator (which
allocates from the low-level kva allocator) as soon as possible. But
since pool_sethiwat() wasn't doing anything, memory would stay cached
in some buffer cache pools and starve other buffer cache pools (and a
few other pools that do not use the kmem layer for memory allocation).

Fix pool_sethiwat() to do what it is supposed to do again.
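
A hedged sketch of the intended use that this restores; the per-size buffer
pool name is hypothetical:

    #include <sys/pool.h>

    extern struct pool buf_data_pool;   /* hypothetical per-size buffer pool */

    void
    buf_pool_tune(void)
    {
            /*
             * With a low high-water mark, memory freed back to this pool is
             * returned promptly to the back-end (KVA) allocator instead of
             * lingering here and starving the other per-size pools.
             */
            pool_sethiwat(&buf_data_pool, 16);
    }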
 1.285.4.1  20-Sep-2024  martin Pull up following revision(s) (requested by rin in ticket #871):

sys/kern/subr_pool.c: revision 1.286

Avoid undefined behaviour.
