History log of /src/sys/kern/subr_pool.c |
Revision | | Date | Author | Comments |
1.295 |
| 26-May-2025 |
bouyer | Never call pr_drain_hook from pool_allocator_alloc(). In the PR_WAITOK case it's called from pool_reclaim. In the !PR_WAITOK case we're holding the pool lock, and if the drain hook wants kernel_lock we may deadlock with another thread holding kernel_lock and calling pool_get(). Fixes PR kern/59411.
|
1.294 |
| 16-May-2025 |
bouyer | Revert previous, requested by riastradh@. One possible fix for kern/59411 makes PR_GROWINGNOWAIT useful again.
|
1.293 |
| 09-May-2025 |
bouyer | pool_grow(): The thread setting PR_GROWINGNOWAIT holds the pr_lock and should not release it before clearing PR_GROWINGNOWAIT because it's called with !PR_WAITOK. No other thread should see PR_GROWINGNOWAIT while holding pr_lock, so PR_GROWINGNOWAIT looks useless and can probably be removed. For now, only KASSERT that PR_GROWINGNOWAIT is never seen, to make sure. Note that in the PR_GROWINGNOWAIT case we would exit/reenter pr_lock while we don't have PR_WAITOK, which is probably wrong too.
|
1.292 |
| 07-Dec-2024 |
chs | pool: fix pool_sethiwat() to actually do something
The change that I made to the pool code back in April 2020 ("slightly change and fix the semantics of pool_set*wat()" ...) accidentally broke pool_sethiwat() by making it have no effect.
This was discovered after the crash reported in PR 58666 was fixed. The same machine (32-bit, with 10GB RAM) would hang due to the buffer cache causing the system to run out of kernel virtual space. The buffer cache uses a separate pool for buffer data for each power of 2 between DEV_BSIZE and MAXBSIZE, and if the usage pattern of buffer sizes changes then memory has to be moved between the different pools in order to create buffers of the new size. The buffer cache handles this by using pool_sethiwat() so that memory freed from the buffer cache back to the pools is not cached in the buffer cache pools but is instead freed back to the pools' back-end allocator (which allocates from the low-level kva allocator) as soon as possible. But since pool_sethiwat() wasn't doing anything, memory would stay cached in some buffer cache pools and starve other buffer cache pools (and a few other pools that do not use the kmem layer for memory allocation).
Fix pool_sethiwat() to do what it is supposed to do again.
|
1.291 |
| 07-Dec-2024 |
chs | pool: use "big" (ie. > PAGE_SIZE) default allocators for more cases
When I added the default "big" pool allocators back in 2017, I added them only for pool_caches and not plain pools, and only for IPL_NONE pool_caches at that. But these allocators work fine for all pool caches and plain pools as well, so use them automatically by default when needed for all of those cases.
|
1.290 |
| 09-Apr-2023 |
riastradh | pool(9): Tweak branch prediction in pool_cache_get_paddr assertion.
No functional change intended.
|
1.289 |
| 09-Apr-2023 |
riastradh | pool(9): Simplify assertion in pool_update_curpage.
Add message while here.
|
1.288 |
| 09-Apr-2023 |
riastradh | kern: KASSERT(A && B) -> KASSERT(A); KASSERT(B)
|
1.287 |
| 24-Feb-2023 |
riastradh | kern: Eliminate most __HAVE_ATOMIC_AS_MEMBAR conditionals.
I'm leaving in the conditional around the legacy membar_enters (store-before-load, store-before-store) in kern_mutex.c and in kern_lock.c because they may still matter: store-before-load barriers tend to be the most expensive kind, so eliding them is probably worthwhile on x86. (It also may not matter; I just don't care to do measurements right now, and it's a single valid and potentially justifiable use case in the whole tree.)
However, membar_release/acquire can be mere instruction barriers on all TSO platforms including x86, so there's no need to go out of our way with a bad API to conditionalize them. If the procedure call overhead is measurable we just could change them to be macros on x86 that expand into __insn_barrier.
Discussed on tech-kern: https://mail-index.netbsd.org/tech-kern/2023/02/23/msg028729.html
|
1.286 |
| 17-Feb-2023 |
skrll | Avoid undefined behaviour.
|
1.285 |
| 16-Jul-2022 |
simonb | branches: 1.285.4; Use 64-bit math to calculate pool sizes. Fixes overflow errors for pools larger than 4GB and gives the correct output for kernel pool pages in "vmstat -s" output.
|
1.284 |
| 29-May-2022 |
andvar | fix various typos in comments and log messages.
|
1.283 |
| 24-May-2022 |
andvar | fix various typos in comments, docs and log messages.
|
1.282 |
| 09-Apr-2022 |
riastradh | pool(9): Convert membar_exit to membar_release.
|
1.281 |
| 27-Feb-2022 |
riastradh | pool(9): Membar audit.
- Use atomic_store_release and atomic_load_consume for associating a freshly constructed pool_cache with its underlying pool. The pool gets published in various ways before the pool cache is fully constructed.
=> Nix membar_sync -- no store-before-load is needed here.
- Take pool_head_lock around sysctl kern.pool TAILQ_FOREACH. Then take a reference count, and drop the lock, around copyout.
=> Otherwise, pools could be partially initialized or freed while we're still trying to read from them -- and in the worst case, we might see a corrupted view of the tailq.
=> If we kept the lock around copyout, this could deadlock in memory allocation.
=> If we didn't take a reference count while releasing the lock, the pool could be destroyed while we're trying to traverse the list, sending us into oblivion instead of the next element.
|
1.280 |
| 24-Dec-2021 |
riastradh | pool(9): Fix default PR_NOALIGN for large pool caches.
Was broken in recent change to separate some pool cache flags from pool flags.
Fixes crash in zfs.
|
1.279 |
| 22-Dec-2021 |
thorpej | Do the last change differently:
Instead of having a pre-destruct hook, put knowledge of passive serialization into the pool allocator directly, enabled by PR_PSERIALIZE when the pool / pool_cache is initialized. This will guarantee that a passive serialization barrier will be performed before the object's destructor is called, or before the page containing the object is freed back to the system (in the case of no destructor). Note that the internal allocator overhead is different when PR_PSERIALIZE is used (it implies PR_NOTOUCH, because the objects must remain in a valid state).
In the DRM Linux API shim, this allows us to remove the custom page allocator for SLAB_TYPESAFE_BY_RCU.
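As a rough illustration of the flag described above (not the actual DRM shim code; the "frob" names and ctor/dtor are hypothetical, and the pool_cache_init() prototype is assumed to match pool_cache(9)), a passively-serialized object cache might be set up like this:

    #include <sys/pool.h>

    struct frob { int f_state; };
    static pool_cache_t frob_cache;

    static int
    frob_ctor(void *arg, void *obj, int flags)
    {
            struct frob *f = obj;

            f->f_state = 0;         /* objects must stay in a valid state */
            return 0;
    }

    static void
    frob_dtor(void *arg, void *obj)
    {
            /* With PR_PSERIALIZE, a pserialize barrier has already run here. */
    }

    void
    frob_cache_attach(void)
    {
            frob_cache = pool_cache_init(sizeof(struct frob), 0, 0,
                PR_PSERIALIZE, "frobcache", NULL, IPL_NONE,
                frob_ctor, frob_dtor, NULL);
    }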
|
1.278 |
| 21-Dec-2021 |
thorpej | Add pool_cache_setpredestruct(), which allows a pool cache to specify a function to be called before the destructor for a batch of one or more objects is called. This can be used as a synchronization point by subsystems that rely on the type-stable nature of pool cache objects or subsystems that use other forms of passive serialization.
|
1.277 |
| 25-Jul-2021 |
simonb | Add accessor functions to get the number of gets and puts on pools and pool caches.
|
1.276 |
| 24-Feb-2021 |
mrg | branches: 1.276.4; skip the redzone on pools where the allocation (including all overhead) is greater than half the pool page size.
this stops 4KiB being used per allocation from the kmem-02048 pool, and 64KiB per allocation from the buf32k pool.
we're still wasting 1/4 of the space on overhead for e.g. the buf1k or kmem-01024 pools. however, including overhead costs, the amount of useless space (not used by the consumer or by overhead) drops from 47% to 18%, so this is far less bad overall.
there are a couple of ideas for solving this less ugly:
- pool redzones are enabled with DIAGNOSTIC kernels, which is defined as being "fast, cheap". this is not cheap (though it is relatively fast if you don't run out of memory) so it does not really belong here as is, but DEBUG or a special option would work for it.
- if we increase the "pool page" size for these pools, such that the overhead over pool page is reduced to 5% or less, we can have redzones for more allocations without using more space.
also, see this thread:
https://mail-index.netbsd.org/tech-kern/2021/02/23/msg027130.html
|
1.275 |
| 19-Dec-2020 |
mrg | ddb: add two new modifiers to "show pool" and "show all pools"
- /s shows a short, single-line-per-pool list (the normal output is about 10 lines per pool).
- /S skips pools with zero allocations.
|
1.274 |
| 05-Sep-2020 |
riastradh | branches: 1.274.2; Suppress pool redzone message unless booted with debug.
|
1.273 |
| 19-Jun-2020 |
jdolecek | bump the limit on max item size for pool_init()/pool_cache_init() up to 1 << 24, so that the pools can be used for ZFS block allocations, which are up to SPA_MAXBLOCKSHIFT (1 << 24)
part of PR kern/55397 by Frank Kardel
|
1.272 |
| 14-Jun-2020 |
ad | Arithmetic error in previous.
|
1.271 |
| 14-Jun-2020 |
ad | pool_cache:
- make all counters per-CPU and make the cache layer do its work with atomic ops.
- conserve memory by caching empty groups globally.
|
1.270 |
| 07-Jun-2020 |
maxv | Add fault(4).
|
1.269 |
| 06-Jun-2020 |
maxv | kMSan: re-set the orig after pool_cache_get_slow(), using the address of the caller of pool_cache_get_paddr().
Otherwise the orig is just pool_cache_get_paddr(), and that's not really useful for debugging.
|
1.268 |
| 15-Apr-2020 |
maxv | Introduce POOL_NOCACHE, simple option to cancel pool_caches and go directly to the pool layer. It is taken out of POOL_QUARANTINE.
Advertise POOL_NOCACHE for kMSan rather than POOL_QUARANTINE. With kMSan we are only interested in the no-caching effect, not the quarantine. This reduces memory pressure on kMSan kernels.
|
1.267 |
| 13-Apr-2020 |
chs | slightly change and fix the semantics of pool_set*wat(), pool_sethardlimit() and pool_prime() (and their pool_cache_* counterparts):
- the pool_set*wat() APIs are supposed to specify thresholds for the count of free items in the pool before pool pages are automatically allocated or freed during pool_get() / pool_put(), whereas pool_sethardlimit() and pool_prime() are supposed to specify minimum and maximum numbers of total items in the pool (both free and allocated). these were somewhat conflated in the existing code, so separate them as they were intended.
- change pool_prime() to take an absolute number of items to preallocate rather than an increment over whatever was done before, and wait for any memory allocations to succeed. since pool_prime() can no longer fail after this, change its return value to void and adjust all callers.
- pool_setlowat() is documented as not immediately attempting to allocate any memory, but it was changed some time ago to immediately try to allocate up to the lowat level, so just fix the manpage to describe the current behaviour.
- add a pool_cache_prime() to complete the API set.
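A minimal sketch of the intended split described above, assuming the pool(9) prototypes; "mypool" is a hypothetical, already-initialized pool:

    #include <sys/pool.h>

    extern struct pool mypool;      /* hypothetical, pool_init()ed elsewhere */

    void
    mypool_tune(void)
    {
            /* Thresholds on the count of FREE items: pages are added in
             * pool_get() below the low water mark and released in
             * pool_put() above the high water mark. */
            pool_setlowat(&mypool, 16);
            pool_sethiwat(&mypool, 256);

            /* Absolute TOTAL item count (free + allocated) to preallocate;
             * after this change pool_prime() waits and returns void. */
            pool_prime(&mypool, 128);
    }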
|
1.266 |
| 08-Feb-2020 |
maxv | branches: 1.266.4; Retire KLEAK.
KLEAK was a nice feature and served its purpose; it allowed us to detect dozens of info leaks on the kernel->userland boundary, and thanks to it we tackled a good part of the infoleak problem 1.5 years ago.
Nowadays however, we have kMSan, which can detect uninitialized memory in the kernel. kMSan supersedes KLEAK: it can detect what KLEAK was able to detect, but in addition, (1) it operates in all of the kernel and not just the kernel->userland boundary, (2) it requires no user interaction, and (3) it is deterministic and not statistical.
That makes kMSan the feature of choice to detect info leaks nowadays; people interested in detecting info leaks should boot a kMSan kernel and just wait for the magic to happen.
KLEAK was a good ride, and a fun project, but now is time for it to go.
Discussed with several people, including Thomas Barabosch.
|
1.265 |
| 19-Jan-2020 |
chs | fix assertions about when it is ok for pool_get() to return NULL.
|
1.264 |
| 27-Dec-2019 |
maxv | branches: 1.264.2; Switch to panic, and make the message more useful.
|
1.263 |
| 03-Dec-2019 |
riastradh | Use __insn_barrier to enforce ordering in l_ncsw loops.
(Only need ordering observable by interruption, not by other CPUs.)
|
1.262 |
| 14-Nov-2019 |
maxv | Add support for Kernel Memory Sanitizer (kMSan). It detects uninitialized memory used by the kernel at run time, and just like kASan and kCSan, it is an excellent feature. It has already detected 38 uninitialized variables in the kernel during my testing, which I have since discreetly fixed.
We use two shadows:
- "shad", to track uninitialized memory with a bit granularity (1:1). Each bit set to 1 in the shad corresponds to one uninitialized bit of real kernel memory.
- "orig", to track the origin of the memory with a 4-byte granularity (1:1). Each uint32_t cell in the orig indicates the origin of the associated uint32_t of real kernel memory.
The memory consumption of these shadows is considerable, so at least 4GB of RAM is recommended to run kMSan.
The compiler inserts calls to specific __msan_* functions on each memory access, to manage both the shad and the orig and detect uninitialized memory accesses that change the execution flow (like an "if" on an uninitialized variable).
We mark as uninit several types of memory buffers (stack, pools, kmem, malloc, uvm_km), and check each buffer passed to copyout, copyoutstr, bwrite, if_transmit_lock and DMA operations, to detect uninitialized memory that leaves the system. This allows us to detect kernel info leaks in a way that is more efficient and also more user-friendly than KLEAK.
Contrary to kASan, kMSan requires comprehensive coverage, ie we cannot tolerate having one non-instrumented function, because this could cause false positives. kMSan cannot instrument ASM functions, so I converted most of them to __asm__ inlines, which kMSan is able to instrument. Those that remain receive special treatment.
Contrary to kASan again, kMSan uses a TLS, so we must context-switch this TLS during interrupts. We use different contexts depending on the interrupt level.
The orig tracks precisely the origin of a buffer. We use a special encoding for the orig values, and pack together in each uint32_t cell of the orig:
- a code designating the type of memory (Stack, Pool, etc), and
- a compressed pointer, which points either (1) to a string containing the name of the variable associated with the cell, or (2) to an area in the kernel .text section which we resolve to a symbol name + offset.
This encoding allows us not to consume extra memory for associating information with each cell, and produces a precise output, that can tell for example the name of an uninitialized variable on the stack, the function in which it was pushed on the stack, and the function where we accessed this uninitialized variable.
kMSan is available with LLVM, but not with GCC.
The code is organized in a way that is similar to kASan and kCSan, so it means that other architectures than amd64 can be supported.
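A self-contained illustration of the orig-cell idea described above (not the actual kMSan encoding; the field widths and names are made up): a small type code in the top bits and a compressed pointer, stored as an offset from a known base, in the remaining bits.

    #include <stdint.h>

    #define ORIG_TYPE_BITS  4
    #define ORIG_PTR_BITS   (32 - ORIG_TYPE_BITS)

    enum orig_type { ORIG_STACK = 1, ORIG_POOL, ORIG_KMEM };

    static uintptr_t orig_base;     /* e.g. start of .text or of a name table */

    static uint32_t
    orig_encode(enum orig_type t, uintptr_t ptr)
    {
            uint32_t off = (uint32_t)((ptr - orig_base) &
                ((1u << ORIG_PTR_BITS) - 1));

            return ((uint32_t)t << ORIG_PTR_BITS) | off;
    }

    static void
    orig_decode(uint32_t cell, enum orig_type *t, uintptr_t *ptr)
    {
            *t = (enum orig_type)(cell >> ORIG_PTR_BITS);
            *ptr = orig_base + (cell & ((1u << ORIG_PTR_BITS) - 1));
    }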
|
1.261 |
| 16-Oct-2019 |
christos | Add and use __FPTRCAST, requested by uwe@
|
1.260 |
| 16-Oct-2019 |
christos | Add void * function pointer casts. There are different ways to "fix" those warnings:
1. this one: add a void * cast (which I think is the least intrusive)
2. add pragmas to elide the warning
3. add intermediate inline conversion functions
4. change the called function prototypes, adding unused arguments and converting some of the pointer arguments to void *
5. make the functions variadic (which defeats the purpose of checking)
6. pass command line flags to elide the warning
I did try 3 and 4 and I was not pleased with the result (sys_ptrace_common.c): (3) added too much code and defines, and (4) made the regular use clumsy.
|
1.259 |
| 23-Sep-2019 |
skrll | Enable POOL_REDZONE with DIAGNOSTIC.
The bug in the arm pmap was fixed long ago.
|
1.258 |
| 06-Sep-2019 |
maxv | Reorder for clarity, and make pool_allocator_big[] local; it should not be used outside.
|
1.257 |
| 26-Aug-2019 |
maxv | Revert r1.254, put back || for KASAN, some destructors like lwp_dtor() caused false positives. Needs more work.
|
1.256 |
| 17-Aug-2019 |
maxv | Kernel Heap Hardening: use bitmaps on all off-page pools. This migrates 29 MI pools on amd64 from linked lists to bitmaps, which have higher security properties.
Then, change the computation of the size of the PH pools: take into account the bitmap area available by default in the ph_u2 union, and don't go with &phpool[>0] if &phpool[0] already has enough space to embed a bitmap.
The pools that are migrated in this change all use bitmaps small enough to fit in &phpool[0], therefore there is no increase in memory consumption.
|
1.255 |
| 16-Aug-2019 |
maxv | Initialize pp->pr_redzone to false. For some reason, with KUBSAN, GCC does not eliminate the unused branch in pr_item_linkedlist_put(), and this leads to an uninitialized access in that unused branch, which triggers KUBSAN messages.
|
1.254 |
| 03-Aug-2019 |
maxv | Replace || by && in KASAN, to increase the pool coverage.
Strictly speaking, what we want to avoid is poisoning buffers that were referenced in a global list as part of the ctor. But, if a buffer indeed got referenced as part of the ctor, it necessarily has to be unreferenced in the dtor; which implies it has to have a dtor. So we want both a ctor and a dtor, and not just one of them.
Note that POOL_QUARANTINE already implicitly provides this increased coverage.
|
1.253 |
| 02-Aug-2019 |
maxv | Kernel Heap Hardening: perform certain sanity checks on the pool caches directly, to immediately detect certain bugs that would otherwise have been detected only later on the pool layer, if the buffer ever reached the pool layer.
|
1.252 |
| 29-Jun-2019 |
maxv | branches: 1.252.2; The big pool allocators use pool_page_alloc(), which allocates page-aligned storage. So if we switch to a big pool, set PR_NOALIGN, because the address of the storage is not aligned to the item size.
Should fix PR/54319.
|
1.251 |
| 13-Jun-2019 |
christos | make pool assertion messages consistent.
|
1.250 |
| 09-May-2019 |
skrll | Avoid KASSERT(!cpu_intr_p()) when breaking into ddb and issuing
show uvmexp
|
1.249 |
| 13-Apr-2019 |
maxv | Introduce POOL_QUARANTINE, a feature that creates a window during which a freed buffer cannot be reallocated. This greatly helps detecting use-after-frees, because they are not short-lived anymore.
We maintain a per-pool fifo of 128 buffers. On each pool_put, we do a real free of the oldest buffer, and insert the new buffer. Before insertion, we mark the buffer as invalid with KASAN. On each pool_cache_put, we destruct the object, so it lands in pool_put, and the quarantine is handled there.
POOL_QUARANTINE can be used in conjunction with KASAN to detect more use-after-free bugs.
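A self-contained sketch of the quarantine scheme described above (an illustration only, not the kernel implementation): a 128-entry ring where each put parks the new buffer and hands back the oldest one for the real free.

    #include <stddef.h>

    #define QUARANTINE_DEPTH 128

    struct quarantine {
            void    *q_buf[QUARANTINE_DEPTH];
            size_t   q_head;        /* slot holding the oldest entry */
            size_t   q_count;
    };

    /* Park newbuf (already marked invalid by KASAN); return the buffer
     * that should now really be freed, or NULL while filling up. */
    static void *
    quarantine_put(struct quarantine *q, void *newbuf)
    {
            void *oldest = NULL;

            if (q->q_count == QUARANTINE_DEPTH)
                    oldest = q->q_buf[q->q_head];   /* evict the oldest */
            else
                    q->q_count++;
            q->q_buf[q->q_head] = newbuf;
            q->q_head = (q->q_head + 1) % QUARANTINE_DEPTH;
            return oldest;
    }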
|
1.248 |
| 07-Apr-2019 |
maxv | Provide a code argument in kasan_mark(), and give a code to each caller. Five codes used: GenericRedZone, MallocRedZone, KmemRedZone, PoolRedZone, and PoolUseAfterFree.
This can greatly help debugging complex memory corruptions.
|
1.247 |
| 07-Apr-2019 |
maxv | Fix tiny race in pool+KASAN, that resulted in occasional false positives.
We were uselessly marking already valid areas as valid. When doing that, our KASAN code emits two calls to kasan_markmem, and there is a very small window where the area becomes invalid. So, if the area happens to be already globally referenced, and if another thread happens to read the buffer via this reference, we get a false positive.
This happens only with pool_caches that have a pc_ctor that creates a global reference to the buffer, and there is one single pool_cache that does that: 'file_cache'.
So now, two changes:
- In pool_cache_get_slow(), the pool_get() has already redzoned the object, so no need to call pool_redzone_fill().
- In pool_cache_destruct_object1(), don't re-mark the object. If there is no ctor, pool_put is fine with already-invalid objects; if there is a ctor, the object was not marked as invalid in the first place; so in either case, the re-marking is not needed.
Fixes PR/53674. Although very rare and difficult to reproduce, a local quarantine patch of mine made the false positives recurrent.
|
1.246 |
| 28-Mar-2019 |
maxv | Move pnbuf_cache into vfs_init.c, where it belongs.
|
1.245 |
| 27-Mar-2019 |
maxv | Kernel Heap Hardening: detect frees-in-wrong-pool on on-page pools. The detection is already implicitly done for off-page pools.
We recycle pr_slack (unused) in struct pool, and make ph_node a union in order to recycle an unsigned int in struct pool_item_header. Each time a pool is created we atomically increase a global counter, and register the current value in pp. We then propagate this value in each ph, and ensure they match in pool_put.
This can catch several classes of kernel bugs and basically makes them unexploitable. It comes with no increase in memory usage and no measurable increase in CPU cost (nonexistent cost actually, just one check predicted false).
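A self-contained illustration of that scheme (names are hypothetical, not the subr_pool.c fields): a global counter stamps each pool, each page header inherits the stamp, and the put path checks the two for equality.

    #include <stdint.h>

    static unsigned int pool_serial;        /* bumped atomically per pool */

    struct xpool        { unsigned int pr_serial; };
    struct xpage_header { unsigned int ph_serial; };

    static void
    xpool_create(struct xpool *pp)
    {
            pp->pr_serial = __atomic_add_fetch(&pool_serial, 1,
                __ATOMIC_RELAXED);
    }

    static void
    xpage_init(const struct xpool *pp, struct xpage_header *ph)
    {
            ph->ph_serial = pp->pr_serial;  /* propagated at page setup */
    }

    static int
    xpool_put_ok(const struct xpool *pp, const struct xpage_header *ph)
    {
            /* Single predicted-false check; a mismatch means the item was
             * freed to the wrong pool. */
            return ph->ph_serial == pp->pr_serial;
    }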
|
1.244 |
| 26-Mar-2019 |
maxv | Remove POOL_SUBPAGE, it is unused, undocumented, and adds confusion.
|
1.243 |
| 18-Mar-2019 |
maxv | Kernel Heap Hardening: manage freed items with bitmaps rather than linked lists when we're on-page and the page header is naturally big enough to contain a bitmap.
This comes with no increase in memory consumption, and similar CPU cost (maybe it's a little faster actually).
We want to favor bitmaps over linked lists, because linked lists install kernel pointers inside the items, and this can be too easily exploitable in use-after-free or double-free conditions, or in item buffer overflows occurring within a pool page.
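A self-contained sketch of bitmap-managed free items (illustrative only, not the pr_item_bitmap_* code): bit i set means item i of the page is free, so freed items never carry kernel pointers.

    #include <stdint.h>

    #define BMAP_BITS 32

    static void
    bmap_mark_free(uint32_t *bmap, unsigned int idx)
    {
            bmap[idx / BMAP_BITS] |= 1u << (idx % BMAP_BITS);
    }

    /* Return the index of a now-allocated item, or -1 if the page is full. */
    static int
    bmap_alloc_item(uint32_t *bmap, unsigned int nitems)
    {
            for (unsigned int i = 0; i < nitems; i++) {
                    uint32_t mask = 1u << (i % BMAP_BITS);

                    if (bmap[i / BMAP_BITS] & mask) {
                            bmap[i / BMAP_BITS] &= ~mask;
                            return (int)i;
                    }
            }
            return -1;
    }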
|
1.242 |
| 17-Mar-2019 |
maxv | Introduce a new flag, PR_USEBMAP, that indicates whether the pool uses a bitmap to manage freed items. It dissociates PR_NOTOUCH from bitmaps, but for now is set only when PR_NOTOUCH is set, which reproduces the current behavior. Therefore, no functional change. Also clarify the code.
|
1.241 |
| 17-Mar-2019 |
maxv | Kernel Heap Hardening: put the pool header at the beginning of the backing page, not at the end of it.
This makes it harder to exploit buffer overflows, because it eliminates the certainty that sensitive kernel data is located after the item space and is therefore overwritable.
The pr_itemoffset field is recycled, and holds the (aligned) offset of the item space. The pr_phoffset field becomes unused. We align 'itemspace' for clarity, but it's not strictly necessary.
This comes with no performance cost or increase in memory usage, in particular the potential padding consumed by roundup(PHSIZE, align) was already implicitly consumed before, because of the (necessary) truncations in the divisions. Now it's just more explicit, but not bigger.
|
1.240 |
| 17-Mar-2019 |
maxv | Move some code into a separate function, and explain a bit. Also define PHSIZE. No functional change.
|
1.239 |
| 17-Mar-2019 |
maxv | cosmetic
|
1.238 |
| 17-Mar-2019 |
maxv | Prepare the removal of the 'ioff' argument: add a KASSERT to ensure it is zero, and remove the internal logic. The pool code is simpler now.
|
1.237 |
| 16-Mar-2019 |
maxv | Misc changes:
- Turn two KASSERTs into real panics, they are useful and not expensive.
- Rename a few variables for clarity.
- Add a new panic, to make sure a freed item is in the item space.
|
1.236 |
| 13-Mar-2019 |
maxv | style
|
1.235 |
| 11-Mar-2019 |
maxv | Add sanity check: make sure we retrieve a valid item header, by checking its page address against the one we computed. If there's a mismatch it means the buffer does not belong to the pool, and we panic.
|
1.234 |
| 11-Mar-2019 |
maxv | Rename pr_item_notouch_* to pr_item_bitmap_*, and move some code into new pr_item_linkedlist_* functions. This makes it easier to see that we have two ways of handling freed items.
No functional change.
|
1.233 |
| 11-Feb-2019 |
maxv | Fix previous: pr_size includes the KASAN redzone. Repurpose pr_reqsize and use it for PR_ZERO; it holds the size requested by the user with no padding or redzone added, and only these bytes should be zeroed.
|
1.232 |
| 10-Feb-2019 |
christos | Introduce PR_ZERO to avoid open-coding memset()s everywhere. OK riastradh@.
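A minimal usage sketch, assuming the pool(9) prototypes; "mypool" and "struct foo" are hypothetical names:

    #include <sys/pool.h>

    struct foo { int f_x; };
    extern struct pool mypool;      /* hypothetical, pool_init()ed elsewhere */

    struct foo *
    foo_alloc(void)
    {
            /* PR_ZERO returns the item already zeroed, replacing an
             * explicit memset() after pool_get(). */
            return pool_get(&mypool, PR_WAITOK | PR_ZERO);
    }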
|
1.231 |
| 23-Dec-2018 |
maxv | Simplify the KASAN API, use only kasan_mark() and explain briefly. The alloc/free naming was too confusing.
|
1.230 |
| 23-Dec-2018 |
maxv | Remove useless debugging code, the area is completely filled but it's not checked afterwards, only pi_magic is.
|
1.229 |
| 16-Dec-2018 |
maxv | Add support for detecting use-after-frees in KASAN. We poison each freed buffer, any subsequent read or write will be detected as illegal.
* Add POOL_CHECK_MAGIC, which is disabled under KASAN, because the same detection is done in a better way.
* Register the size+redzone in the pool structure, to reduce the overhead.
* Fix the CTOR/DTOR check in KLEAK, the fields are never NULL.
|
1.228 |
| 02-Dec-2018 |
maxv | Introduce KLEAK, a new feature that can detect kernel information leaks.
It works by tainting memory sources with marker values, letting the data travel through the kernel, and scanning the kernel<->user frontier for these marker values. Combined with compiler instrumentation and rotation of the markers, it is able to yield relevant results with little effort.
We taint the pools and the stack, and scan copyout/copyoutstr. KLEAK is supported on amd64 only for now, but it is not complicated to add more architectures (just a matter of having the address of .text, and a stack unwinder).
A userland tool is provided that allows executing a command in rounds and monitoring the leaks generated all the while.
KLEAK already detected directly 12 kernel info leaks, and prompted changes that in total fixed 25+ leaks.
Based on an idea developed jointly with Thomas Barabosch (of Fraunhofer FKIE).
|
1.227 |
| 10-Sep-2018 |
maxv | Correctly align the size+redzone for KASAN; on amd64 it happens to always be 8-byte-aligned, but on other architectures it may not be.
|
1.226 |
| 25-Aug-2018 |
maxv | Disable POOL_REDZONE until we figure out what's wrong. There must be a dumb problem, that is not triggerable on amd64.
|
1.225 |
| 24-Aug-2018 |
maxv | Use __predict_false to optimize, and also replace panic->printf.
|
1.224 |
| 23-Aug-2018 |
maxv | Add kASan redzones on pools and pool_caches. Also enable POOL_REDZONE on DIAGNOSTIC.
|
1.223 |
| 04-Jul-2018 |
kamil | Avoid undefined behavior in pr_item_notouch_put()
Do not left-shift a signed integer into its sign bit.
sys/kern/subr_pool.c:251:30, left shift of 1 by 31 places cannot be represented in type 'int'
Detected with Kernel Undefined Behavior Sanitizer.
Reported by <Harry Pantazis>
|
1.222 |
| 04-Jul-2018 |
kamil | Avoid Undefined Behavior in pr_item_notouch_get()
Change the type of left shifted integer from signed to unsigned.
sys/kern/subr_pool.c:274:13, left shift of 1 by 31 places cannot be represented in type 'int'
Detected with Kernel Undefined Behavior Sanitizer.
Reported by <Harry Pantazis>
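A minimal illustration of the class of fix in these two revisions: shifting a signed 1 into the sign bit is undefined behaviour, while shifting an unsigned constant is well defined.

    #include <stdint.h>

    static uint32_t
    bit_mask(unsigned int idx)      /* idx may be as large as 31 */
    {
            return 1u << idx;       /* was: 1 << idx, undefined for idx == 31 */
    }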
|
1.221 |
| 12-Jan-2018 |
para | branches: 1.221.2; 1.221.4; fix comment
pool stats are listed by 'vmstat -m', not 'vmstat -i'
|
1.220 |
| 29-Dec-2017 |
christos | Don't release the lock in the PR_NOWAIT allocation. Move the flags setting after acquiring the mutex. (from Tobias Nygren)
|
1.219 |
| 16-Dec-2017 |
mrg | hopefully work around the irregular "fork fails in init" problem.
if a pool is growing, and the grower is PR_NOWAIT, mark this. if another caller wants to grow the pool and is also PR_NOWAIT, busy-wait for the original caller, which should either succeed or hard-fail fairly quickly.
implement the busy-wait by unlocking and relocking this pool's mutex and returning ERESTART. other methods (such as having the caller do this) were significantly more code and this hack is fairly localised.
ok chs@ riastradh@
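A self-contained illustration of the busy-wait idea, heavily simplified and not the actual pool_get()/pool_grow() code (no locking shown; ERESTART is kernel-internal on NetBSD, so the sketch defines a stand-in value):

    #include <errno.h>
    #include <stdbool.h>

    #ifndef ERESTART
    #define ERESTART        (-3)    /* stand-in for the kernel-internal value */
    #endif

    struct gpool { bool growing_nowait; };

    static int
    gpool_grow(struct gpool *p)
    {
            if (p->growing_nowait)
                    return ERESTART;        /* someone else is already growing */
            p->growing_nowait = true;
            /* ... allocate a page and add it to the pool ... */
            p->growing_nowait = false;
            return 0;
    }

    static int
    gpool_get_nowait(struct gpool *p)
    {
            int error;

            /* Busy-wait: the other grower either succeeds or fails quickly. */
            while ((error = gpool_grow(p)) == ERESTART)
                    continue;
            return error;
    }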
|
1.218 |
| 04-Dec-2017 |
mrg | properly account PR_RECURSIVE pools like vmstat does.
|
1.217 |
| 02-Dec-2017 |
mrg | add two new members to uvmexp_sysctl{}: bootpages and poolpages. bootpages is set to the pages allocated via uvm_pageboot_alloc(). poolpages is calculated from the pools' nr_pages members.
this brings us closer to having a valid total of pages known by the system, vs actual pages originally managed.
XXX: poolpages needs some handling for PR_RECURSIVE pools still.
|
1.216 |
| 14-Nov-2017 |
christos | - fix an assert; we can reach there if we are nowait or limitfail.
- when priming the pool and failing with ERESTART, don't decrement the number of pages; this avoids the issue of returning an ERESTART when we get to 0, and is more correct.
- simplify the pool_grow code, and don't wake things up if we hit ENOMEM.
|
1.215 |
| 09-Nov-2017 |
christos | Add assertions that either PR_WAITOK or PR_NOWAIT is set.
|
1.214 |
| 09-Nov-2017 |
christos | Handle the ERESTART case from pool_grow()
|
1.213 |
| 09-Nov-2017 |
christos | make the KASSERTMSG/panic strings consistent as '%s: [%s], __func__, wchan'
|
1.212 |
| 09-Nov-2017 |
christos | Since pr_lock is now used to wait for two things (PR_GROWING and PR_WANTED), we need to loop for the condition we wanted.
|
1.211 |
| 06-Nov-2017 |
riastradh | Assert that pool_get failure happens only with PR_NOWAIT.
This would have caught the mistake I made last week leading to null pointer dereferences all over the place, a mistake which I evidently poorly scheduled alongside maxv's change to the panic message on x86 for null pointer dereferences.
|
1.210 |
| 05-Nov-2017 |
mlelstv | pool_grow can now fail even when sleeping is ok. Catch this case in pool_get and retry.
|
1.209 |
| 28-Oct-2017 |
riastradh | Allow only one pending call to a pool's backing allocator at a time.
Candidate fix for problems with hanging after kva fragmentation related to PR kern/45718.
Proposed on tech-kern:
https://mail-index.NetBSD.org/tech-kern/2017/10/23/msg022472.html
Tested by bouyer@ on i386.
This makes one small change to the semantics of pool_prime and pool_setlowat: they may fail with EWOULDBLOCK instead of ENOMEM, if there is a pending call to the backing allocator in another thread but we are not actually out of memory. That is unlikely because nearly always these are used during initialization, when the pool is not in use.
XXX pullup-8 XXX pullup-7 XXX pullup-6 (requires tweaking the patch) XXX pullup-5...
|
1.208 |
| 08-Jun-2017 |
chs | add some pool_allocators for pool item sizes larger than PAGE_SIZE. needed by dtrace.
|
1.207 |
| 14-Mar-2017 |
riastradh | branches: 1.207.6; #if DIAGNOSTIC panic ---> KASSERT
- Omit mutex_exit before panic. No need.
- Sprinkle some more information into a few messages.
- Prefer __diagused over #if DIAGNOSTIC for declarations, to reduce conditionals.
ok mrg@
|
1.206 |
| 05-Feb-2016 |
knakahara | branches: 1.206.2; 1.206.4; fix: "vmstat -C" CpuLayer showed only the last cpu values.
|
1.205 |
| 24-Aug-2015 |
pooka | to garnish, dust with _KERNEL_OPT
|
1.204 |
| 28-Jul-2015 |
maxv | Introduce POOL_REDZONE.
|
1.203 |
| 13-Jun-2014 |
joerg | branches: 1.203.2; 1.203.4; Add kern.pool for memory pool stats.
|
1.202 |
| 26-Apr-2014 |
abs | Ensure pool_head is non-static - for "vmstat -i"
|
1.201 |
| 17-Feb-2014 |
para | branches: 1.201.2; replace vmem(9) custom boundary tag allocation with a pool(9)
|
1.200 |
| 11-Mar-2013 |
pooka | branches: 1.200.6; In pool_cache_put_slow(), pool_get() can block (it does mutex_enter()), so we need to retry if curlwp took a context switch during the call. Otherwise, CPU-local invariants can get screwed up:
panic: kernel diagnostic assertion "cur->pcg_avail == cur->pcg_size" failed
This is (was) very easy to reproduce by just running:
while : ; do RUMP_NCPU=32 ./a.out ; done
where a.out only calls rump_init(). But any situation where there's contention and a pool doesn't have emptygroups would do.
|
1.199 |
| 09-Feb-2013 |
christos | printflike maintenance.
|
1.198 |
| 28-Aug-2012 |
christos | branches: 1.198.2; proper locking for DEBUG
|
1.197 |
| 05-Jun-2012 |
jym | Now that pool_cache_invalidate() is synchronous and can handle per-CPU caches, merge together pool_drain_start() and pool_drain_end() into
bool pool_drain(struct pool **ppp);
"bool" value indicates whether reclaiming was fully done (true) or not (false) "ppp" will contain a pointer to the pool that was drained (optional).
See http://mail-index.netbsd.org/tech-kern/2012/06/04/msg013287.html
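A minimal usage sketch of the merged interface (the caller name is hypothetical; the prototype is the one quoted above):

    #include <sys/pool.h>

    static bool
    reclaim_some_pool_memory(void)
    {
            struct pool *pp = NULL;

            /* true: reclaiming was fully done; pp names the drained pool. */
            return pool_drain(&pp);
    }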
|
1.196 |
| 05-Jun-2012 |
jym | As pool reclaiming is unlikely to happen at interrupt or softint context, re-enable the portion of code that allows invalidation of CPU-bound pool caches.
Two reasons:
- with CPU-cached objects being invalidated, the probability of fetching an obsolete object from the pool_cache(9) is greatly reduced. This speeds up pool_cache_get() quite a bit as it does not have to keep destroying objects until it finds an updated one when an invalidation is in progress.
- for situations where we have to ensure that no obsolete object remains after a state transition (canonical example: pmap mappings between Xen VM restoration), invalidating all pool_cache(9) is the safest way to go.
As it uses xcall(9) to broadcast the execution of pool_cache_transfer(), pool_cache_invalidate() cannot be called from interrupt or softint context (scheduling a xcall(9) can put a LWP to sleep).
pool_cache_xcall() => pool_cache_transfer() to reflect its use.
Invalidation being a costly process (1000s objects may be destroyed), all places where pool_cache_invalidate() may be called from interrupt/softint context will now get caught by the proper KASSERT(), and fixed. Ping me when you see one.
Tested under i386 and amd64 by running ATF suite within 64MiB HVM domains (tried triggering pgdaemon a few times).
No objection on tech-kern@.
XXX a similar fix has to be pulled up to NetBSD-6, but with a more conservative approach.
See http://mail-index.netbsd.org/tech-kern/2012/05/29/msg013245.html
|
1.195 |
| 05-May-2012 |
rmind | G/C POOL_DIAGNOSTIC option. No objection on tech-kern@.
|
1.194 |
| 04-Feb-2012 |
para | branches: 1.194.2; make acorn26 compile by fixing up subpage pool allocations
ok: riz@
|
1.193 |
| 29-Jan-2012 |
he | Use the same style for initialization of pool_allocator_kmem under POOL_SUBPAGE as all the other pool_allocator structs. Fixes build problem for acorn26.
|
1.192 |
| 28-Jan-2012 |
rmind | pool_page_alloc, pool_page_alloc_meta: avoid extra compare, use const. ffs_mountfs,sys_swapctl: replace memset with kmem_zalloc. sys_swapctl: move kmem_free outside the lock path. uvm_init: fix comment, remove pointless numeration of steps. uvm_map_enter: remove meflagval variable. Fix some indentation.
|
1.191 |
| 27-Jan-2012 |
para | extending vmem(9) to be able to allocate resources for its own needs. simplifying uvm_map handling (no special kernel entries anymore, no relocking). make malloc(9) a thin wrapper around kmem(9) (with a private interface for interrupt safety reasons)
releng@ acknowledged
|
1.190 |
| 27-Sep-2011 |
jym | branches: 1.190.2; 1.190.6; Modify *ASSERTMSG() so they are now used as variadic macros. The main goal is to provide routines that do as KASSERT(9) says: append a message to the panic format string when the assertion triggers, with optional arguments.
Fix call sites to reflect the new definition.
Discussed on tech-kern@. See http://mail-index.netbsd.org/tech-kern/2011/09/07/msg011427.html
|
1.189 |
| 22-Mar-2011 |
pooka | pnbuf_cache is used all over the place outside of vfs, so put it in one place to avoid many definitions.
|
1.188 |
| 17-Jan-2011 |
uebayasi | Fix a conditional include.
|
1.187 |
| 17-Jan-2011 |
uebayasi | Include internal definitions (uvm/uvm.h) only where necessary.
|
1.186 |
| 03-Jun-2010 |
pooka | branches: 1.186.2; Report result of pool_reclaim() from pool_drain_end().
|
1.185 |
| 12-May-2010 |
rmind | pool_{cache_}get: improve previous diagnostic by checking for panicstr, so it won't trigger the assert while trying to dump core on crash.
|
1.184 |
| 12-May-2010 |
rmind | - Sprinkle asserts to catch calls from interrupt context on IPL_NONE pools. - Add diagnostic drain attempt.
|
1.183 |
| 25-Apr-2010 |
ad | MAXCPUS -> __arraycount
|
1.182 |
| 20-Jan-2010 |
rmind | branches: 1.182.2; 1.182.4; pool_cache_invalidate: comment out invalidation of per-CPU caches (nobody depends on it, at the moment) until we decide how to fix it (xcall(9) cannot be used from interrupt context). XXX: Perhaps implement XC_HIGHPRI.
|
1.181 |
| 03-Jan-2010 |
mlelstv | drop __predict micro optimization in pool_init for cleaner code.
|
1.180 |
| 03-Jan-2010 |
mlelstv | Pools are created way before the pool subsystem mutexes are initialized.
Ignore also pool_allocator_lock while the system is in cold state.
When the system has left cold state, uvm_init() should have also initialized the pool subsystem and the mutexes are ready to use.
|
1.179 |
| 02-Jan-2010 |
mlelstv | Move initialization of pool_allocator_lock before its first use. This failed on archs where a mutex isn't initialized to a zero value.
Defer allocation of the pool log to the logging action; if allocation fails, it will be retried the next time something is logged.
Clear pool log on allocation so that ddb doesn't crash when showing so far unused log entries.
|
1.178 |
| 30-Dec-2009 |
elad | Turn PA_INITIALIZED into a reference count for the pool allocator, and once it drops to zero destroy the mutex we initialize. This fixes the problem mentioned in
http://mail-index.netbsd.org/tech-kern/2009/12/28/msg006727.html
Also remove pa_flags now that it's no longer needed.
Idea from matt@, okay matt@.
|
1.177 |
| 20-Oct-2009 |
jym | Fix a bug where on MP systems, pool_cache_invalidate(9) could be called early during boot, just after CPUs are attached but before they are marked as running.
This will result in a list of CPUs without the SPCF_RUNNING flag set, and will trigger the 'KASSERT(xc_tailp < xc_headp)' in xc_lowpri() as no cross call is issued.
Bug reported and patch tested by tron@.
See also http://mail-index.netbsd.org/tech-kern/2009/10/19/msg006293.html
|
1.176 |
| 15-Oct-2009 |
thorpej | - pool_cache_invalidate(): broadcast a cross-call to drain the per-CPU caches before draining the global cache.
- pool_cache_invalidate_local(): remove.
|
1.175 |
| 08-Oct-2009 |
jym | Add pool_cache_invalidate_local() to the pool_cache(9) API, to permit per-CPU objects invalidation when cached in the pool cache.
See http://mail-index.netbsd.org/tech-kern/2009/10/05/msg006206.html .
Reviewed by bouyer@. Thanks!
|
1.174 |
| 13-Sep-2009 |
pooka | Wipe out the last vestiges of POOL_INIT with one swift stroke. In most cases, use a proper constructor. For proplib, give a local equivalent of POOL_INIT for the kernel object implementation. This way the code structure can be preserved, and a local link set is not hazardous anyway (unless proplib is split to several modules, but that'll be the day).
tested by booting a kernel in qemu and compile-testing i386/ALL
|
1.173 |
| 29-Aug-2009 |
rmind | Make pool_head static.
|
1.172 |
| 15-Apr-2009 |
yamt | pool_cache_put_paddr: add an assertion.
|
1.171 |
| 11-Nov-2008 |
ad | branches: 1.171.4; Avoid recursive mutex_enter() when the system is low on KVA. Should fix crash reported by riz on current-users.
|
1.170 |
| 15-Oct-2008 |
ad | branches: 1.170.2; 1.170.4;
- Rename cpu_lookup_byindex() to cpu_lookup(). The hardware ID isn't of interest to MI code. No functional change.
- Change /dev/cpu to operate on cpu index, not hardware ID. Now cpuctl shouldn't print confused output.
|
1.169 |
| 11-Aug-2008 |
yamt | make pcg_dummy const to catch bugs earlier.
|
1.168 |
| 11-Aug-2008 |
yamt | add some KASSERTs.
|
1.167 |
| 08-Aug-2008 |
skrll | Comment whitespace.
|
1.166 |
| 09-Jul-2008 |
yamt | pool_do_put: fix a pool corruption bug discovered by the recent exec_pool changes.
|
1.165 |
| 07-Jul-2008 |
yamt | branches: 1.165.2; fix pool corruption bugs in subr_pool.c 1.162.
|
1.164 |
| 04-Jul-2008 |
ad | Move an assignment later.
|
1.163 |
| 04-Jul-2008 |
ad | - Keep cache locked while allocating a cache group - later we might want to automatically tune the group sizes at run time.
- Fix broken assertion.
- Avoid another test+branch.
|
1.162 |
| 04-Jul-2008 |
ad | Remove a bunch of conditional branches from the pool_cache fast path.
|
1.161 |
| 31-May-2008 |
ad | branches: 1.161.2; Use __noinline.
|
1.160 |
| 28-Apr-2008 |
martin | branches: 1.160.2; Remove clause 3 and 4 from TNF licenses
|
1.159 |
| 28-Apr-2008 |
ad | Add MI code to support in-kernel preemption. Preemption is deferred by one of the following:
- Holding kernel_lock (indicating that the code is not MT safe).
- Bracketing critical sections with kpreempt_disable/kpreempt_enable.
- Holding the interrupt priority level above IPL_NONE.
Statistics on kernel preemption are reported via event counters, and where preemption is deferred for some reason, it's also reported via lockstat. The LWP priority at which preemption is triggered is tuneable via sysctl.
|
1.158 |
| 27-Apr-2008 |
ad | branches: 1.158.2;
- Rename crit_enter/crit_exit to kpreempt_disable/kpreempt_enable. DragonflyBSD uses the crit names for something quite different.
- Add a kpreempt_disabled function for diagnostic assertions.
- Add inline versions of kpreempt_enable/kpreempt_disable for primitives.
- Make some more changes for preemption safety to the x86 pmap.
|
1.157 |
| 24-Apr-2008 |
ad | Merge the socket locking patch:
- Socket layer becomes MP safe.
- Unix protocols become MP safe.
- Allows protocol processing interrupts to safely block on locks.
- Fixes a number of race conditions.
With much feedback from matt@ and plunky@.
|
1.156 |
| 27-Mar-2008 |
ad | branches: 1.156.2; Replace use of CACHE_LINE_SIZE in some obvious places.
|
1.155 |
| 17-Mar-2008 |
ad | Make them compile again.
|
1.154 |
| 17-Mar-2008 |
yamt | - simplify ASSERT_SLEEPABLE.
- move it from proc.h to systm.h.
- add some more checks.
- make it a little more lkm friendly.
|
1.153 |
| 10-Mar-2008 |
martin | Use the cpu index instead of the machine-dependent, not very expressive cpuid when naming user-visible kernel entities.
|
1.152 |
| 02-Mar-2008 |
yamt | pool_do_put: remove pa_starved_p check for now as it seems to cause more problems than it solves. PR/37993 from Greg A. Woods.
|
1.151 |
| 14-Feb-2008 |
yamt | branches: 1.151.2; 1.151.6; use time_uptime instead of getmicrotime() for ph_time.
|
1.150 |
| 05-Feb-2008 |
skrll | Revert previous as requested by yamt.
|
1.149 |
| 02-Feb-2008 |
skrll | Check alignment against pp->pr_align not pp->pr_alloc->pa_pagesz.
DIAGNOSTIC kernels on hppa boot again.
OK'd by ad.
|
1.148 |
| 28-Jan-2008 |
yamt | pool_cache_get_paddr: don't bother to clear pcgo_va unless DIAGNOSTIC.
|
1.147 |
| 04-Jan-2008 |
ad | Start detangling lock.h from intr.h. This is likely to cause short term breakage, but the mess of dependencies has been regularly breaking the build recently anyhow.
|
1.146 |
| 02-Jan-2008 |
ad | Merge vmlocking2 to head.
|
1.145 |
| 26-Dec-2007 |
ad | Merge more changes from vmlocking2, mainly:
- Locking improvements. - Use pool_cache for more items.
|
1.144 |
| 22-Dec-2007 |
yamt | pool_in_cg: don't bother to check slots past pcg_avail.
|
1.143 |
| 22-Dec-2007 |
yamt | pool_whatis: print cached items as well.
|
1.142 |
| 20-Dec-2007 |
ad | - Support two different sizes of pool_cache group. The default has 14 or 15 items, and the new large groups (for busy caches) have 62 or 63 items.
- Add PR_LARGECACHE flag as a hint that a pool_cache should use large groups. This should eventually be tuned at runtime.
- Report group size for vmstat -C.
|
1.141 |
| 13-Dec-2007 |
yamt | add ddb "whatis" command. inspired from solaris ::whatis dcmd.
|
1.140 |
| 13-Dec-2007 |
yamt | don't forget to initialize ph_off for PR_NOTOUCH.
|
1.139 |
| 11-Dec-2007 |
ad | Change the ncpu test to work when a pool_cache or softint is initialized between mi_cpu_attach() and attachment of the boot CPU. Suggested by mrg@.
|
1.138 |
| 05-Dec-2007 |
ad | branches: 1.138.2; 1.138.4; pool_init, pool_cache_init: hack around IP input processing which can not yet safely block without severely confusing soo_write() and friends. If the pool's IPL is IPL_SOFTNET, initialize the mutex at IPL_VM so that it's a spinlock. To be dealt with correctly in the near future.
|
1.137 |
| 18-Nov-2007 |
ad | branches: 1.137.2; Work around issues with pool_cache on sparc.
|
1.136 |
| 14-Nov-2007 |
yamt | fix freecheck.
|
1.135 |
| 10-Nov-2007 |
yamt | for PR_NOTOUCH pool_item_header, use a bitmap rather than a freelist. it saves some space and allows more items per page.
|
1.134 |
| 07-Nov-2007 |
ad | Merge from vmlocking:
- pool_cache changes.
- Debugger/procfs locking fixes.
- Other minor changes.
|
1.133 |
| 11-Oct-2007 |
ad | branches: 1.133.2; 1.133.4; Remove LOCK_ASSERT(!simple_lock_held(&foo));
|
1.132 |
| 11-Oct-2007 |
ad | Merge from vmlocking:
- G/C spinlockmgr() and simple_lock debugging.
- Always include the kernel_lock functions, for LKMs.
- Slightly improved subr_lockdebug code.
- Keep sizeof(struct lock) the same if LOCKDEBUG.
|
1.131 |
| 18-Aug-2007 |
ad | branches: 1.131.2; 1.131.4; pool_drain: add a comment.
|
1.130 |
| 18-Aug-2007 |
ad | pool_do_cache_invalidate_grouplist: drop locks while calling the destructor. XXX Expensive - to be revisited.
|
1.129 |
| 12-Mar-2007 |
ad | branches: 1.129.8; 1.129.12; Pass an ipl argument to pool_init/POOL_INIT to be used when initializing the pool's lock.
|
1.128 |
| 04-Mar-2007 |
christos | branches: 1.128.2; Kill caddr_t; there will be some MI fallout, but it will be fixed shortly.
|
1.127 |
| 22-Feb-2007 |
thorpej | TRUE -> true, FALSE -> false
|
1.126 |
| 21-Feb-2007 |
thorpej | Replace the Mach-derived boolean_t type with the C99 bool type. A future commit will replace use of TRUE and FALSE with true and false.
|
1.125 |
| 09-Feb-2007 |
ad | branches: 1.125.2; Merge newlock2 to head.
|
1.124 |
| 01-Nov-2006 |
yamt | remove some __unused from function parameters.
|
1.123 |
| 12-Oct-2006 |
christos | - sprinkle __unused on function decls.
- fix a couple of unused bugs
- no more -Wno-unused for i386
|
1.122 |
| 03-Sep-2006 |
christos | branches: 1.122.2; 1.122.4; avoid empty else statement
|
1.121 |
| 20-Aug-2006 |
yamt | implement PR_NOALIGN (allow unaligned pages), to be used by the vmem quantum cache.
|
1.120 |
| 19-Aug-2006 |
yamt | pool_init: in the case of PR_NOTOUCH, don't bump item size to sizeof(struct pool_item).
|
1.119 |
| 21-Jul-2006 |
yamt | use ASSERT_SLEEPABLE where appropriate.
|
1.118 |
| 07-Jun-2006 |
kardel | merge FreeBSD timecounters from branch simonb-timecounters - struct timeval time is gone time.tv_sec -> time_second - struct timeval mono_time is gone mono_time.tv_sec -> time_uptime - access to time via {get,}{micro,nano,bin}time() get* versions are fast but less precise - support NTP nanokernel implementation (NTP API 4) - further reading: Timecounter Paper: http://phk.freebsd.dk/pubs/timecounter.pdf NTP Nanokernel: http://www.eecis.udel.edu/~mills/ntp/html/kern.html
|
1.117 |
| 25-May-2006 |
yamt | move wait points for kva from upper layers to vm_map. PR/33185 #1.
XXX there is a concern about interaction with kva fragmentation. see: http://mail-index.NetBSD.org/tech-kern/2006/05/11/0000.html
|
1.116 |
| 15-Apr-2006 |
simonb | branches: 1.116.2; Add a DEBUG check that panics if pool_init() is called more than once on the same pool.
As discussed on tech-kern a few months ago.
|
1.115 |
| 15-Apr-2006 |
christos | Coverity CID 760: Protect against NULL deref.
|
1.114 |
| 02-Apr-2006 |
yamt | pool_grow: don't increase pr_minpages. (fix a mistake in 1.113)
|
1.113 |
| 17-Mar-2006 |
yamt | make duplicated code fragments into a function, pool_grow.
|
1.112 |
| 24-Feb-2006 |
bjh21 | branches: 1.112.2; 1.112.4; 1.112.6; Medium-sized overhaul of POOL_SUBPAGE support so that: 1: I can understand it, and 2: It works. Notable externally-visible changes are that POOL_SUBPAGE now has to be a compile-time constant, and that trying to initialise a pool whose objects are larger than POOL_SUBPAGE automatically generates a pool that doesn't use subpages.
NetBSD/acorn26 now boots multi-user again.
|
1.111 |
| 26-Jan-2006 |
christos | branches: 1.111.2; 1.111.4; PR/32631: Yves-Emmanuel JUTARD: Fix DIAGNOSTIC panic in the pool code. At the time pool_get() calls pool_catchup(), pp has been free'd but it is still in the "entered" state. The chain pool_catchup() -> pool_allocator_alloc() -> pool_reclaim() on pp fails because pp is still in the "entered" state. Call pr_leave() before calling pool_catchup() to avoid this.
Thanks for the excellent analysis!
|
1.110 |
| 24-Dec-2005 |
perry | branches: 1.110.2; Remove leading __ from __(const|inline|signed|volatile) -- it is obsolete.
|
1.109 |
| 20-Dec-2005 |
christos | Commit temporary fix against kva starvation from yamt:
- pool_allocator_alloc: drain ourselves as well, so that pool_cache on us is drained as well.
- pool_cache_put_paddr: destruct objects if underlying pool is starved.
- pool_get: on kva starvation, wake up once a second and try again.
Fixes:
PR/32287: Processes hang in "mclpl"
PR/32330: shark kernel hangs under memory load.
|
1.108 |
| 01-Dec-2005 |
yamt | add "show all pools" command for ddb.
|
1.107 |
| 02-Nov-2005 |
yamt | pool_printit: don't keep a lock when printing info. we can't clean it up if the ddb pager is quit.
|
1.106 |
| 16-Oct-2005 |
christos | Make the grouplist invalidate function take a grouplist instead of a group. Suggested by yamt.
|
1.105 |
| 16-Oct-2005 |
christos | This is why I hate gotos: My previous change had different semantics than the original code, since if fullgroups was empty and partgroups wasn't, we would not clean up partgroups (pointed out by yamt). Well, this one has different semantics from the original; they are the correct ones, I think.
|
1.104 |
| 16-Oct-2005 |
christos | avoid a goto.
|
1.103 |
| 15-Oct-2005 |
chs | in pool_do_cache_invalidate(), make sure to process both full and partial group lists even if the first one we look at is empty. fix ddb print routine.
|
1.102 |
| 02-Oct-2005 |
chs | optimize pool_caches similarly to how I optimized pools before: split the single list of pool cache groups into three lists: completely full, partially full, and completely empty. use LIST instead of TAILQ where appropriate.
|
1.101 |
| 18-Jun-2005 |
thorpej | branches: 1.101.2; Fix some locking issues:
- Make the locking rules for pr_rmpage() sane, and don't modify fields protected by the pool lock without actually holding it.
- Always defer freeing the pool page to the back-end allocator, to avoid invoking the pool_allocator with the pool locked (which would violate the pool_allocator -> pool locking order).
- Fix pool_reclaim() to not violate the pool_cache -> pool locking order by using a trylock.
Reviewed by Chuq Silvers.
|
1.100 |
| 01-Apr-2005 |
yamt | merge yamt-km branch.
- don't use managed mappings/backing objects for wired memory allocations. save some resources like pv_entry. also fix (most of) PR/27030.
- simplify kernel memory management API.
- simplify pmap bootstrap of some ports.
- some related cleanups.
|
1.99 |
| 01-Jan-2005 |
yamt | branches: 1.99.2; 1.99.4; 1.99.8; PR_NOTOUCH:
- use uint8_t instead of uint16_t for freelist index.
- set ph_off only if PR_NOTOUCH.
- comment.
|
1.98 |
| 01-Jan-2005 |
yamt | in the case of !PMAP_MAP_POOLPAGE, gather pool backend allocations to large chunks for kernel_map and kmem_map to ease kva fragmentation.
|
1.97 |
| 01-Jan-2005 |
yamt | introduce a new flag for pool_init, PR_NOTOUCH. if it's specified, don't use free items as storage for internal state, so that we can use pools for non-memory-backed objects. inspired by solaris's KMC_NOTOUCH.
|
1.96 |
| 20-Jun-2004 |
thorpej | Remove PR_IMMEDRELEASE, since setting the high water mark will achieve the same thing.
Pointed out back in January by YAMAMOTO Takashi.
|
1.95 |
| 20-May-2004 |
atatat | Add a DIAGNOSTIC check to detect un-initialized pools.
|
1.94 |
| 25-Apr-2004 |
simonb | Initialise (most) pools from a link set instead of explicit calls to pool_init. Untouched pools are ones that are either in arch-specific code or aren't initialised during initial system startup.
Convert struct session, ucred and lockf to pools.
|
1.93 |
| 08-Mar-2004 |
dbj | branches: 1.93.2; add splvm() around a few pa_slock and psppool calls since they may be shared with pools that can be used in interrupt context.
|
1.92 |
| 22-Feb-2004 |
enami | Modify the pool page header allocation strategy as follows. In addition to the current one (i.e., don't waste so large a part of the page):
- if the header fits in the page without wasting any items, put it there.
- don't put the header in the page if it may consume a rather big item.
For example, on i386, header is now allocated in the page for the pools like fdescpl or sigapl, and allocated off the page for the pools like buf1k or buf2k.
|
1.91 |
| 16-Jan-2004 |
yamt | - fix locking order problem. (pa_slock -> pr_slock)
- protect pr_phtree with pr_slock.
- add some LOCK_ASSERTs.
|
1.90 |
| 09-Jan-2004 |
thorpej | Add a new pool initialization flag, PR_IMMEDRELEASE. This flag causes idle pool pages to be returned to the system immediately upon becoming de-fragmented.
Also, in pool_do_put(), don't free back an idle page unless we are over our minimum page claim.
|
1.89 |
| 29-Dec-2003 |
yamt | pool_prime_page: initialize ph_time to mono_time instead of zero as it's a mono_time relative value.
|
1.88 |
| 13-Nov-2003 |
chs | two changes to improve scalability:
(1) split the single list of pages allocated to a pool into three lists: completely full, partially full, and completely empty. there is no longer any need to traverse any list looking for a certain type of page.
(2) replace the 8-element hash table for out-of-page page headers with a splay tree.
these two changes (together with the recent enhancements to the wait code) give us linear scaling for a fork+exit microbenchmark.
|
1.87 |
| 09-Apr-2003 |
thorpej | branches: 1.87.2; Add the ability for pool caches to cache the physical address of objects. Clients of the pool_cache API must consistently use the "paddr" variants or not, otherwise behavior is undefined.
Enable this on Alpha, ARM, MIPS, and x86. Other platforms must define POOL_VTOPHYS() in the appropriate manner in order to enable the feature.
Part 1 of a series of simple patches contributed by Wasabi Systems to improve network performance.
|
1.86 |
| 16-Mar-2003 |
matt | Only define POOL_LOGSIZE/pool_size if POOL_DIAGNOSTIC is defined.
|
1.85 |
| 23-Feb-2003 |
pk | Use splvm() instead of splhigh() when accessing the internal page header pool.
|
1.84 |
| 18-Jan-2003 |
thorpej | Merge the nathanw_sa branch.
|
1.83 |
| 24-Nov-2002 |
scw | Quell uninitialised variable warnings.
|
1.82 |
| 09-Nov-2002 |
thorpej | Fix signed/unsigned comparison warnings.
|
1.81 |
| 08-Nov-2002 |
enami | Parse the modifier of ddb command as documented.
|
1.80 |
| 27-Sep-2002 |
provos | remove trailing \n in panic(). approved perry.
|
1.79 |
| 25-Aug-2002 |
thorpej | Fix signed/unsigned comparison warnings from GCC 3.3.
|
1.78 |
| 30-Jul-2002 |
thorpej | Bring down a fix from the "newlock" branch, slightly modified:
* In pool_prime_page(), assert that the object being placed onto the free list meets the alignment constraints (that "ioff" within the object is aligned to "align").
* In pool_init(), round up the object size to the alignment value (or ALIGN(1), if no special alignment is needed) so that the above invariant holds true.
|
1.77 |
| 11-Jul-2002 |
matt | Add wchan to a panic (must have NOWAIT).
|
1.76 |
| 13-Mar-2002 |
simonb | branches: 1.76.4; 1.76.6; Move 'struct pool_cache_group' definition into <sys/pool.h>
|
1.75 |
| 13-Mar-2002 |
simonb | Remove two instances of an "error" variable that is only ever assigned to but not used.
|
1.74 |
| 09-Mar-2002 |
thorpej | branches: 1.74.2; Put back pool_prime(); the i386 mp pmap uses it.
|
1.73 |
| 09-Mar-2002 |
thorpej | Fix a couple of typos in simple_{,un}lock()'s.
|
1.72 |
| 09-Mar-2002 |
thorpej | Remove pool_prime(). Nothing uses it, and how it should be used is not really well-defined in the absence of PR_STATIC.
|
1.71 |
| 09-Mar-2002 |
thorpej | If, when a page becomes idle, the backend allocator is waiting for resources, release the page immediately, rather than letting it sit around cached.
From art@openbsd.org.
|
1.70 |
| 09-Mar-2002 |
thorpej | Remove PR_MALLOCOK and PR_STATIC. The former wasn't actually used, and the latter, while there was some code that tested the bit, was woefully incomplete and also unused by anything. Besides, PR_STATIC functionality could be better handled by backend allocators anyhow.
From art@openbsd.org
|
1.69 |
| 08-Mar-2002 |
thorpej | Add a missing simple_unlock.
|
1.68 |
| 08-Mar-2002 |
thorpej | Add an optional "drain" client callback, which can be set by the new pool_set_drain_hook(). This hook is called in three cases:
* When a pool has hit the hard limit, just before either erroring out or sleeping.
* When a backend allocator fails to allocate memory.
* Just before trying to reclaim pages in pool_reclaim().
This hook requests the client to try and free some items back to the pool.
From art@openbsd.org.
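A minimal usage sketch (client names are hypothetical, and the hook is assumed to take (void *arg, int flags) per the pool(9) interface of this era):

    #include <sys/pool.h>

    static void
    myclient_drain_hook(void *arg, int flags)
    {
            /* Try to release some cached items back to the pool. */
    }

    void
    myclient_pool_attach(struct pool *pp)
    {
            pool_set_drain_hook(pp, myclient_drain_hook, NULL);
    }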
|
1.67 |
| 08-Mar-2002 |
thorpej | Remove PR_FREEHEADER; nothing uses it anymore.
From art@openbsd.org.
|
1.66 |
| 08-Mar-2002 |
thorpej | Pool deals fairly well with physical memory shortage, but it doesn't deal with shortages of the VM maps where the backing pages are mapped (usually kmem_map). Try to deal with this:
* Group all information about the backend allocator for a pool in a separate structure. The pool references this structure, rather than the individual fields. * Change the pool_init() API accordingly, and adjust all callers. * Link all pools using the same backend allocator on a list. * The backend allocator is responsible for waiting for physical memory to become available, but will still fail if it cannot allocate KVA space for the pages. If this happens, carefully drain all pools using the same backend allocator, so that some KVA space can be freed. * Change pool_reclaim() to indicate if it actually succeeded in freeing some pages, and use that information to make draining easier and more efficient. * Get rid of PR_URGENT. There was only one use of it, and it could be dealt with by the caller.
From art@openbsd.org.
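A hedged sketch of the grouped backend-allocator structure described above. The field names mirror the general shape of struct pool_allocator, but the page-supplier functions (mysubsys_getpage/mysubsys_putpage) are hypothetical stand-ins, not real kernel interfaces.

#include <sys/pool.h>

/* Hypothetical KVA/page suppliers standing in for the real back end. */
extern void	*mysubsys_getpage(int waitok);
extern void	 mysubsys_putpage(void *va);

static void *
mysubsys_pa_alloc(struct pool *pp, int flags)
{
	/*
	 * May sleep for physical memory when PR_WAITOK, but can still fail
	 * if no KVA is available; the pool layer then drains all pools
	 * sharing this allocator.
	 */
	return mysubsys_getpage((flags & PR_WAITOK) != 0);
}

static void
mysubsys_pa_free(struct pool *pp, void *va)
{
	mysubsys_putpage(va);
}

/* One allocator structure, shared by every pool initialized with it. */
struct pool_allocator mysubsys_allocator = {
	.pa_alloc  = mysubsys_pa_alloc,
	.pa_free   = mysubsys_pa_free,
	.pa_pagesz = 0,			/* 0: use the default page size */
};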
|
1.65 |
| 20-Nov-2001 |
enami | Call pr_log(PRLOG_GET) when POOL_DIAGNOSTIC is defined instead of DIAGNOSTIC for consistency.
|
1.64 |
| 12-Nov-2001 |
lukem | add RCSIDs
|
1.63 |
| 21-Oct-2001 |
chs | branches: 1.63.2; in pool_drain(), call pool_reclaim() while we still have interrupts blocked since the pool in question might be one used in interrupt context.
|
1.62 |
| 07-Oct-2001 |
bjh21 | Add support for allocating pool memory in units smaller than a whole page. This is activated by defining POOL_SUBPAGE to the size of the new allocation unit, and makes pools much more efficient on machines with obscenely large pages. It might even make four-megabyte arm26 systems usable.
|
1.61 |
| 26-Sep-2001 |
chs | jump through hoops to avoid calling uvm_km_free_poolpage() while holding spinlocks, since that function can sleep. (note that there's still one instance remaining to be fixed.) use TAILQ_FOREACH where appropriate.
|
1.60 |
| 01-Jul-2001 |
thorpej | branches: 1.60.2; 1.60.4; Protect the `pool cache group' pool with splvm(), so that pool caches can be used by code that runs in interrupt context.
|
1.59 |
| 05-Jun-2001 |
thorpej | Do the reentrancy checking if POOL_DIAGNOSTIC, not DIAGNOSTIC. Prevents ABI change for diagnostic vs. non-diagnostic kernels.
|
1.58 |
| 05-Jun-2001 |
thorpej | Assert that no locks are held if we're called with PR_WAITOK. From Bill Sommerfeld.
|
1.57 |
| 13-May-2001 |
sommerfeld | Make this build again ifdef DIAGNOSTIC (oops)
|
1.56 |
| 13-May-2001 |
sommerfeld | Remove pool reentrancy testing overhead unless DIAGNOSTIC is defined. Previously, we passed __FILE__ and __LINE__ on all pool_get/pool_set calls.
This change results in a measured 1.2% performance improvement in ping-flood packets-per-second as reported by ping(8).
|
1.55 |
| 10-May-2001 |
thorpej | Rearrange the code that adds pages of objects to the pool; require that the caller allocate the pool_item_header when it allocates the pool page, so we can avoid a locking pitfall (sleeping with a simple lock held).
Also revive pool_prime(), as there are some legitimate uses of it, but in doing so, eliminate some of the bogosities of the old version (i.e. don't do an implicit "setlowat", just prime the pool, and incr the minpages for each additional page we add, and compute the number of pages to prime in a way that callers would expect).
|
1.54 |
| 10-May-2001 |
thorpej | Use POOL_NEEDS_CATCHUP() in one more place.
|
1.53 |
| 10-May-2001 |
thorpej | Encapsulate the test for a pool needing a pool_catchup() in a macro.
|
1.52 |
| 09-May-2001 |
thorpej | Remove pool_create() and pool_prime(). Nothing except pool_create() used pool_prime(), and no one uses pool_create() anymore.
This makes it easier to fix a locking pitfall.
|
1.51 |
| 04-May-2001 |
thorpej | Add pool_cache_destruct_object(), used to force destruction of an object and release back into the pool.
|
1.50 |
| 29-Jan-2001 |
enami | branches: 1.50.2; Don't use PR_URGENT to allocate page header. We don't want to just panic on memory shortage. Instead, use the same wait/nowait condition with the item requested, and just cleanup and return failure if we can't allocate page header while we aren't allowed to wait.
|
1.49 |
| 14-Jan-2001 |
thorpej | Change some low-hanging splimp() calls to splvm().
|
1.48 |
| 11-Dec-2000 |
thorpej | Add some basic statistics to pool_cache.
|
1.47 |
| 10-Dec-2000 |
thorpej | Don't hold a pool cache lock across any call to pool_get() or pool_put(). This allows us to change a try-lock into a normal lock in the reclaim case.
|
1.46 |
| 07-Dec-2000 |
thorpej | ...and when freeing cache groups, clear `freeto' if that's the one we're freeing.
|
1.45 |
| 07-Dec-2000 |
thorpej | When we invalidate a pool cache, make sure to clear `allocfrom' if we empty out that cache group.
|
1.44 |
| 07-Dec-2000 |
thorpej | Add a /c modifier to "show pool" to display pool caches.
|
1.43 |
| 07-Dec-2000 |
thorpej | This is a first-cut implementation of support for caching of constructed objects in the pool allocator, similar to caching of constructed objects in the Solaris SLAB allocator.
This implementation is a separate API (pool_cache_*()) layered on top of pools to keep the caching complexity out of the way of pools that won't benefit from it.
While we're here, allow pool items to be as large as the pool page size.
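A hedged sketch of the constructed-object caching idea, written against the much later pool_cache_init() signature rather than the API as first added in this revision; the foo names, lock member and IPL are illustrative assumptions.

#include <sys/param.h>
#include <sys/mutex.h>
#include <sys/pool.h>

struct foo {
	kmutex_t	f_lock;
	int		f_state;
};

static pool_cache_t foo_cache;

static int
foo_ctor(void *arg, void *obj, int flags)
{
	struct foo *f = obj;

	/* Expensive initialisation done once; survives cached get/put cycles. */
	mutex_init(&f->f_lock, MUTEX_DEFAULT, IPL_NONE);
	return 0;
}

static void
foo_dtor(void *arg, void *obj)
{
	struct foo *f = obj;

	/* Only run when the object really leaves the cache. */
	mutex_destroy(&f->f_lock);
}

void
foo_subsystem_init(void)
{
	foo_cache = pool_cache_init(sizeof(struct foo), 0, 0, 0,
	    "foopl", NULL, IPL_NONE, foo_ctor, foo_dtor, NULL);
}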
|
1.42 |
| 06-Dec-2000 |
thorpej | ANSI'ify.
|
1.41 |
| 19-Nov-2000 |
sommerfeld | In pool_setlowat(), only call pool_catchup() if the pool is under the low water mark. (Avoids annoying warning when you setlowat a static pool).
|
1.40 |
| 12-Aug-2000 |
sommerfeld | Use ltsleep instead of simple_unlock/tsleep/simple_lock
|
1.39 |
| 27-Jun-2000 |
mrg | remove include of <vm/vm.h>
|
1.38 |
| 26-Jun-2000 |
mrg | remove/move more mach vm header files:
<vm/pglist.h> -> <uvm/uvm_pglist.h> <vm/vm_inherit.h> -> <uvm/uvm_inherit.h> <vm/vm_kern.h> -> into <uvm/uvm_extern.h> <vm/vm_object.h> -> nothing <vm/vm_pager.h> -> into <uvm/uvm_pager.h>
also includes a bunch of <vm/vm_page.h> include removals (due to redundancy with <vm/vm.h>), and a scattering of other similar headers.
|
1.37 |
| 10-Jun-2000 |
sommerfeld | Fix assorted bugs around shutdown/reboot/panic time. - add a new global variable, doing_shutdown, which is nonzero if vfs_shutdown() or panic() have been called. - in panic, set RB_NOSYNC if doing_shutdown is already set on entry so we don't reenter vfs_shutdown if we panic'ed there. - in vfs_shutdown, don't use proc0's process for sys_sync unless curproc is NULL. - in lockmgr, attribute successful locks to proc0 if doing_shutdown && curproc==NULL, and panic if we can't get the lock right away; avoids the spurious lockmgr DIAGNOSTIC panic from the ddb reboot command. - in subr_pool, deal with curproc==NULL in the doing_shutdown case. - in mfs_strategy, bitbucket writes if doing_shutdown, so we don't wedge waiting for the mfs process. - in ltsleep, treat ((curproc == NULL) && doing_shutdown) like the panicstr case.
Appears to fix: kern/9239, kern/10187, kern/9367. May also fix kern/10122.
|
1.36 |
| 31-May-2000 |
pk | Allow a pool's pagesz to be larger than the VM page size. Enforce the required page alignment restriction in pool_prime_page().
|
1.35 |
| 31-May-2000 |
pk | Assert that the pool item size does not exceed the page size.
|
1.34 |
| 08-May-2000 |
thorpej | branches: 1.34.2; __predict_false() the DIAGNOSTIC and other error condition checks.
|
1.33 |
| 13-Apr-2000 |
chs | always define PI_MAGIC so this compiles in all cases.
|
1.32 |
| 10-Apr-2000 |
chs | in pool_put(), fill the entire object with PI_MAGIC instead of just the first element.
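A hedged sketch of what whole-object poisoning looks like in principle; the fill value and helper names here are illustrative, not the committed code.

#include <sys/types.h>

#define PI_MAGIC	0xdeadbeefUL	/* illustrative fill pattern */

/* On pool_put(), overwrite the whole item, not just its first word... */
static void
poison_item(void *obj, size_t size)
{
	u_long *p = obj;
	size_t n = size / sizeof(*p);

	while (n-- > 0)
		*p++ = PI_MAGIC;
}

/* ...so that a later pool_get() (or inspection in ddb) can detect any
   write to the item made after it was freed. */
static int
poison_intact(const void *obj, size_t size)
{
	const u_long *p = obj;
	size_t n = size / sizeof(*p);

	while (n-- > 0)
		if (*p++ != PI_MAGIC)
			return 0;
	return 1;
}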
|
1.31 |
| 14-Feb-2000 |
thorpej | Use ratecheck().
|
1.30 |
| 29-Aug-1999 |
thorpej | branches: 1.30.2; In _pool_put(), panic if we're put'ing with nout == 0. This will help us detect a little earlier if we've dup-put'd. Otherwise, underflow occurs, and subsequent allocations simply hang or fail (it thinks the hardlimit has been reached).
|
1.29 |
| 05-Aug-1999 |
sommerfeld | Create new pool flag PR_LIMITFAIL, indicating that even PR_WAIT allocations should fail if the pool is at its hard limit. Document flag in pool(9). Use it in mbuf.h for the first allocate call for M_GET, M_GETHDR, and MCLGET, so that m_reclaim gets called even for blocking allocations.
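A hedged sketch of the intended use: the first, otherwise-blocking attempt sets PR_LIMITFAIL so hitting the hard limit returns NULL, the caller reclaims, and only then does it block. my_pool and my_reclaim() are illustrative stand-ins for the mbuf pool and m_reclaim().

#include <sys/pool.h>

extern struct pool	my_pool;
extern void		my_reclaim(void);	/* e.g. m_reclaim() in the mbuf case */

void *
my_alloc_sleepok(void)
{
	void *p;

	p = pool_get(&my_pool, PR_WAITOK | PR_LIMITFAIL);
	if (p == NULL) {
		my_reclaim();				/* free something up first */
		p = pool_get(&my_pool, PR_WAITOK);	/* now sleep if still at the limit */
	}
	return p;
}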
|
1.28 |
| 27-Jul-1999 |
thorpej | In _pool_put(), call simple_lock_freecheck() if we're LOCKDEBUG before we put the item on the free list.
|
1.27 |
| 06-Jun-1999 |
pk | Guard our global resource `phpool' against all interrupts.
|
1.26 |
| 10-May-1999 |
thorpej | Make sure page allocations are counted everywhere that they need to be.
|
1.25 |
| 10-May-1999 |
thorpej | Improve the pool allocator's diagnostic helpers, adding the ability to log on a per-pool basis, reentrancy checking, and dumping various pool information from DDB.
|
1.24 |
| 29-Apr-1999 |
scottr | Pull in opt_poollog.h for POOL_LOGSIZE.
|
1.23 |
| 06-Apr-1999 |
thorpej | More locking protocol fixes. Protect pool_head with a spin lock (statically initialized). This lock also protects the "next drain candidate" pointer.
XXX There is still one locking protocol problem, which should not be a problem in practice, but is still marked as an issue in the code anyhow.
|
1.22 |
| 04-Apr-1999 |
chs | Undo the part of the last revision about pr_rmpage() referencing a data structure after it was freed. This wasn't actually a problem, and the change caused the wrong pool_item_header to be freed in the non-PR_PHINPAGE case.
|
1.21 |
| 31-Mar-1999 |
thorpej | branches: 1.21.2; Yet more fixes to the pool allocator:
- Protect userspace from unnecessary header inclusions (as noted on current-users).
- Some const poisoning.
- GREATLY simplify the locking protocol, and fix potential deadlock scenarios. In particular, assume that the back-end page allocator provides its own locking mechanism (this is currently true for all such allocators in the NetBSD kernel). Doing so allows us to simply use one spin lock for serialized access to all r/w members of the pool descriptor. The spin lock is released before calling the back-end allocator, and re-acquired upon return from it.
- Fix a problem in pr_rmpage() where a data structure was referenced after it was freed.
- Minor tweak to page management. Migrate both idle and empty pages to the end of the page list. As soon as a page becomes un-empty (by a pool_put()), place it at the head of the page list, and set curpage to point to it. This reduces fragmentation as well as the time required to find a non-empty page as soon as curpage becomes empty again.
- Use mono_time throughout, and protect access to it w/ splclock().
- In pool_reclaim(), if freeing an idle page would reduce the number of allocatable items to below the low water mark, don't.
|
1.20 |
| 31-Mar-1999 |
thorpej | Fix several bugs/deficiencies in the pool allocator:
- Add support for hard limits, with optional rate-limited logging of a warning message when the pool limit is reached. (This will be used to fix a bug in mbuf cluster allocation on the MIPS and Alpha ports.)
- Fix some locking protocol errors. This required splitting pr_flags into pr_flags (which is protected by the spin lock) and pr_roflags (which are `read only' flags, set when the pool is initialized, and never changed again; these do not need to be protected by a mutex).
- Make the low water support actually mean something. When a low water mark is set, add free items to the pool until the low water mark is reached. When an item allocation causes the number of free items to drop below the low water mark, make the pool catch up to it. This can make the pool allocator more useful for several applications (e.g. pmap `pv entry' management) and more robust for others (e.g. for mbuf and mbuf cluster allocation, so that the pagedaemon can use NFS to clean pages on diskless systems without completely running dry on buffers to receive packets in during extreme memory shortages).
- Add a comment where we sleep waiting for more pages for the back-end page allocator. Specifically, instead of sleeping potentially forever, perhaps we should just wake up once a second to try allocating a page again. XXX Revisit this soon.
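A hedged sketch of the two knobs described above; the pool, the chosen limits and the warning text are illustrative assumptions.

#include <sys/pool.h>

extern struct pool my_pool;

void
my_pool_tune(void)
{
	/*
	 * Cap the pool at 1024 items and log a rate-limited warning at
	 * most once every 60 seconds when the hard limit is hit.
	 */
	pool_sethardlimit(&my_pool, 1024, "WARNING: my_pool limit reached", 60);

	/* Keep at least 32 items free; the pool catches up when it drops below. */
	pool_setlowat(&my_pool, 32);
}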
|
1.19 |
| 24-Mar-1999 |
mrg | completely remove Mach VM support. all that is left is all the header files, as UVM still uses (most of) these.
|
1.18 |
| 23-Mar-1999 |
thorpej | Fix the order of arguments to roundup().
|
1.17 |
| 27-Dec-1998 |
thorpej | Make this compile with POOL_DIAGNOSTIC, and add a POOL_LOGSIZE option. Defopt these.
|
1.16 |
| 16-Dec-1998 |
briggs | Prototype pool_print() and pool_chk() if DEBUG. Initialize pool hash table with PR_HASHTABSIZE (i.e., 8) LIST_INIT()s instead of one memset(). Only check for page != ph->ph_page if PR_PHINPAGE is set (in pool_chk()). Print pool base pointer when reporting page inconsistency in pool_chk().
|
1.15 |
| 29-Sep-1998 |
pk | In addition to the spinlock, use the lockmgr() to serialize access to the back-end page allocator. This allows the back-end to sleep since we now relinquish the spin lock after acquiring the long-term lock.
|
1.14 |
| 22-Sep-1998 |
thorpej | Make sure the size is large enough to hold a pool_item.
|
1.13 |
| 12-Sep-1998 |
christos | Make copyrights consistent; fix weird/trailing spaces, add missing (c), etc.
|
1.12 |
| 28-Aug-1998 |
thorpej | Add an alternate pool page allocator that can be used if the pool is never accessed in interrupt context. In the UVM case, this uses the kernel_map, to reduce usage of the previous kmem_map resource.
|
1.11 |
| 28-Aug-1998 |
thorpej | Add a waitok boolean argument to the VM system's pool page allocator backend.
|
1.10 |
| 13-Aug-1998 |
eeh | Merge paddr_t changes into the main branch.
|
1.9 |
| 04-Aug-1998 |
perry | Abolition of bcopy, ovbcopy, bcmp, and bzero, phase one. bcopy(x, y, z) -> memcpy(y, x, z) ovbcopy(x, y, z) -> memmove(y, x, z) bcmp(x, y, z) -> memcmp(x, y, z) bzero(x, y) -> memset(x, 0, y)
|
1.8 |
| 02-Aug-1998 |
thorpej | Make sure we initialize pr_nidle.
|
1.7 |
| 02-Aug-1998 |
thorpej | Fix a braino in the idle page instrumentation.
|
1.6 |
| 01-Aug-1998 |
thorpej | Instrument "idle pages" (i.e. pages which have no items allocated from them, and could thus be freed back to the system).
|
1.5 |
| 31-Jul-1998 |
thorpej | Un-static pool_head; vmstat wants to find it.
|
1.4 |
| 24-Jul-1998 |
thorpej | branches: 1.4.2; A few small changes to how pool pages are allocated/freed: - If either an alloc or release function is provided, make sure both are provided, otherwise panic, as this is a fatal error. - If using the default allocator, default the pool pagesz to PAGE_SIZE, since that is the granularity of the default allocator's mechanism. - In the default allocator, use new functions: uvm_km_alloc_poolpage()/uvm_km_free_poolpage(), or kmem_alloc_poolpage()/kmem_free_poolpage() rather than doing it here. These functions may use pmap hooks to provide alternate methods of mapping pool pages.
|
1.3 |
| 23-Jul-1998 |
pk | Re-vamped pool manager. * support for customized memory supplier * automatic page reclaim by VM system * time-based hysteresis * cache coloring (after Bonwick's "slabs")
|
1.2 |
| 19-Feb-1998 |
pk | Add option to use "static" storage provided by the caller. From Matthias Drochner.
|
1.1 |
| 15-Dec-1997 |
pk | Memory pool resource utility.
|
1.4.2.2 |
| 08-Aug-1998 |
eeh | Revert cdevsw mmap routines to return int.
|
1.4.2.1 |
| 30-Jul-1998 |
eeh | Split vm_offset_t and vm_size_t into paddr_t, psize_t, vaddr_t, and vsize_t.
|
1.21.2.4 |
| 25-Jun-1999 |
perry | somehow, the last commit was botched. fix it
|
1.21.2.3 |
| 24-Jun-1999 |
perry | pullup 1.26->1.27 (pk): deal with missing "raise interrupt level" code
|
1.21.2.2 |
| 07-Apr-1999 |
thorpej | branches: 1.21.2.2.2; 1.21.2.2.4; Pull up 1.22 -> 1.23.
|
1.21.2.1 |
| 04-Apr-1999 |
chs | pull up rev 1.22. approved by perry.
|
1.21.2.2.4.1 |
| 30-Nov-1999 |
itojun | bring in latest KAME (as of 19991130, KAME/NetBSD141) into kame branch just for reference purposes. This commit includes 1.4 -> 1.4.1 sync for kame branch.
The branch does not compile at all (due to the lack of ALTQ and some other source code). Please do not try to modify the branch, this is just for reference purposes.
synchronization to latest KAME will take place on HEAD branch soon.
|
1.21.2.2.2.3 |
| 02-Aug-1999 |
thorpej | Update from trunk.
|
1.21.2.2.2.2 |
| 04-Jul-1999 |
chs | in pool_put(), fill the item with a distinctive pattern ifdef DEBUG.
|
1.21.2.2.2.1 |
| 21-Jun-1999 |
thorpej | Sync w/ -current.
|
1.30.2.6 |
| 11-Feb-2001 |
bouyer | Sync with HEAD.
|
1.30.2.5 |
| 18-Jan-2001 |
bouyer | Sync with head (for UBC+NFS fixes, mostly).
|
1.30.2.4 |
| 13-Dec-2000 |
bouyer | Sync with HEAD (for UBC fixes).
|
1.30.2.3 |
| 08-Dec-2000 |
bouyer | Sync with HEAD.
|
1.30.2.2 |
| 22-Nov-2000 |
bouyer | Sync with HEAD.
|
1.30.2.1 |
| 20-Nov-2000 |
bouyer | Update thorpej_scsipi to -current as of a month ago
|
1.34.2.1 |
| 22-Jun-2000 |
minoura | Sync w/ netbsd-1-5-base.
|
1.50.2.13 |
| 11-Dec-2002 |
thorpej | Sync with HEAD.
|
1.50.2.12 |
| 11-Nov-2002 |
nathanw | Catch up to -current
|
1.50.2.11 |
| 18-Oct-2002 |
nathanw | Catch up to -current.
|
1.50.2.10 |
| 27-Aug-2002 |
nathanw | Catch up to -current.
|
1.50.2.9 |
| 01-Aug-2002 |
nathanw | Catch up to -current.
|
1.50.2.8 |
| 24-Jun-2002 |
nathanw | Curproc->curlwp renaming.
Change uses of "curproc->l_proc" back to "curproc", which is more like the original use. Bare uses of "curproc" are now "curlwp".
"curproc" is now #defined in proc.h as ((curlwp) ? (curlwp)->l_proc) : NULL) so that it is always safe to reference curproc (*de*referencing curproc is another story, but that's always been true).
|
1.50.2.7 |
| 01-Apr-2002 |
nathanw | Catch up to -current. (CVS: It's not just a program. It's an adventure!)
|
1.50.2.6 |
| 08-Jan-2002 |
nathanw | Catch up to -current.
|
1.50.2.5 |
| 14-Nov-2001 |
nathanw | Catch up to -current.
|
1.50.2.4 |
| 22-Oct-2001 |
nathanw | Catch up to -current.
|
1.50.2.3 |
| 26-Sep-2001 |
nathanw | Catch up to -current. Again.
|
1.50.2.2 |
| 24-Aug-2001 |
nathanw | Catch up with -current.
|
1.50.2.1 |
| 21-Jun-2001 |
nathanw | Catch up to -current.
|
1.60.4.2 |
| 11-Oct-2001 |
fvdl | Catch up with -current. Fix some bogons in the sparc64 kbd/ms attach code. cd18xx conversion provided by mrg.
|
1.60.4.1 |
| 01-Oct-2001 |
fvdl | Catch up with -current.
|
1.60.2.4 |
| 10-Oct-2002 |
jdolecek | sync kqueue with -current; this includes merge of gehenna-devsw branch, merge of i386 MP branch, and part of autoconf rototil work
|
1.60.2.3 |
| 06-Sep-2002 |
jdolecek | sync kqueue branch with HEAD
|
1.60.2.2 |
| 16-Mar-2002 |
jdolecek | Catch up with -current.
|
1.60.2.1 |
| 10-Jan-2002 |
thorpej | Sync kqueue branch with -current.
|
1.63.2.1 |
| 12-Nov-2001 |
thorpej | Sync the thorpej-mips-cache branch with -current.
|
1.74.2.2 |
| 12-Mar-2002 |
thorpej | Do the previous differently; instead, pad the size of the structure to the specified alignment, the way we pad to the system's natural alignment.
|
1.74.2.1 |
| 12-Mar-2002 |
thorpej | Sprinkle some assertions around that ensures that the returned object is aligned as requested.
Bug fix: in pool_prime_page(), make sure to account for alignment when advancing the pointer through the page.
|
1.76.6.1 |
| 11-Nov-2002 |
he | Pull up revision 1.78 (requested by thorpej in ticket #582): Bring down a fix from the "newlock" branch, slightly modified: o In pool_prime_page(), assert that the object being placed onto the free list meets the alignment constraints (that "ioff" within the object is aligned to "align"). o In pool_init(), round up the object size to the alignment value (or ALIGN(1), if no special alignment is needed) so that the above invariant holds true.
|
1.76.4.2 |
| 29-Aug-2002 |
gehenna | catch up with -current.
|
1.76.4.1 |
| 15-Jul-2002 |
gehenna | catch up with -current.
|
1.87.2.7 |
| 11-Dec-2005 |
christos | Sync with head.
|
1.87.2.6 |
| 10-Nov-2005 |
skrll | Sync with HEAD. Here we go again...
|
1.87.2.5 |
| 01-Apr-2005 |
skrll | Sync with HEAD.
|
1.87.2.4 |
| 17-Jan-2005 |
skrll | Sync with HEAD.
|
1.87.2.3 |
| 21-Sep-2004 |
skrll | Fix the sync with head I botched.
|
1.87.2.2 |
| 18-Sep-2004 |
skrll | Sync with HEAD.
|
1.87.2.1 |
| 03-Aug-2004 |
skrll | Sync with HEAD
|
1.93.2.1 |
| 22-Jun-2004 |
tron | Pull up revision 1.96 (requested by thorpej in ticket #522): Remove PR_IMMEDRELEASE, since setting the high water mark will achieve the same thing. Pointed out back in January by YAMAMOTO Takashi.
|
1.99.8.2 |
| 10-Mar-2006 |
tron | Pull up following revision(s) (requested by bjh21 in ticket #1192): sys/sys/pool.h: revision 1.48 sys/kern/subr_pool.c: revision 1.112 Medium-sized overhaul of POOL_SUBPAGE support so that: 1: I can understand it, and 2: It works. Notable externally-visible changes are that POOL_SUBPAGE now has to be a compile-time constant, and that trying to initialise a pool whose objects are larger than POOL_SUBPAGE automatically generates a pool that doesn't use subpages. NetBSD/acorn26 now boots multi-user again.
|
1.99.8.1 |
| 18-Jun-2005 |
tron | branches: 1.99.8.1.2; Pull up revision 1.101 (requested by thorpej in ticket #474): Fix some locking issues: - Make the locking rules for pr_rmpage() sane, and don't modify fields protected by the pool lock without actually holding it. - Always defer freeing the pool page to the back-end allocator, to avoid invoking the pool_allocator with the pool locked (which would violate the pool_allocator -> pool locking order). - Fix pool_reclaim() to not violate the pool_cache -> pool locking order by using a trylock. Reviewed by Chuq Silvers.
|
1.99.8.1.2.1 |
| 10-Mar-2006 |
tron | Pull up following revision(s) (requested by bjh21 in ticket #1192): sys/sys/pool.h: revision 1.48 sys/kern/subr_pool.c: revision 1.112 Medium-sized overhaul of POOL_SUBPAGE support so that: 1: I can understand it, and 2: It works. Notable externally-visible changes are that POOL_SUBPAGE now has to be a compile-time constant, and that trying to initialise a pool whose objects are larger than POOL_SUBPAGE automatically generates a pool that doesn't use subpages. NetBSD/acorn26 now boots multi-user again.
|
1.99.4.1 |
| 25-Jan-2005 |
yamt | convert to new apis.
|
1.99.2.1 |
| 29-Apr-2005 |
kent | sync with -current
|
1.101.2.13 |
| 24-Mar-2008 |
yamt | sync with head.
|
1.101.2.12 |
| 17-Mar-2008 |
yamt | sync with head.
|
1.101.2.11 |
| 27-Feb-2008 |
yamt | sync with head.
|
1.101.2.10 |
| 11-Feb-2008 |
yamt | sync with head.
|
1.101.2.9 |
| 04-Feb-2008 |
yamt | sync with head.
|
1.101.2.8 |
| 21-Jan-2008 |
yamt | sync with head
|
1.101.2.7 |
| 07-Dec-2007 |
yamt | sync with head
|
1.101.2.6 |
| 15-Nov-2007 |
yamt | sync with head.
|
1.101.2.5 |
| 27-Oct-2007 |
yamt | sync with head.
|
1.101.2.4 |
| 03-Sep-2007 |
yamt | sync with head.
|
1.101.2.3 |
| 26-Feb-2007 |
yamt | sync with head.
|
1.101.2.2 |
| 30-Dec-2006 |
yamt | sync with head.
|
1.101.2.1 |
| 21-Jun-2006 |
yamt | sync with head.
|
1.110.2.2 |
| 01-Mar-2006 |
yamt | sync with head.
|
1.110.2.1 |
| 01-Feb-2006 |
yamt | sync with head.
|
1.111.4.3 |
| 01-Jun-2006 |
kardel | Sync with head.
|
1.111.4.2 |
| 22-Apr-2006 |
simonb | Sync with head.
|
1.111.4.1 |
| 04-Feb-2006 |
simonb | Adapt for timecounters: mostly use get*time() and use "time_second" instead of "time.tv_sec".
|
1.111.2.1 |
| 09-Sep-2006 |
rpaulo | sync with head
|
1.112.6.2 |
| 24-May-2006 |
tron | Merge 2006-05-24 NetBSD-current into the "peter-altq" branch.
|
1.112.6.1 |
| 28-Mar-2006 |
tron | Merge 2006-03-28 NetBSD-current into the "peter-altq" branch.
|
1.112.4.1 |
| 19-Apr-2006 |
elad | sync with head.
|
1.112.2.6 |
| 03-Sep-2006 |
yamt | sync with head.
|
1.112.2.5 |
| 11-Aug-2006 |
yamt | sync with head
|
1.112.2.4 |
| 26-Jun-2006 |
yamt | sync with head.
|
1.112.2.3 |
| 24-May-2006 |
yamt | sync with head.
|
1.112.2.2 |
| 11-Apr-2006 |
yamt | sync with head
|
1.112.2.1 |
| 01-Apr-2006 |
yamt | sync with head.
|
1.116.2.1 |
| 19-Jun-2006 |
chap | Sync with head.
|
1.122.4.2 |
| 10-Dec-2006 |
yamt | sync with head.
|
1.122.4.1 |
| 22-Oct-2006 |
yamt | sync with head
|
1.122.2.3 |
| 19-Jan-2007 |
ad | Add some DEBUG code to check that items being freed were previously allocated from the same source. Needs to be enabled via DDB.
|
1.122.2.2 |
| 20-Oct-2006 |
ad | Remove sched_lock assertion.
|
1.122.2.1 |
| 11-Sep-2006 |
ad | From the newlock branch: add some KASSERT() verifying correct alignment.
|
1.125.2.3 |
| 24-Mar-2007 |
yamt | sync with head.
|
1.125.2.2 |
| 12-Mar-2007 |
rmind | Sync with HEAD.
|
1.125.2.1 |
| 27-Feb-2007 |
yamt | - sync with head. - move sched_changepri back to kern_synch.c as it doesn't know PPQ anymore.
|
1.128.2.13 |
| 01-Nov-2007 |
ad | pool_reclaim: acquire kernel_lock if the pool is at IPL_SOFTCLOCK, SOFTNET or SOFTSERIAL, as mutexes at these levels must still be spinlocks. It's not yet safe for e.g. ip_intr() to block as this upsets code calling up from the socket layer. It can find pcbs sitting half baked.
pool_cache_xcall: go to splvm to prevent kernel_lock from being taken, for the reason listed above.
Pointed out by yamt@.
|
1.128.2.12 |
| 29-Oct-2007 |
ad | pool_drain_start: tweak assertions/comments.
|
1.128.2.11 |
| 26-Oct-2007 |
ad | - Use a cross call to drain the per-CPU component of pool caches. - When draining, skip over pools that are completely inactive.
|
1.128.2.10 |
| 25-Sep-2007 |
ad | If no constructor/destructor are provided for a pool_cache, use nullop. Remove the tests for pc_ctor/pc_dtor != NULL.
|
1.128.2.9 |
| 10-Sep-2007 |
ad | Fix a deadlock.
|
1.128.2.8 |
| 09-Sep-2007 |
ad | - Re-enable pool_cache, since it works on i386 again after today's pmap change. pool_cache_invalidate() no longer invalidates objects stored in the per-CPU caches. This needs some thought. - Remove pcg_get, pcg_put since they are only called from one place each. - Remove cc_busy assertions, since they don't work correctly. Pointed out by yamt@. - Add some more assertions and simplify.
|
1.128.2.7 |
| 01-Sep-2007 |
ad | - Add a CPU layer to pool caches. In combination with vmem/kmem this provides CPU-local slab/object and general purpose allocators. The strategy used is as described in Jeff Bonwick's USENIX paper, except in at least one place where the described allocation strategy doesn't make sense. For exclusive access to the CPU layer the IPL is raised or kernel preemption disabled. Where the interrupt priority levels are software emulated this is much cheaper than taking a lock, and I think that writing to a local %pil register is likely to have a similar penalty to taking a lock.
No tuning of the group sizes is currently done - all groups have 15 items each, but this should be fairly easy to implement. Also, the reclamation mechanism should probably use a cross-call to drain the CPU-level caches on remote CPUs.
Currently this causes kernel memory corruption on i386, yet works without a problem on amd64. The cache layer is disabled for the time being until I can find the bug.
- Change the pool_cache API so that the caches are themselves dynamically allocated, and that each cache is tied to a single pool only. Add some stubs to change pool_cache parameters that call directly through to the pool layer (e.g. pool_cache_sethiwat). The idea here is that pool_cache should become the default object allocator (and so LKM friendly), and that the pool allocator should be for kernel-internal use only. This will be posted to tech-kern@ for review.
|
1.128.2.6 |
| 20-Aug-2007 |
ad | Sync with HEAD.
|
1.128.2.5 |
| 29-Jul-2007 |
ad | Trap free() of areas that contain undestroyed locks. Not a major problem but it helps to catch bugs.
|
1.128.2.4 |
| 22-Mar-2007 |
ad | - Remove debugging crud. - wakeup -> cv_broadcast.
|
1.128.2.3 |
| 21-Mar-2007 |
ad | GC the simplelock/spinlock debugging stuff.
|
1.128.2.2 |
| 13-Mar-2007 |
ad | Pull in the initial set of changes for the vmlocking branch.
|
1.128.2.1 |
| 13-Mar-2007 |
ad | Sync with head.
|
1.129.12.6 |
| 09-Dec-2007 |
jmcneill | Sync with HEAD.
|
1.129.12.5 |
| 21-Nov-2007 |
joerg | Sync with HEAD.
|
1.129.12.4 |
| 14-Nov-2007 |
joerg | Sync with HEAD.
|
1.129.12.3 |
| 11-Nov-2007 |
joerg | Sync with HEAD.
|
1.129.12.2 |
| 26-Oct-2007 |
joerg | Sync with HEAD.
Follow the merge of pmap.c on i386 and amd64 and move pmap_init_tmp_pgtbl into arch/x86/x86/pmap.c. Modify the ACPI wakeup code to restore CR4 before jumping back into kernel space as the large page option might cover that.
|
1.129.12.1 |
| 03-Sep-2007 |
jmcneill | Sync with HEAD.
|
1.129.8.1 |
| 03-Sep-2007 |
skrll | Sync with HEAD.
|
1.131.4.1 |
| 14-Oct-2007 |
yamt | sync with head.
|
1.131.2.4 |
| 23-Mar-2008 |
matt | sync with HEAD
|
1.131.2.3 |
| 09-Jan-2008 |
matt | sync with HEAD
|
1.131.2.2 |
| 08-Nov-2007 |
matt | sync with -HEAD
|
1.131.2.1 |
| 06-Nov-2007 |
matt | sync with HEAD
|
1.133.4.4 |
| 18-Feb-2008 |
mjf | Sync with HEAD.
|
1.133.4.3 |
| 27-Dec-2007 |
mjf | Sync with HEAD.
|
1.133.4.2 |
| 08-Dec-2007 |
mjf | Sync with HEAD.
|
1.133.4.1 |
| 19-Nov-2007 |
mjf | Sync with HEAD.
|
1.133.2.2 |
| 18-Nov-2007 |
bouyer | Sync with HEAD
|
1.133.2.1 |
| 13-Nov-2007 |
bouyer | Sync with HEAD
|
1.137.2.7 |
| 31-Dec-2007 |
ad | Make pool_cache_disable work again.
|
1.137.2.6 |
| 28-Dec-2007 |
ad | pool_cache_put_slow: fill cc_previous if empty. Pointed out by yamt@.
|
1.137.2.5 |
| 26-Dec-2007 |
ad | Sync with head.
|
1.137.2.4 |
| 26-Dec-2007 |
ad | Need sys/atomic.h here.
|
1.137.2.3 |
| 15-Dec-2007 |
ad | Sort list of pools/caches to make them easier to find.
|
1.137.2.2 |
| 12-Dec-2007 |
ad | Add a global 'pool_cache_disable', to be set from the debugger. Helpful when tracking down leaks.
|
1.137.2.1 |
| 08-Dec-2007 |
ad | Sync with head.
|
1.138.4.3 |
| 08-Jan-2008 |
bouyer | Sync with HEAD
|
1.138.4.2 |
| 02-Jan-2008 |
bouyer | Sync with HEAD
|
1.138.4.1 |
| 13-Dec-2007 |
bouyer | Sync with HEAD
|
1.138.2.3 |
| 13-Dec-2007 |
yamt | sync with head.
|
1.138.2.2 |
| 10-Dec-2007 |
yamt | - separate kernel va allocation (kernel_va_arena) from in-kernel fault handling (kernel_map). - add vmem bootstrap code. vmem doesn't rely on malloc anymore. - make kmem_alloc interrupt-safe. - kill kmem_map. make malloc a wrapper of kmem_alloc.
|
1.138.2.1 |
| 10-Dec-2007 |
yamt | add pool_cache_bootstrap_destroy. will be used by vmem.
|
1.151.6.4 |
| 17-Jan-2009 |
mjf | Sync with HEAD.
|
1.151.6.3 |
| 28-Sep-2008 |
mjf | Sync with HEAD.
|
1.151.6.2 |
| 02-Jun-2008 |
mjf | Sync with HEAD.
|
1.151.6.1 |
| 03-Apr-2008 |
mjf | Sync with HEAD.
|
1.151.2.1 |
| 24-Mar-2008 |
keiichi | sync with head.
|
1.156.2.2 |
| 04-Jun-2008 |
yamt | sync with head
|
1.156.2.1 |
| 18-May-2008 |
yamt | sync with head.
|
1.158.2.5 |
| 11-Aug-2010 |
yamt | sync with head.
|
1.158.2.4 |
| 11-Mar-2010 |
yamt | sync with head
|
1.158.2.3 |
| 16-Sep-2009 |
yamt | sync with head
|
1.158.2.2 |
| 04-May-2009 |
yamt | sync with head.
|
1.158.2.1 |
| 16-May-2008 |
yamt | sync with head.
|
1.160.2.2 |
| 18-Sep-2008 |
wrstuden | Sync with wrstuden-revivesa-base-2.
|
1.160.2.1 |
| 23-Jun-2008 |
wrstuden | Sync w/ -current. 34 merge conflicts to follow.
|
1.161.2.1 |
| 18-Jul-2008 |
simonb | Sync with head.
|
1.165.2.3 |
| 13-Dec-2008 |
haad | Update haad-dm branch to haad-dm-base2.
|
1.165.2.2 |
| 19-Oct-2008 |
haad | Sync with HEAD.
|
1.165.2.1 |
| 07-Jul-2008 |
haad | file subr_pool.c was added on branch haad-dm on 2008-10-19 22:17:28 +0000
|
1.170.4.1 |
| 17-Nov-2008 |
snj | Pull up following revision(s) (requested by ad in ticket #72): sys/kern/subr_pool.c: revision 1.171 Avoid recursive mutex_enter() when the system is low on KVA. Should fix crash reported by riz on current-users.
|
1.170.2.2 |
| 28-Apr-2009 |
skrll | Sync with HEAD.
|
1.170.2.1 |
| 19-Jan-2009 |
skrll | Sync with HEAD.
|
1.171.4.1 |
| 13-May-2009 |
jym | Sync with HEAD.
Commit is split, to avoid a "too many arguments" protocol error.
|
1.182.4.4 |
| 21-Apr-2011 |
rmind | sync with head
|
1.182.4.3 |
| 05-Mar-2011 |
rmind | sync with head
|
1.182.4.2 |
| 03-Jul-2010 |
rmind | sync with head
|
1.182.4.1 |
| 30-May-2010 |
rmind | sync with head
|
1.182.2.2 |
| 17-Aug-2010 |
uebayasi | Sync with HEAD.
|
1.182.2.1 |
| 30-Apr-2010 |
uebayasi | Sync with HEAD.
|
1.186.2.1 |
| 06-Jun-2011 |
jruoho | Sync with HEAD.
|
1.190.6.2 |
| 02-Jun-2012 |
mrg | sync to latest -current.
|
1.190.6.1 |
| 18-Feb-2012 |
mrg | merge to -current.
|
1.190.2.4 |
| 22-May-2014 |
yamt | sync with head.
for a reference, the tree before this commit was tagged as yamt-pagecache-tag8.
this commit was split into small chunks to avoid a limitation of cvs. ("Protocol error: too many arguments")
|
1.190.2.3 |
| 30-Oct-2012 |
yamt | sync with head
|
1.190.2.2 |
| 23-May-2012 |
yamt | sync with head.
|
1.190.2.1 |
| 17-Apr-2012 |
yamt | sync with head
|
1.194.2.2 |
| 21-May-2014 |
bouyer | Pull up following revision(s) (requested by abs in ticket #1054): sys/kern/subr_pool.c: revision 1.202 Ensure pool_head is non static - for "vmstat -i"
|
1.194.2.1 |
| 02-Jul-2012 |
jdc | Pull up revisions: src/sys/kern/subr_pool.c revision 1.196 src/share/man/man9/pool_cache.9 patch (requested by jym in ticket #366).
As pool reclaiming is unlikely to happen at interrupt or softint context, re-enable the portion of code that allows invalidation of CPU-bound pool caches.
Two reasons: - CPU cached objects being invalidated, the probability of fetching an obsolete object from the pool_cache(9) is greatly reduced. This speeds up pool_cache_get() quite a bit as it does not have to keep destroying objects until it finds an updated one when an invalidation is in progress.
- for situations where we have to ensure that no obsolete object remains after a state transition (canonical example: pmap mappings between Xen VM restoration), invalidating all pool_cache(9) is the safest way to go.
As it uses xcall(9) to broadcast the execution of pool_cache_transfer(), pool_cache_invalidate() cannot be called from interrupt or softint context (scheduling a xcall(9) can put a LWP to sleep).
pool_cache_xcall() => pool_cache_transfer() to reflect its use.
Invalidation being a costly process (1000s of objects may be destroyed), all places where pool_cache_invalidate() may be called from interrupt/softint context will now get caught by the proper KASSERT(), and fixed. Ping me when you see one.
Tested under i386 and amd64 by running ATF suite within 64MiB HVM domains (tried triggering pgdaemon a few times).
No objection on tech-kern@.
XXX a similar fix has to be pulled up to NetBSD-6, but with a more conservative approach.
See http://mail-index.netbsd.org/tech-kern/2012/05/29/msg013245.html
|
1.198.2.4 |
| 03-Dec-2017 |
jdolecek | update from HEAD
|
1.198.2.3 |
| 20-Aug-2014 |
tls | Rebase to HEAD as of a few days ago.
|
1.198.2.2 |
| 23-Jun-2013 |
tls | resync from head
|
1.198.2.1 |
| 25-Feb-2013 |
tls | resync with head
|
1.200.6.1 |
| 18-May-2014 |
rmind | sync with head
|
1.201.2.1 |
| 10-Aug-2014 |
tls | Rebase.
|
1.203.4.3 |
| 28-Aug-2017 |
skrll | Sync with HEAD
|
1.203.4.2 |
| 19-Mar-2016 |
skrll | Sync with HEAD
|
1.203.4.1 |
| 22-Sep-2015 |
skrll | Sync with HEAD
|
1.203.2.1 |
| 06-Mar-2016 |
martin | Pull up following revision(s) (requested by knakahara in ticket #1103): sys/kern/subr_pool.c: revision 1.206 fix: "vmstat -C" CpuLayer showed only the last cpu values.
|
1.206.4.1 |
| 21-Apr-2017 |
bouyer | Sync with HEAD
|
1.206.2.1 |
| 20-Mar-2017 |
pgoyette | Sync with HEAD
|
1.207.6.1 |
| 27-Feb-2018 |
martin | Pull up following revision(s) (requested by mrg in ticket #593): sys/dev/marvell/mvxpsec.c: revision 1.2 sys/arch/m68k/m68k/pmap_motorola.c: revision 1.70 sys/opencrypto/crypto.c: revision 1.102 sys/arch/sparc64/sparc64/pmap.c: revision 1.308 sys/ufs/chfs/chfs_malloc.c: revision 1.5 sys/arch/powerpc/oea/pmap.c: revision 1.95 sys/sys/pool.h: revision 1.80,1.82 sys/kern/subr_pool.c: revision 1.209-1.216,1.219-1.220 sys/arch/alpha/alpha/pmap.c: revision 1.262 sys/kern/uipc_mbuf.c: revision 1.173 sys/uvm/uvm_fault.c: revision 1.202 sys/sys/mbuf.h: revision 1.172 sys/kern/subr_extent.c: revision 1.86 sys/arch/x86/x86/pmap.c: revision 1.266 (via patch) sys/dev/dtv/dtv_scatter.c: revision 1.4
Allow only one pending call to a pool's backing allocator at a time. Candidate fix for problems with hanging after kva fragmentation related to PR kern/45718.
Proposed on tech-kern: https://mail-index.NetBSD.org/tech-kern/2017/10/23/msg022472.html Tested by bouyer@ on i386.
This makes one small change to the semantics of pool_prime and pool_setlowat: they may fail with EWOULDBLOCK instead of ENOMEM, if there is a pending call to the backing allocator in another thread but we are not actually out of memory. That is unlikely because nearly always these are used during initialization, when the pool is not in use.
Define the new flag too for previous commit.
pool_grow can now fail even when sleeping is ok. Catch this case in pool_get and retry.
Assert that pool_get failure happens only with PR_NOWAIT. This would have caught the mistake I made last week leading to null pointer dereferences all over the place, a mistake which I evidently poorly scheduled alongside maxv's change to the panic message on x86 for null pointer dereferences.
Since pr_lock is now used to wait for two things (PR_GROWING and PR_WANTED), we need to loop re-checking the condition we actually wanted. Make the KASSERTMSG/panic strings consistent as '%s: [%s], __func__, wchan'. Handle the ERESTART case from pool_grow().
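A hedged, heavily simplified sketch of the ERESTART handling just mentioned; pool_grow() is internal to subr_pool.c, and the prototype and retry loop below are paraphrased for illustration, not the committed code.

#include <sys/errno.h>
#include <sys/pool.h>

int pool_grow(struct pool *, int);	/* internal; declared here only for the sketch */

/* In a pool_get()-style slow path, after finding no free items: */
static int
grow_and_retry(struct pool *pp, int flags)
{
	int error;

	do {
		/*
		 * ERESTART means another PR_NOWAIT thread was already
		 * growing the pool; the pool mutex was dropped and
		 * retaken, so simply try again (a short busy-wait).
		 */
		error = pool_grow(pp, flags);
	} while (error == ERESTART);

	return error;	/* 0: a fresh page was added; otherwise ENOMEM etc. */
}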
Don't pass 0 to the pool flags. Guess pool_cache_get(pc, 0) means PR_WAITOK here. Earlier on in the same context we use kmem_alloc(sz, KM_SLEEP).
use PR_WAITOK everywhere. use PR_NOWAIT.
Don't use 0 for PR_NOWAIT
use PR_NOWAIT instead of 0
panic ex nihilo -- PR_NOWAITing for zerot
Add assertions that either PR_WAITOK or PR_NOWAIT are set. - fix an assert; we can reach there if we are nowait or limitfail. - when priming the pool and failing with ERESTART, don't decrement the number of pages; this avoids the issue of returning an ERESTART when we get to 0, and is more correct. - simplify the pool_grow code, and don't wakeup things if we ENOMEM.
In pmap_enter_ma(), only try to allocate pves if we might need them, and even if that fails, only fail the operation if we later discover that we really do need them. This implements the requirement that pmap_enter(PMAP_CANFAIL) must not fail when replacing an existing mapping with the first mapping of a new page, which is an unintended consequence of the changes from the rmind-uvmplock branch in 2011.
The problem arises when pmap_enter(PMAP_CANFAIL) is used to replace an existing pmap mapping with a mapping of a different page (eg. to resolve a copy-on-write). If that fails and leaves the old pmap entry in place, then UVM won't hold the right locks when it eventually retries. This entanglement of the UVM and pmap locking was done in rmind-uvmplock in order to improve performance, but it also means that the UVM state and pmap state need to be kept in sync more than they did before. It would be possible to handle this in the UVM code instead of in the pmap code, but these pmap changes improve the handling of low memory situations in general, and handling this in UVM would be clunky, so this seemed like the better way to go.
This somewhat indirectly fixes PR 52706, as well as the failing assertion about "uvm_page_locked_p(old_pg)". (but only on x86, various other platforms will need their own changes to handle this issue.) In uvm_fault_upper_enter(), if pmap_enter(PMAP_CANFAIL) fails, assert that the pmap did not leave around a now-stale pmap mapping for an old page. If such a pmap mapping still existed after we unlocked the vm_map, the UVM code would not know later that it would need to lock the lower layer object while calling the pmap to remove or replace that stale pmap mapping. See PR 52706 for further details. hopefully work around the irregular "fork fails in init" problem. if a pool is growing, and the grower is PR_NOWAIT, mark this. if another caller wants to grow the pool and is also PR_NOWAIT, busy-wait for the original caller, which should either succeed or hard-fail fairly quickly.
implement the busy-wait by unlocking and relocking this pool's mutex and returning ERESTART. other methods (such as having the caller do this) were significantly more code and this hack is fairly localised. ok chs@ riastradh@
Don't release the lock in the PR_NOWAIT allocation. Move flags setting after the acquiring the mutex. (from Tobias Nygren) apply the change from arch/x86/x86/pmap.c rev. 1.266 commitid vZRjvmxG7YTHLOfA:
In pmap_enter_ma(), only try to allocate pves if we might need them, and even if that fails, only fail the operation if we later discover that we really do need them. If we are replacing an existing mapping, reuse the pv structure where possible.
This implements the requirement that pmap_enter(PMAP_CANFAIL) must not fail when replacing an existing mapping with the first mapping of a new page, which is an unintended consequence of the changes from the rmind-uvmplock branch in 2011.
The problem arises when pmap_enter(PMAP_CANFAIL) is used to replace an existing pmap mapping with a mapping of a different page (eg. to resolve a copy-on-write). If that fails and leaves the old pmap entry in place, then UVM won't hold the right locks when it eventually retries. This entanglement of the UVM and pmap locking was done in rmind-uvmplock in order to improve performance, but it also means that the UVM state and pmap state need to be kept in sync more than they did before. It would be possible to handle this in the UVM code instead of in the pmap code, but these pmap changes improve the handling of low memory situations in general, and handling this in UVM would be clunky, so this seemed like the better way to go.
This somewhat indirectly fixes PR 52706 on the remaining platforms where this problem existed.
|
1.221.4.3 |
| 21-Apr-2020 |
martin | Sync with HEAD
|
1.221.4.2 |
| 13-Apr-2020 |
martin | Mostly merge changes from HEAD upto 20200411
|
1.221.4.1 |
| 10-Jun-2019 |
christos | Sync with HEAD
|
1.221.2.4 |
| 26-Dec-2018 |
pgoyette | Sync with HEAD, resolve a few conflicts
|
1.221.2.3 |
| 30-Sep-2018 |
pgoyette | Sync with HEAD
|
1.221.2.2 |
| 06-Sep-2018 |
pgoyette | Sync with HEAD
Resolve a couple of conflicts (result of the uimin/uimax changes)
|
1.221.2.1 |
| 28-Jul-2018 |
pgoyette | Sync with HEAD
|
1.252.2.5 |
| 29-May-2025 |
martin | Pull up following revision(s) (requested by bouyer in ticket #1956):
sys/kern/subr_pool.c: revision 1.295
Never call pr_drain_hook from pool_allocator_alloc().
In the PR_WAITOK case it's called from pool_reclaim
In the !PR_WAITOK case we're holding the pool lock and if the drain hook wants kernel_lock we may deadlock with another thread holding kernel_lock and calling pool_get().
Fixes PR kern/59411
|
1.252.2.4 |
| 17-Jul-2022 |
martin | Pull up following revision(s) (requested by simonb in ticket #1479):
sys/kern/subr_pool.c: revision 1.285
Use 64-bit math to calculate pool sizes. Fixes overflow errors for pools larger than 4GB and gives the correct output for kernel pool pages in "vmstat -s" output.
|
1.252.2.3 |
| 08-Mar-2020 |
martin | Pull up following revision(s) (requested by chs in ticket #766):
sys/kern/subr_pool.c: revision 1.265
fix assertions about when it is ok for pool_get() to return NULL.
|
1.252.2.2 |
| 01-Sep-2019 |
martin | Pull up following revision(s) (requested by maxv in ticket #129):
sys/kern/subr_pool.c: revision 1.256 sys/kern/subr_pool.c: revision 1.257
Kernel Heap Hardening: use bitmaps on all off-page pools. This migrates 29 MI pools on amd64 from linked lists to bitmaps, which have higher security properties.
Then, change the computation of the size of the PH pools: take into account the bitmap area available by default in the ph_u2 union, and don't go with &phpool[>0] if &phpool[0] already has enough space to embed a bitmap.
The pools that are migrated in this change all use bitmaps small enough to fit in &phpool[0], therefore there is no increase in memory consumption.
-
Revert r1.254, put back || for KASAN, some destructors like lwp_dtor() caused false positives. Needs more work.
|
1.252.2.1 |
| 18-Aug-2019 |
martin | Pull up following revision(s) (requested by maxv in ticket #81):
sys/kern/subr_pool.c: revision 1.253 sys/kern/subr_pool.c: revision 1.254 sys/kern/subr_pool.c: revision 1.255
Kernel Heap Hardening: perform certain sanity checks on the pool caches directly, to immediately detect certain bugs that would otherwise have been detected only later on the pool layer, if the buffer ever reached the pool layer.
-
Replace || by && in KASAN, to increase the pool coverage. Strictly speaking, what we want to avoid is poisoning buffers that were referenced in a global list as part of the ctor. But, if a buffer indeed got referenced as part of the ctor, it necessarily has to be unreferenced in the dtor; which implies it has to have a dtor. So we want both a ctor and a dtor, and not just one of them.
Note that POOL_QUARANTINE already implicitly provides this increased coverage.
-
Initialize pp->pr_redzone to false. For some reason with KUBSAN GCC does not eliminate the unused branch in pr_item_linkedlist_put(), and this leads to an unused uninitialized access which triggers KUBSAN messages.
|
1.264.2.2 |
| 29-Feb-2020 |
ad | Sync with head.
|
1.264.2.1 |
| 25-Jan-2020 |
ad | Sync with head.
|
1.266.4.1 |
| 20-Apr-2020 |
bouyer | Sync with HEAD
|
1.274.2.2 |
| 03-Apr-2021 |
thorpej | Sync with HEAD.
|
1.274.2.1 |
| 03-Jan-2021 |
thorpej | Sync w/ HEAD.
|
1.276.4.1 |
| 01-Aug-2021 |
thorpej | Sync with HEAD.
|
1.285.4.3 |
| 29-May-2025 |
martin | Pull up following revision(s) (requested by bouyer in ticket #1122):
sys/kern/subr_pool.c: revision 1.295
Never call pr_drain_hook from pool_allocator_alloc().
In the PR_WAITOK case it's called from pool_reclaim
In the !PR_WAITOK case we're holding the pool lock and if the drain hook wants kernel_lock we may deadlock with another thread holding kernel_lock and calling pool_get().
Fixes PR kern/59411
|
1.285.4.2 |
| 15-Dec-2024 |
martin | Pull up following revision(s) (requested by chs in ticket #1028):
sys/kern/subr_pool.c: revision 1.292
pool: fix pool_sethiwat() to actually do something
The change that I made to the pool code back in April 2020 ("slightly change and fix the semantics of pool_set*wat()" ...) accidental broke pool_sethiwat() by making it have no effect.
This was discovered after the crash reported in PR 58666 was fixed.
The same machine (32-bit, with 10GB RAM) would hang due to the buffer cache causing the system to run out of kernel virtual space. The buffer cache uses a separate pool for buffer data for each power of 2 between DEV_BSIZE and MAXBSIZE, and if the usage pattern of buffer sizes changes then memory has to be moved between the different pools in order to create buffers of the new size. The buffer cache handles this by using pool_sethiwat() to cause memory freed from the buffer cache back to the pools to not be cached in the buffer cache pools but instead be freed back to the pools' back-end allocator (which allocates from the low-level kva allocator) as soon as possible. But since pool_sethiwat() wasn't doing anything, memory would stay cached in some buffer cache pools and starve other buffer cache pools (and a few other pools that do no use the kmem layer for memory allocation).
Fix pool_sethiwat() to do what it is supposed to do again.
|
1.285.4.1 |
| 20-Sep-2024 |
martin | Pull up following revision(s) (requested by rin in ticket #871):
sys/kern/subr_pool.c: revision 1.286
Avoid undefined behaviour.
|