| History log of /src/external/bsd/jemalloc/dist/configure |
| Revision | | Date | Author | Comments |
| 1.1 |
| 04-Mar-2019 |
christos | branches: 1.1.1; Initial revision
|
| 1.1.1.3 |
| 19-Apr-2026 |
christos | Import jemalloc-5.3.1 (previous was 5.3.0)
This release includes over 390 commits spanning bug fixes, new features, performance optimizations, and portability improvements. Improvements of multiple percent in system-level metrics were measured in tested production workloads. The release has gone through large-scale production testing at Meta.
New features:
- Support pvalloc. (@Lapenkov: 5b1f2cc)
- Add double free detection for the debug build. (@izaitsevfb: 36366f3, @guangli-dai: 42daa1a, @divanorama: 1897f18)
- Add compile-time option --enable-pageid to enable memory mapping annotation. (@devnexen: 4fc5c4f)
- Add runtime option prof_bt_max to control the max stack depth for profiling. (@guangli-dai: a0734fd)
- Add compile-time option --enable-force-getenv to use getenv instead of secure_getenv. (@interwq: 481bbfc)
- Add compile-time option --disable-dss to disable the usage of sbrk(2). (@Svetlitski: ea5b7be)
- Add runtime option tcache_ncached_max to control the number of items in each size bin in the thread cache. (@guangli-dai: 8a22d10)
- Add runtime option calloc_madvise_threshold to determine if kernel or memset is used to zero the allocations for calloc. (@nullptr0-0: 5081c16)
- Add compile-time option --disable-user-config to disable reading the runtime configurations from /etc/malloc.conf or environment variable MALLOC_CONF. (@roblabla: c17bf8b)
- Add runtime option disable_large_size_classes to guard the new usable size calculation, which minimizes the memory overhead for large allocations, i.e., >= 4 * PAGE. (@guangli-dai: c067a55, 8347f10)
- Enable process_madvise usage, add runtime option process_madvise_max_batch to control the max # of regions in each madvise batch. (@interwq: 22440a0, @spredolac: 4246475)
- Add mallctl interfaces:
  + opt.prof_bt_max (@guangli-dai: a0734fd)
  + arena.<i>.name to set and get arena names. (@guangli-dai: ba19d2c)
  + thread.tcache.max to set and get the tcache_max of the current thread. (@guangli-dai: a442d9b)
  + thread.tcache.ncached_max.write and thread.tcache.ncached_max.read_sizeclass to set and get the ncached_max setup of the current thread. (@guangli-dai: 630f7de, 6b197fd)
  + arenas.hugepage to return the hugepage size used, also exported to malloc stats. (@ilvokhin: 90c627e)
  + approximate_stats.active to return an estimate of the current active bytes, which should not be compared with other stats retrieved. (@guangli-dai: 0988583)
Bug fixes:
- Prevent potential deadlocks in decaying during reentrancy. (@interwq: 434a68e)
- Fix segfault in extent coalescing. (@Svetlitski: 12311fe)
- Add null pointer detections in mallctl calls. (@Svetlitski: dc0a184, 0288126)
- Make mallctl arenas.lookup triable without crashing on invalid pointers. (@auxten: 019cccc, 5bac384)
- Demote sampled allocations for proper deallocations during arena_reset. (@Svetlitski: 62648c8)
- Fix jemalloc's read(2) and write(2). (@Svetlitski: d2c9ed3, @lexprfuncall: 9fdc116)
- Fix the pkg-config metadata file. (@BtbN: ed7e6fe, ce8ce99)
- Fix the autogen.sh so that it accepts quoted extra options. (@honggyukim: f6fe6ab)
- Fix rallocx() to set errno to ENOMEM upon OOMing. (@arter97: 38056fe, @interwq: 83b0757)
- Avoid stack overflow for internal variable array usage. (@nullptr0-0: 47c9bcd, 48f66cf, @xinydev: 9169e92)
- Fix background thread initialization race. (@puzpuzpuz: 4d0ffa0)
- Guard os_page_id against a NULL address. (@lexprfuncall: 79cc7dc)
- Handle tcache init failures gracefully. (@lexprfuncall: a056c20)
- Fix missing release of acquired neighbor edata in extent_try_coalesce_impl. (@spredolac: 675ab07)
- Fix memory leak of old curr_reg on san_bump_grow_locked failure. (@spredolac: 5904a42)
- Fix large alloc nrequests under-counting on cache misses. (@spredolac: 3cc56d3)
Portability improvements:
- Fix the build in C99. (@abaelhe: 56ddbea)
- Add pthread_setaffinity_np detection for non Linux/BSD platforms. (@devnexen: 4c95c95)
- Make VARIABLE_ARRAY compatible with compilers not supporting VLA, i.e., Visual Studio C compiler in C11 or C17 modes. (@madscientist: be65438)
- Fix the build on Linux using musl library. (@marv: aba1645, 45249cf)
- Reduce the memory overhead in small allocation sampling for systems with larger page sizes, e.g., ARM. (@Svetlitski: 5a858c6)
- Add C23's free_sized and free_aligned_sized. (@Svetlitski: cdb2c0e)
- Enable heap profiling on MacOS. (@nullptr0-0: 4b555c1)
- Fix incorrect printing on 32bit. (@sundb: 630434b)
- Make JEMALLOC_CXX_THROW compatible with C++ versions newer than C++17. (@r-barnes, @guangli-dai: 21bcc0a)
- Fix mmap tag conflicts on MacOS. (@kdrag0n: c893fcd)
- Fix monotonic timer assumption for win32. (@burtonli: 8dc97b1)
- Fix VM over-reservation on systems with larger pages, e.g., aarch64. (@interwq: cd05b19)
- Remove unreachable() macro conditionally to prevent definition conflicts for C23+. (@appujee: d8486b2, 4b88bdd)
- Fix dlsym failure observed on FreeBSD. (@rhelmot: 86bbaba)
- Change the default page size to 64KB on aarch64 Linux. (@lexprfuncall: 9442300)
- Update config.guess and config.sub to the latest version. (@lexprfuncall: c51949e)
- Determine the page size on Android from NDK header files. (@lexprfuncall: c51abba)
- Improve the portability of grep patterns in configure.ac. (@lexprfuncall: 365747b)
- Add compile-time option --with-cxx-stdlib to specify the C++ standard library. (@yuxuanchen1997: a10ef3e)
Optimizations and refactors:
- Enable tcache for deallocation-only threads. (@interwq: 143e9c4)
- Inline to accelerate operator delete. (@guangli-dai: e8f9f13)
- Optimize pairing heap's performance. (@deadalnix: 5266152, be6da4f, 543e2d6, 10d7131, 92aa52c, @Svetlitski: 36ca0c1)
- Inline the storage for thread name in the profiling data. (@interwq: ce0b7ab, e62aa47)
- Optimize a hot function edata_cmp_summary_comp to accelerate it. (@Svetlitski: 6841110, @guangli-dai: 0181aaa)
- Allocate thread cache using the base allocator, which enables thread cache to use thp when metadata_thp is turned on. (@interwq: 72cfdce)
- Allow oversize arena not to purge immediately when background threads are enabled, although the default decay time is 0 to be back compatible. (@interwq: d131331)
- Optimize thread-local storage implementation on Windows. (@mcfi: 9e123a8, 3a0d9cd)
- Optimize fast path to allow static size class computation. (@interwq: 323ed2e)
- Redesign tcache GC to regulate the frequency and make it locality-aware. The new design is default on, guarded by option experimental_tcache_gc. (@nullptr0-0: 0c88be9, e2c9f3a, 14d5dc1, @deadalnix: 5afff2e)
- Reduce the arena switching overhead by avoiding forced purging when background thread is enabled. (@interwq: a3910b9)
- Improve the reuse efficiency by limiting the maximum coalesced size for large extents. (@jiebinn: 3c14707)
- Refactor thread events to allow registration of users' thread events and remove prof_threshold as the built-in event. (@spredolac: e6864c6, 015b017, 34ace91)
Documentation:
- Update Windows building instructions. (@Lapenkov: 3713932)
- Add vcpkg installation instructions. (@LilyWangLL: c0c9783)
- Update profiling internals with an example. (@jordalgo: b04e766)
|
| 1.1.1.2 |
| 23-Sep-2024 |
christos | Import jemalloc-5.3.0 (previous was 5.1.0)
* 5.3.0 (May 6, 2022)
This release contains many speed and space optimizations, from micro optimizations on common paths to rework of internal data structures and locking schemes, and many more too detailed to list below. Improvements of multiple percent in system-level metrics were measured in tested production workloads. The release has gone through large-scale production testing.
New features:
- Add the thread.idle mallctl which hints that the calling thread will be idle for a nontrivial period of time. (@davidtgoldblatt)
- Allow small size classes to be the maximum size class to cache in the thread-specific cache, through the opt.[lg_]tcache_max option. (@interwq, @jordalgo)
- Make the behavior of realloc(ptr, 0) configurable with opt.zero_realloc. (@davidtgoldblatt)
- Add 'make uninstall' support. (@sangshuduo, @Lapenkov)
- Support C++17 over-aligned allocation. (@marksantaniello)
- Add the thread.peak mallctl for approximate per-thread peak memory tracking. (@davidtgoldblatt)
- Add interval-based stats output opt.stats_interval. (@interwq)
- Add prof.prefix to override filename prefixes for dumps. (@zhxchen17)
- Add high resolution timestamp support for profiling. (@tyroguru)
- Add the --collapsed flag to jeprof for flamegraph generation. (@igorwwwwwwwwwwwwwwwwwwww)
- Add the --debug-syms-by-id option to jeprof for debug symbols discovery. (@DeannaGelbart)
- Add the opt.prof_leak_error option to exit with error code when leak is detected using opt.prof_final. (@yunxuo)
- Add opt.cache_oblivious as a runtime alternative to config.cache_oblivious. (@interwq)
- Add mallctl interfaces:
  + opt.zero_realloc (@davidtgoldblatt)
  + opt.cache_oblivious (@interwq)
  + opt.prof_leak_error (@yunxuo)
  + opt.stats_interval (@interwq)
  + opt.stats_interval_opts (@interwq)
  + opt.tcache_max (@interwq)
  + opt.trust_madvise (@azat)
  + prof.prefix (@zhxchen17)
  + stats.zero_reallocs (@davidtgoldblatt)
  + thread.idle (@davidtgoldblatt)
  + thread.peak.{read,reset} (@davidtgoldblatt)
Bug fixes:
- Fix the synchronization around explicit tcache creation which could cause invalid tcache identifiers. This regression was first released in 5.0.0. (@yoshinorim, @davidtgoldblatt)
- Fix a profiling biasing issue which could cause incorrect heap usage and object counts. This issue existed in all previous releases with the heap profiling feature. (@davidtgoldblatt)
- Fix the order of stats counter updating on large realloc which could cause failed assertions. This regression was first released in 5.0.0. (@azat)
- Fix the locking on the arena destroy mallctl, which could cause concurrent arena creations to fail. This functionality was first introduced in 5.0.0. (@interwq)
Portability improvements:
- Remove nothrow from system function declarations on macOS and FreeBSD. (@davidtgoldblatt, @fredemmott, @leres)
- Improve overcommit and page alignment settings on NetBSD. (@zoulasc)
- Improve CPU affinity support on BSD platforms. (@devnexen)
- Improve utrace detection and support. (@devnexen)
- Improve QEMU support with MADV_DONTNEED zeroed pages detection. (@azat)
- Add memcntl support on Solaris / illumos. (@devnexen)
- Improve CPU_SPINWAIT on ARM. (@AWSjswinney)
- Improve TSD cleanup on FreeBSD. (@Lapenkov)
- Disable percpu_arena if the CPU count cannot be reliably detected. (@azat)
- Add malloc_size(3) override support. (@devnexen)
- Add mmap VM_MAKE_TAG support. (@devnexen)
- Add support for MADV_[NO]CORE. (@devnexen)
- Add support for DragonFlyBSD. (@devnexen)
- Fix the QUANTUM setting on MIPS64. (@brooksdavis)
- Add the QUANTUM setting for ARC. (@vineetgarc)
- Add the QUANTUM setting for LoongArch. (@wangjl-uos)
- Add QNX support. (@jqian-aurora)
- Avoid atexit(3) calls unless the relevant profiling features are enabled. (@BusyJay, @laiwei-rice, @interwq)
- Fix unknown option detection when using Clang. (@Lapenkov)
- Fix symbol conflict with musl libc. (@georgthegreat)
- Add -Wimplicit-fallthrough checks. (@nickdesaulniers)
- Add __forceinline support on MSVC. (@santagada)
- Improve FreeBSD and Windows CI support. (@Lapenkov)
- Add CI support for PPC64LE architecture. (@ezeeyahoo)
Incompatible changes:
- Maximum size class allowed in tcache (opt.[lg_]tcache_max) now has an upper bound of 8MiB. (@interwq)
Optimizations and refactors (@davidtgoldblatt, @Lapenkov, @interwq):
- Optimize the common cases of the thread cache operations.
- Optimize internal data structures, including RB tree and pairing heap.
- Optimize the internal locking on extent management.
- Extract and refactor the internal page allocator and interface modules.
Documentation:
- Fix doc build with --with-install-suffix. (@lawmurray, @interwq)
- Add PROFILING_INTERNALS.md. (@davidtgoldblatt)
- Ensure the proper order of doc building and installation. (@Mingli-Yu)
* 5.2.1 (August 5, 2019)
This release is primarily about Windows. A critical virtual memory leak is resolved on all Windows platforms. The regression was present in all releases since 5.0.0.
Bug fixes:
- Fix a severe virtual memory leak on Windows. This regression was first released in 5.0.0. (@Ignition, @j0t, @frederik-h, @davidtgoldblatt, @interwq)
- Fix size 0 handling in posix_memalign(). This regression was first released in 5.2.0. (@interwq)
- Fix the prof_log unit test which may observe unexpected backtraces from compiler optimizations. The test was first added in 5.2.0. (@marxin, @gnzlbg, @interwq)
- Fix the declaration of the extent_avail tree. This regression was first released in 5.1.0. (@zoulasc)
- Fix an incorrect reference in jeprof. This functionality was first released in 3.0.0. (@prehistoric-penguin)
- Fix an assertion on the deallocation fast-path. This regression was first released in 5.2.0. (@yinan1048576)
- Fix the TLS_MODEL attribute in headers. This regression was first released in 5.0.0. (@zoulasc, @interwq)
Optimizations and refactors:
- Implement opt.retain on Windows and enable by default on 64-bit. (@interwq, @davidtgoldblatt)
- Optimize away a branch on the operator delete[] path. (@mgrice)
- Add format annotation to the format generator function. (@zoulasc)
- Refactor and improve the size class header generation. (@yinan1048576)
- Remove best fit. (@djwatson)
- Avoid blocking on background thread locks for stats. (@oranagra, @interwq)
* 5.2.0 (April 2, 2019)
This release includes a few notable improvements, which are summarized below: 1) improved fast-path performance from the optimizations by @djwatson; 2) reduced virtual memory fragmentation and metadata usage; and 3) bug fixes on setting the number of background threads. In addition, peak / spike memory usage is improved with certain allocation patterns. As usual, the release and prior dev versions have gone through large-scale production testing.
New features:
- Implement oversize_threshold, which uses a dedicated arena for allocations crossing the specified threshold to reduce fragmentation. (@interwq)
- Add extents usage information to stats. (@tyleretzel)
- Log time information for sampled allocations. (@tyleretzel)
- Support 0 size in sdallocx. (@djwatson)
- Output rate for certain counters in malloc_stats. (@zinoale)
- Add configure option --enable-readlinkat, which allows the use of readlinkat over readlink. (@davidtgoldblatt)
- Add configure options --{enable,disable}-{static,shared} to allow not building unwanted libraries. (@Ericson2314)
- Add configure option --disable-libdl to enable fully static builds. (@interwq)
- Add mallctl interfaces:
  + opt.oversize_threshold (@interwq)
  + stats.arenas.<i>.extent_avail (@tyleretzel)
  + stats.arenas.<i>.extents.<j>.n{dirty,muzzy,retained} (@tyleretzel)
  + stats.arenas.<i>.extents.<j>.{dirty,muzzy,retained}_bytes (@tyleretzel)
Portability improvements:
- Update MSVC builds. (@maksqwe, @rustyx)
- Work around a compiler optimizer bug on s390x. (@rkmisra)
- Make use of pthread_set_name_np(3) on FreeBSD. (@trasz)
- Implement malloc_getcpu() to enable percpu_arena for Windows. (@santagada)
- Link against -pthread instead of -lpthread. (@paravoid)
- Make background_thread not dependent on libdl. (@interwq)
- Add stringify to fix a linker directive issue on MSVC. (@daverigby)
- Detect and fall back when 8-bit atomics are unavailable. (@interwq)
- Fall back to the default pthread_create if dlsym(3) fails. (@interwq)
Optimizations and refactors:
- Refactor the TSD module. (@davidtgoldblatt)
- Avoid taking extents_muzzy mutex when muzzy is disabled. (@interwq)
- Avoid taking large_mtx for auto arenas on the tcache flush path. (@interwq)
- Optimize ixalloc by avoiding a size lookup. (@interwq)
- Implement opt.oversize_threshold which uses a dedicated arena for requests crossing the threshold, also eagerly purges the oversize extents. Default the threshold to 8 MiB. (@interwq)
- Clean compilation with -Wextra. (@gnzlbg, @jasone)
- Refactor the size class module. (@davidtgoldblatt)
- Refactor the stats emitter. (@tyleretzel)
- Optimize pow2_ceil. (@rkmisra)
- Avoid runtime detection of lazy purging on FreeBSD. (@trasz)
- Optimize mmap(2) alignment handling on FreeBSD. (@trasz)
- Improve error handling for THP state initialization. (@jsteemann)
- Rework the malloc() fast path. (@djwatson)
- Rework the free() fast path. (@djwatson)
- Refactor and optimize the tcache fill / flush paths. (@djwatson)
- Optimize sync / lwsync on PowerPC. (@chmeeedalf)
- Bypass extent_dalloc() when retain is enabled. (@interwq)
- Optimize the locking on large deallocation. (@interwq)
- Reduce the number of pages committed from sanity checking in debug build. (@trasz, @interwq)
- Deprecate OSSpinLock. (@interwq)
- Lower the default number of background threads to 4 (when the feature is enabled). (@interwq)
- Optimize the trylock spin wait. (@djwatson)
- Use arena index for arena-matching checks. (@interwq)
- Avoid forced decay on thread termination when using background threads. (@interwq)
- Disable muzzy decay by default. (@djwatson, @interwq)
- Only initialize libgcc unwinder when profiling is enabled. (@paravoid, @interwq)
Bug fixes (all only relevant to jemalloc 5.x):
- Fix background thread index issues with max_background_threads. (@djwatson, @interwq)
- Fix stats output for opt.lg_extent_max_active_fit. (@interwq)
- Fix opt.prof_prefix initialization. (@davidtgoldblatt)
- Properly trigger decay on tcache destroy. (@interwq, @amosbird)
- Fix tcache.flush. (@interwq)
- Detect whether explicit extent zero out is necessary with huge pages or custom extent hooks, which may change the purge semantics. (@interwq)
- Fix a side effect caused by extent_max_active_fit combined with decay-based purging, where freed extents can accumulate and not be reused for an extended period of time. (@interwq, @mpghf)
- Fix a missing unlock on extent register error handling. (@zoulasc)
Testing:
- Simplify the Travis script output. (@gnzlbg)
- Update the test scripts for FreeBSD. (@devnexen)
- Add unit tests for the producer-consumer pattern. (@interwq)
- Add Cirrus-CI config for FreeBSD builds. (@jasone)
- Add size-matching sanity checks on tcache flush. (@davidtgoldblatt, @interwq)
Incompatible changes:
- Remove --with-lg-page-sizes. (@davidtgoldblatt)
Documentation:
- Attempt to build docs by default, however skip doc building when xsltproc is missing. (@interwq, @cmuellner)
|
| 1.1.1.1 |
| 04-Mar-2019 |
christos | branches: 1.1.1.1.2; 1.1.1.1.14; import jemalloc-5.1.0
|
| 1.1.1.1.14.1 |
| 02-Aug-2025 |
perseant | Sync with HEAD
|
| 1.1.1.1.2.2 |
| 10-Jun-2019 |
christos | Sync with HEAD
|
| 1.1.1.1.2.1 |
| 04-Mar-2019 |
christos | file configure was added on branch phil-wifi on 2019-06-10 21:44:51 +0000
|