History log of /src/sys/kern/vfs_wapbl.c
Revision	Date	Author	Comments
1.117	07-Dec-2024	riastradh	sys/kern/vfs_wapbl.c: Provide stub SET_ERROR for userland builds. Should fix: /tmp/build/2024.12.07.14.08.54-amd64/src/sys/kern/vfs_wapbl.c: In function 'wapbl_replay_start': /tmp/build/2024.12.07.14.08.54-amd64/src/sys/kern/vfs_wapbl.c:2978:24: error: implicit declaration of function 'SET_ERROR'; did you mean 'EV_ERROR'? [-Werror=implicit-function-declaration] 2978 | return SET_ERROR(EINVAL); | ^~~~~~~~~ | EV_ERROR
1.116	07-Dec-2024	riastradh	vfs(9): Sprinkle SET_ERROR dtrace probes. PR kern/58378: Kernel error code origination lacks dtrace probes
1.115	07-Dec-2024	riastradh	vfs(9): Fix some more whitespace issues. No functional change intended.
1.114	07-Dec-2024	riastradh	vfs(9): Sprinkle KNF. No functional change intended.
1.113	13-May-2024	msaitoh	s/of of/of/ in comment.
1.112	09-Apr-2022	riastradh	sys: Use membar_release/acquire around reference drop. This just goes through my recent reference count membar audit and changes membar_exit to membar_release and membar_enter to membar_acquire -- this should make everything cheaper on most CPUs without hurting correctness, because membar_acquire is generally cheaper than membar_enter.
1.111	04-Apr-2022	andvar	fix various typos, mainly in comments.
1.110	12-Mar-2022	riastradh	sys: Membar audit around reference count releases. If two threads are using an object that is freed when the reference count goes to zero, we need to ensure that all memory operations related to the object happen before freeing the object. Using an atomic_dec_uint_nv(&refcnt) == 0 ensures that only one thread takes responsibility for freeing, but it's not enough to ensure that the other thread's memory operations happen before the freeing. Consider:
    Thread A                        Thread B
    obj->foo = 42;                  obj->baz = 73;
    mumble(&obj->bar);              grumble(&obj->quux);
    /* membar_exit(); */            /* membar_exit(); */
    atomic_dec -- not last          atomic_dec -- last
                                    /* membar_enter(); */
                                    KASSERT(invariant(obj->foo, obj->bar));
                                    free_stuff(obj);
The memory barriers ensure that
    obj->foo = 42;
    mumble(&obj->bar);
in thread A happens before
    KASSERT(invariant(obj->foo, obj->bar));
    free_stuff(obj);
in thread B. Without them, this ordering is not guaranteed. So in general it is necessary to do
    membar_exit();
    if (atomic_dec_uint_nv(&obj->refcnt) != 0)
        return;
    membar_enter();
to release a reference, for the `last one out hit the lights' style of reference counting. (This is in contrast to the style where one thread blocks new references and then waits under a lock for existing ones to drain with a condvar -- no membar needed thanks to mutex(9).) I searched for atomic_dec to find all these. Obviously we ought to have a better abstraction for this because there's so much copypasta. This is a stop-gap measure to fix actual bugs until we have that. It would be nice if an abstraction could gracefully handle the different styles of reference counting in use -- some years ago I drafted an API for this, but making it cover everything got a little out of hand (particularly with struct vnode::v_usecount) and I ended up setting it aside to work on psref/localcount instead for better scalability.
I got bored of adding #ifdef __HAVE_ATOMIC_AS_MEMBAR everywhere, so I only put it on things that look performance-critical on 5sec review. We should really adopt membar_enter_preatomic/membar_exit_postatomic or something (except they are applicable only to atomic r/m/w, not to atomic_load/store_*, making the naming annoying) and get rid of all the ifdefs.
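The release pattern described in revisions 1.110 and 1.112 can be sketched in portable userland C. This is only an illustration under stated assumptions: it uses C11 <stdatomic.h> orderings as a stand-in for the kernel's membar_release/membar_acquire, and struct obj, obj_create, obj_hold and obj_release are hypothetical names that do not appear in vfs_wapbl.c.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stdlib.h>

/* Hypothetical refcounted object; not a structure from vfs_wapbl.c. */
struct obj {
	atomic_uint refcnt;
	int foo;
};

/* Allocate an object holding one reference for the caller. */
static struct obj *
obj_create(void)
{
	struct obj *o = malloc(sizeof(*o));

	if (o == NULL)
		return NULL;
	atomic_init(&o->refcnt, 1);
	o->foo = 0;
	return o;
}

/* Take an extra reference; the caller must already hold one. */
static void
obj_hold(struct obj *o)
{
	atomic_fetch_add_explicit(&o->refcnt, 1, memory_order_relaxed);
}

/* Drop a reference. Returns true if this call freed the object. */
static bool
obj_release(struct obj *o)
{
	/* Release ordering: our earlier stores are published before the drop. */
	if (atomic_fetch_sub_explicit(&o->refcnt, 1,
	    memory_order_release) != 1)
		return false;	/* not the last reference */

	/* Acquire fence: see every other holder's stores before freeing. */
	atomic_thread_fence(memory_order_acquire);
	free(o);
	return true;
}
```

The release ordering on the decrement plays the role of the barrier before atomic_dec, and the acquire fence plays the role of the barrier taken only by the thread that drops the last reference.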
1.109	03-Aug-2021	chs	initialize wc_unused to 0, to avoid writing uninitialized memory to disk. detected by KMSAN.
1.108	12-Apr-2020	jdolecek	fix wapbl_discard() to actually discard the queued bufs properly - need to set BC_INVAL for them, and also need to explicitly remove them from the BQ_LOCKED queue. fixes DIAGNOSTIC panic when force unmounting unresponsive disk device. PR kern/51178 by Michael van Elst
1.107	12-Apr-2020	jdolecek	fix race between wapbl_discard() and wapbl_biodone() on forced unmount on shutdown with slow I/O device. wapbl_discard() needs to hold both wl_mtx and bufcache_lock while manipulating wl_entries - the rw lock is not enough, because wapbl_biodone() only takes wl_mtx while removing the finished entry from list. wapbl_biodone() must take bufcache_lock before reading we->we_wapbl, so it's blocked until wapbl_discard() finishes, and takes !wl path appropriately. this is supposed to fix panic on shutdown: [ 67549.6304123] forcefully unmounting / (/dev/wd0a)... ... [ 67549.7272030] panic: mutex_vector_enter,510: uninitialized lock (lock=0xffffa722a4f4f5b0, from=ffffffff80a884fa) ... [ 67549.7272030] wapbl_biodone() at netbsd:wapbl_biodone+0x4d [ 67549.7272030] biointr() at netbsd:biointr+0x7d [ 67549.7272030] softint_dispatch() at netbsd:softint_dispatch+0x12c [ 67549.7272030] Xsoftintr() at netbsd:Xsoftintr+0x4f
1.106	16-Mar-2020	pgoyette	branches: 1.106.2; Use the module subsystem's ability to process SYSCTL_SETUP() entries to automate installation of sysctl nodes. Note that there are still a number of device and pseudo-device modules that create entries tied to individual device units, rather than to the module itself. These are not changed.
1.105	14-Mar-2020	ad	OR into bp->b_cflags; don't overwrite.
1.104	08-Mar-2020	ad	Typo.
1.103	10-Dec-2018	jdolecek	constify wapbl_ops
1.102	10-Dec-2018	jdolecek	add wo_wapbl_jlock_assert to wapbl_ops
1.101	02-Dec-2017	jdolecek	branches: 1.101.2; 1.101.4; according to benchmark extracting pkgsrc.tar, using FUA and hence waiting for each transfer to write through to the medium is way slower than just letting the drive use a cached write and doing DIOCCACHESYNC at the end. Results were (fs block 32KB / frag 4KB, partition aligned on 32KB boundary): HDD at siisata(4): no-FUA: 108 sec, w/FUA: 294 sec; SSD at ahcisata(4): no-FUA: 73 sec, w/FUA: 502 sec. change the flag so that FUA is only used for the commit block write; for journal data write, only pass DPO, rely on the cache flush to get them to media
1.100	27-Oct-2017	joerg	Revert printf return value change.
1.99	27-Oct-2017	utkarsh009	[syzkaller] Cast all the printf's to (void *) > as a result of new printf(9) declaration.
1.98	23-Oct-2017	jdolecek	remove counter for 'journal I/O bufs biowait' - it's (total - async), so superfluous; adjust the description of the other counters a bit to make them more clear
1.97	08-Jun-2017	chs	move some buffer cache internals declarations from buf.h to vfs_bio.c. this is needed to avoid name conflicts with ZFS and also makes it clearer that other code shouldn't be messing with these. remove the LFS debug code that poked around in bufqueues and remove the BQ_EMPTY bufqueue since nothing uses it anymore. provide a function to let LFS and wapbl read the value of nbuf for now.
1.96	10-Apr-2017	jdolecek	rename allow_fuadpo to allow_dpofua, so it's the same order as the SCSI flag
1.95	10-Apr-2017	jdolecek	improve performance of journal writes by parallelizing the I/O - use 4 bufs by default, add sysctl vfs.wapbl.journal_iobufs to control it this also removes need to allocate iobuf during commit, so it might help to avoid deadlock during memory shortages like PR kern/47030
1.94	10-Apr-2017	jdolecek	change b_wapbllist to TAILQ, to preserve the LRU order
1.93	05-Apr-2017	jdolecek	optionally use FUA instead of full cache sync, and DPO for journal writes, when supported by disk device; controlled by sysctl vfs.wapbl.allow_fuadpo, default off for now discussed on tech-kern
1.92	17-Mar-2017	riastradh	Back out part of previous: missed a caller of wapbl_write_inodes.
1.91	17-Mar-2017	riastradh	Nix trailing whitespace.
1.90	17-Mar-2017	riastradh	Sort includes.
1.89	17-Mar-2017	riastradh	Assert write lock in wapbl_write_revocations, wapbl_write_inodes. Only one call site, so trivial to prove correct.
1.88	05-Mar-2017	mrg	add missing sys/evcnt.h include.
1.87	05-Mar-2017	jdolecek	add some event counters, for commits, writes, cache flush
1.86	10-Nov-2016	jdolecek	branches: 1.86.2; during truncate with wapbl, register deallocation for upper indirect block before recursing into lower blocks, to make sure that it will be removed after all its referenced blocks are removed fixes 'ffs_blkfree_common: freeing free block' panic triggered by ufs_truncate_retry() when just the upper indirect block registration failed, code tried to free the lower blocks again after wapbl flush problem found by hannken@, thank you
1.85	28-Oct-2016	jdolecek	reorganize ffs_truncate()/ffs_indirtrunc() to be able to partially succeed; change wapbl_register_deallocation() to return EAGAIN rather than panic when code hits the limit callers changed to either loop calling ffs_truncate() using new utility ufs_truncate_retry() if their semantics requires it, or just ignore the failure; remove ufs_wapbl_truncate() this fixes possible user-triggerable panic during truncate, and resolves WAPBL performance issue with truncates of large files PR kern/47146 and kern/49175
1.84	02-Oct-2016	jdolecek	drop wl_mtx mutex during call to pool_get() with PR_WAITOK pointed out by riastradh
1.83	02-Oct-2016	jdolecek	fix off-by-one in wapbl_write_revocations() - when exiting the write loop, wd gets set to next unwritten record, not last written one as code assumed; 'lost head!' KASSERT is not triggered any more
1.82	02-Oct-2016	jdolecek	wapbl_write_revocations(): fix use-after-free when writing more than one block worth of revocations, introduced in previous commit; discovered by Brad Harder on current-users
1.81	01-Oct-2016	jdolecek	allocate wapbl dealloc registration structures via pool, so that there is more flexibility with limit handling
1.80	22-Sep-2016	jdolecek	misplaced comment
1.79	22-Sep-2016	jdolecek	store the number of block records per block into wl as wl_brperjblock, so that it's visible it's same value everywhere; no functional change
1.78	19-May-2016	riastradh	branches: 1.78.2; Replace deprecated disabled code by comment describing what it intends to do, and why it won't work yet From coypu.
1.77	07-May-2016	riastradh	Tweak comment on wapbl_flush.
1.76	07-May-2016	riastradh	Use %jx and a cast to uintmax_t, not %x, to print a dev_t.
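The dev_t formatting change in revision 1.76 can be illustrated with a small userland sketch. It assumes only standard C and <sys/types.h>; format_dev is a hypothetical helper, not a function from vfs_wapbl.c. Because the width of dev_t varies between platforms, widening to uintmax_t and printing with %jx stays correct whatever the underlying type is.

```c
#include <stdint.h>
#include <stdio.h>
#include <sys/types.h>

/*
 * Hypothetical helper: format a dev_t as hex into buf.
 * Returns the number of characters snprintf would have written.
 */
static int
format_dev(char *buf, size_t len, dev_t dev)
{
	/* Widen to uintmax_t so the %jx conversion matches the argument. */
	return snprintf(buf, len, "0x%jx", (uintmax_t)dev);
}
```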
1.75	07-May-2016	riastradh	Clarify comment about early exit from wapbl_flush. Note possible bug. Requires further analysis.
1.74	07-May-2016	riastradh	Omit unused parameter to wapbl_fini.
1.73	07-May-2016	riastradh	Delete debugging option wapbl_lazy_truncate. Simplify. Likely nobody has used this in the past decade -- you would have to enter ddb and write 1 to it in order to enable it anyway. Patch prepared by coypu.
1.72	07-May-2016	riastradh	Turn WAPBL_DEBUG panic or KASSERT into KASSERTMSG From coypu.
1.71	07-May-2016	riastradh	Document log layout and internal subroutines of vfs_wapbl.c.
1.70	07-May-2016	riastradh	KASSERT(A); KASSERT(B) instead of KASSERT(A && B).
1.69	07-May-2016	riastradh	Rename labels to make wapbl_flush a little easier to follow. out ---> wait_out out2 ---> out From coypu.
1.68	07-May-2016	riastradh	Sort and deduplicate includes.
1.67	03-May-2016	riastradh	Fix non-DIAGNOSTIC build.
1.66	03-May-2016	riastradh	panic takes no \n. From coypu.
1.65	03-May-2016	riastradh	#ifdef DIAGNOSTIC panic ---> KASSERTMSG From coypu.
1.64	15-Nov-2015	pgoyette	Enable the module's MODULE_CMD_FINI action. It actually works as intended.
1.63	14-Nov-2015	pgoyette	Fix obvious typo - even though it is inside a #ifdef notyet ... #endif
1.62	09-Aug-2015	mlelstv	Refactor disk address calculation from physical block numbers in the journal into a function. Make that function work correctly with sector sizes != DEV_BSIZE when compiled outside the kernel (i.e. fsck_ffs). Fixes PR bin/45933
1.61	18-Oct-2014	snj	branches: 1.61.2; src is too big these days to tolerate superfluous apostrophes. It's "its", people!
1.60	05-Sep-2014	matt	Don't nest structure and enum definitions. Don't use C++ keywords new, try, class, private, etc.
1.59	25-Feb-2014	pooka	branches: 1.59.4; Ensure that the top level sysctl nodes (kern, vfs, net, ...) exist before the sysctl link sets are processed, and remove redundancy. Shaves >13kB off of an amd64 GENERIC, not to mention >1k duplicate lines of code.
1.58	15-Sep-2013	martin	Remove unused variable
1.57	15-Sep-2013	joerg	Provide a prototype for wapbl_space_free under _KERNEL.
1.56	14-Sep-2013	joerg	wapbl_advance and friends are only used in the kernel
1.55	09-Feb-2013	christos	branches: 1.55.2; why didn't gcc find the formatting error?
1.54	08-Dec-2012	hannken	Try to coalesce writes to the journal in MAXPHYS sized and aligned blocks. Speeds up wapbl_flush() on raid5 by a factor of 3-4. Discussed on tech-kern. Needs pullup to NetBSD-6.
1.53	17-Nov-2012	hannken	wapbl_biodone: Release the buffer before reclaiming the log. wapbl_flush() may wait for the log to become empty and all buffers should be unbusy before it returns.
1.52	29-Apr-2012	chs	branches: 1.52.2; mark all wapbl I/O as BPRIO_TIMECRITICAL. this is the second part of addressing PR 46325.
1.51	28-Jan-2012	para	branches: 1.51.2; replacing malloc(9) with kmem(9) wapbl_entries get their own pool, they are freed from softint context ok: rmind@
1.50	27-Jan-2012	para	extending vmem(9) to be able to allocate resources for its own needs. simplifying uvm_map handling (no special kernel entries anymore no relocking) make malloc(9) a thin wrapper around kmem(9) (with private interface for interrupt safety reasons) releng@ acknowledged
1.49	11-Jan-2012	yamt	comments
1.48	02-Dec-2011	yamt	branches: 1.48.2; - move disk cache flushing code into a separate function. - more verbose output if vfs.wapbl.verbose_commit >= 2. namely, time taken for each DIOCCACHESYNC calls. wapbl_flush: 1322826000.785245900 this transaction = 546304 bytes wapbl_cache_sync: 1: dev 0x0 0.017572724 wapbl_cache_sync: 2: dev 0x0 0.007199825 wapbl_flush: 1322826011.860771302 this transaction = 431104 bytes wapbl_cache_sync: 1: dev 0x0 0.019469753 wapbl_cache_sync: 2: dev 0x0 0.009473410 wapbl_flush: 1322829266.489154342 this transaction = 187904 bytes wapbl_cache_sync: 1: dev 0x4 0.022270180 wapbl_cache_sync: 2: dev 0x4 0.030749402 - fix a comment.
1.47	01-Sep-2011	christos	branches: 1.47.2; add a couple of asserts
1.46	14-Aug-2011	christos	fix sign-compare warnings
1.45	12-Jun-2011	rmind	Welcome to 5.99.53! Merge rmind-uvmplock branch: - Reorganize locking in UVM and provide extra serialisation for pmap(9). New lock order: [vmpage-owner-lock] -> pmap-lock. - Simplify locking in some pmap(9) modules by removing P->V locking. - Use lock object on vmobjlock (and thus vnode_t::v_interlock) to share the locks amongst UVM objects where necessary (tmpfs, layerfs, unionfs). - Rewrite and optimise x86 TLB shootdown code, make it simpler and cleaner. Add TLBSTATS option for x86 to collect statistics about TLB shootdowns. - Unify /dev/mem et al in MI code and provide required locking (removes kernel-lock on some ports). Also, avoid cache-aliasing issues. Thanks to Andrew Doran and Joerg Sonnenberger, as their initial patches formed the core changes of this branch.
1.44	26-May-2011	uebayasi	branches: 1.44.2; Catch up with B_* flag name changes in debug code.
1.43	20-Feb-2011	nakayama	Fix digit number of nanosecond.
1.42	18-Feb-2011	hannken	Adjust previous: set the dealloc soft limit to half hard limit.
1.41	16-Feb-2011	hannken	Set the limit for deallocations in one transaction to a more realistic (and much lower) value. When flushing the log these deallocations will produce new blocks and that may exceed the journal size resulting in a "wapbl_flush: current transaction too big to flush" panic. Seen when removing a large snapshot. Addresses PR #44568 (WAPBL doesn't play nice with snapshots).
1.40	14-Feb-2011	bouyer	if DIAGNOSTIC, check the size of the transaction in wapbl_end(). Hopefully this will point us to the place which generated the large transaction, before an asynchronous panic() in wapbl_end()
1.39	08-Jan-2011	christos	branches: 1.39.2; 1.39.4; Add two sysctls one that does verbose transaction logging and a second one that disables flushing the disk cache (which is fast but dangerous for data integrity). From simon a long while back.
1.38	09-Nov-2010	hannken	Wapbl_register_deallocation(): the taken reader lock is not sufficient to protect wl_dealloc* members. Take the mutex here and change the lock requirements of these fields to "writer lock or mutex". This error led to file system corruption and "freeing free block" panics.
1.37	10-Sep-2010	drochner	fix two bugs reported by Ryo Shimizu: -wrong initialization reported in a followup to PR bin/43336 (looks harmless because it applies to zero-initialized memory, so LIST_INIT() is a no-op) -wrong loop count in replay misses a hash bucket (PR kern/43827) (this was introduced by a post-netbsd-5 change, so it isn't related to the PR above)
1.36	21-Apr-2010	pooka	dumdidumdum, need _KERNEL in previous for fsck. noticed by moof
1.35	21-Apr-2010	pooka	Reduce #ifdef spew by attaching wapbl as a module. (no, it's still too ifdef-ridden to be able to actually do anything useful and module-like like load into any kernel)
1.34	27-Feb-2010	mlelstv	branches: 1.34.2; Move block number computations to callers of wapbl_read/wapbl_write and conditionally build DEV_BSIZE adjustments for kernel. fsck_ffs shares the same code but accesses physical blocks. Also compute correct block numbers for each physical sector.
1.33	27-Feb-2010	mlelstv	Store physical block numbers in superblock that point to the journal. Calculate position of both commit headers correctly for disks with large sectors. Correct calculation of circular buffer size.
1.32	26-Feb-2010	mlelstv	mnt_fs_bshift is the filesystem block size, not the fragment size. Revert to physical block size. This is fine as long as filesystem and log stay on a similar physical medium.
1.31	23-Feb-2010	mlelstv	Use correct offset to block number calculations. Also change access to filesystem blocks to be done by fragment instead of by physical block. Fragments are the fundamental blocks of the filesystem. For a theoretical filesystem that accesses the disk in smaller units than stored in mp->mnt_fs_bshift, the assumption might be wrong. But this will also break other subsystems. The value mp->mnt_dev_bshift which formerly represents the physical sector size is currently only virtual in NetBSD (always DEV_BSIZE).
1.30	06-Feb-2010	uebayasi	branches: 1.30.2; __inline -> inline
1.29	25-Nov-2009	pooka	make WAPBL_DEBUG_PRINT compile
1.28	01-Oct-2009	pooka	Add dealloccnt to list of things to be considered in the stetson-harrison decision making algorithm for flushing a wapbl transaction.
1.27	01-Oct-2009	pooka	Turn a KASSERT into a panic. I don't want us to be randomly overwriting memory on non-DIAGNOSTIC kernels if resource estimation fails.
1.26	14-Jul-2009	apb	Convert free text inside #ifdef to a proper comment. Inspired by PR 41255 from Kurt Lidl.
1.25	05-Apr-2009	lukem	branches: 1.25.2; fix sign-compare issues
1.24	15-Mar-2009	cegger	ansify function definitions
1.23	22-Feb-2009	ad	PR kern/39564 wapbl performance issues with disk cache flushing PR kern/40361 WAPBL locking panic in -current PR kern/40470 WAPBL corrupts ext2fs PR kern/40562 busy loop in ffs_sync when unmounting a file system PR kern/40525 panic: ffs_valloc: dup alloc - A fix for an issue that can lead to "ffs_valloc: dup" due to dirty cg buffers being invalidated. Problem discovered and patch by dholland@. - If the syncer fails to lazily sync a vnode due to lock contention, retry 1 second later instead of 30 seconds later. - Flush inode atime updates every ~10 seconds (this makes most sense with logging). Presently they didn't hit the disk for read-only files or devices until the file system was unmounted. It would be better to trickle the updates out but that would require more extensive changes. - Fix issues with file system corruption, busy looping and other nasty problems when logging and non-logging file systems are intermixed, with one being the root file system. - For logging, do not flush metadata on an inode-at-a-time basis if the sync has been requested by ioflush. Previously, we could try hundreds of log sync operations a second due to inode update activity, causing the syncer to fall behind and metadata updates to be serialized across the entire file system. Instead, burst out metadata and log flushes at a minimum interval of every 10 seconds on an active file system (happens more often if the log becomes full). Note this does not change the operation of fsync() etc. - With the flush issue fixed, re-enable concurrent metadata updates in vfs_wapbl.c.
1.22	18-Feb-2009	yamt	redo rev.1.19 correctly.
1.21	18-Feb-2009	yamt	whitespace
1.20	02-Feb-2009	yamt	branches: 1.20.2; remove a non-ascii comment.
1.19	02-Feb-2009	yamt	back to malloc for now as wapbl_biodone is called by softint.
1.18	31-Jan-2009	yamt	- malloc -> kmem_alloc - kill WAPBL_UVM_ALLOC. - kill wapbl_blk_pool to reduce #ifdef.
1.17	03-Jan-2009	yamt	remove extra semicolons.
1.16	24-Nov-2008	joerg	Move the specification of the on-disk journal format into a separate header.
1.15	20-Nov-2008	joerg	Push functionality to deal with existing inode records into a separate function.
1.14	18-Nov-2008	joerg	Decouple journal operation from replay header by copying the interesting fields into wapbl_replay as opposed to embedding wapbl_wc_header.
1.13	18-Nov-2008	joerg	#if 0 wapbl_replay_verify.
1.12	18-Nov-2008	joerg	Check for NULL before calling free as the kernel free doesn't handle it.
1.11	18-Nov-2008	joerg	Rename wapbl_replay_prescan to wapbl_replay_process.
1.10	18-Nov-2008	joerg	Refactor wapbl_replay_prescan to use a function for each WAPBL record. Merge wapbl_replay_get_inodes into wapbl_replay_prescan. Change the logic to determine the head: It doesn't make sense to update it if the last inode record seen was not the beginning of the journal, as the beginning of the journal might not be 0, so always update inodeshead.
1.9	17-Nov-2008	joerg	In wapbl_replay_write just iterate over the hash table and not the transactions. The initial prescan has already sorted out what blocks are in the journal and removed any revoked blocks, so the hash table is authoritative.
1.8	17-Nov-2008	joerg	Remove debug printf.
1.7	17-Nov-2008	joerg	Ensure that block records are correctly padded.
1.6	11-Nov-2008	joerg	Move WAPBL replay handling from bread() into ufs_strategy. This changes the order of hook processing as the copy-on-write handlers are called after the journal processing. This makes more sense as the journal overwrite is logically part of the disk IO.
1.5	10-Nov-2008	joerg	Define wapbl_flush_fn_t only for the kernel.
1.4	10-Nov-2008	joerg	Reduce internals of WAPBL exposed to the rest of the system.
1.3	11-Aug-2008	yamt	branches: 1.3.2; 1.3.4; 1.3.6; 1.3.8; fix a comment.
1.2	31-Jul-2008	simonb	Merge the simonb-wapbl branch. From the original branch commit: Add Wasabi Systems' WAPBL (Write Ahead Physical Block Logging) journaling code. Originally written by Darrin B. Jewell while at Wasabi and updated to -current by Antti Kantee, Andy Doran, Greg Oster and Simon Burge. OK'd by core@, releng@.
1.1	10-Jun-2008	simonb	branches: 1.1.2; 1.1.4; file vfs_wapbl.c was initially added on branch simonb-wapbl.
1.1.4.2	13-Dec-2008	haad	Update haad-dm branch to haad-dm-base2.
1.1.4.1	19-Oct-2008	haad	Sync with HEAD.
1.1.2.11	28-Jul-2008	oster	Turn on WAPBL_DEBUG_SERIALIZE in order to use RW_WRITER locks instead of RW_READER locks in wapbl_begin(). Include the following comment as well: XXX: The original code calls for the use of a RW_READER lock here, but it turns out there are performance issues with high metadata-rate workloads (e.g. multiple simultaneous tar extractions). For now, we force the lock to be RW_WRITER, since that currently has the best performance characteristics (even for a single tar-file extraction). Approved by: simonb
1.1.2.10	25-Jul-2008	simonb	Remove an XXX comment that doesn't apply.
1.1.2.9	01-Jul-2008	matt	#include <sys/atomic.h> to make rump happy.
1.1.2.8	30-Jun-2008	oster	Protect v_numoutput with v_interlock. Approved by: simonb
1.1.2.7	19-Jun-2008	simonb	Fix reference counting for pool initialisation - atomic_inc_uint_nv() will return 1 (not 0!) for the first time a value is incremented.
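The pitfall fixed in revision 1.1.2.7 is that an "_nv" (new value) increment reports the count after the addition, so the first caller sees 1. A minimal userland sketch, assuming C11 <stdatomic.h> in place of the kernel's atomic_inc_uint_nv(); inc_uint_nv, pool_ref, pool_refcnt and pool_inits are hypothetical names used only for illustration:

```c
#include <stdatomic.h>

static atomic_uint pool_refcnt;	/* hypothetical shared reference count */
static int pool_inits;		/* how many times one-time setup ran */

/* Mimic an "_nv" increment: return the value after adding one. */
static unsigned
inc_uint_nv(atomic_uint *p)
{
	/* atomic_fetch_add returns the old value, so add one to it. */
	return atomic_fetch_add(p, 1) + 1;
}

/* Take a reference, running the one-time setup for the first user. */
static void
pool_ref(void)
{
	/* The first increment yields 1, not 0. */
	if (inc_uint_nv(&pool_refcnt) == 1)
		pool_inits++;	/* pool initialisation would go here */
}
```

Comparing against 0 instead of 1 would mean the setup branch never runs, because the returned value is never 0 after an increment from a zeroed counter.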
1.1.2.6	18-Jun-2008	simonb	In wapbl_stop(), destroy the condvars/mutexes/locks created in wapbl_start(). Fixes a LOCKDEBUG panic.
1.1.2.5	18-Jun-2008	rmind	- Remove wapbl_global_mtx, use atomic-ops for reference counting; - Move few pool_put() calls out of the locked code area; OK by <simonb>.
1.1.2.4	12-Jun-2008	martin	License police
1.1.2.3	11-Jun-2008	simonb	Fix some whitespace and long line niggles.
1.1.2.2	11-Jun-2008	simonb	Fix a couple of typos. From wizd.
1.1.2.1	10-Jun-2008	simonb	Initial commit of Wasabi Systems' WAPBL (Write Ahead Physical Block Logging) journaling code. Originally written by Darrin B. Jewell while at Wasabi and updated to -current by Antti Kantee, Andy Doran, Greg Oster and Simon Burge. Still a number of issues - look in doc/BRANCHES for "simonb-wapbl" for more info.
1.3.8.6	18-Jun-2011	bouyer	Pull up following revision(s) (requested by hannken in ticket #1627): sys/kern/vfs_wapbl.c: revisions 1.41-1.42 sbin/dump/snapshot.c: revisions 1.6 (patch) share/man/man4/fss.4: revisions 1.15 (patch) sys/dev/fss.c: revisions 1.73 (patch) sys/dev/fssvar.h: revisions 1.25 usr.sbin/fssconfig/fssconfig.c: revisions 1.7 sys/ufs/ffs/ffs_balloc.c: revisions 1.54 sys/ufs/ffs/ffs_snapshot.c: revisions 1.90, 1.98, 1.100-1.101, 1.103-1.110, 1.111, 1.112-1.115 (patch) - Try to keep snapshot indirect blocks contiguous. This speeds up snapshot creation by a factor of ~3 and reduces the file system suspension time by a factor of ~5. - Refine the scope of WAPBL transactions and the limit for deallocations in one transaction so we should no longer get a "wapbl_flush: current transaction too big to flush" panic when creating or removing snapshots on larger logging disks. - fss(4): Allow FSSIOCSET to set the initial flags. Add a new flag "FSS_UNLINK_ON_CREATE" to unlink the backing store before the snapshot gets created. With this change dump(8) no longer dumps the zero-sized, but named snapshot it is working on.
1.3.8.5	07-Mar-2011	riz	Pull up following revision(s) (requested by bouyer in ticket #1543): sys/kern/vfs_wapbl.c: revision 1.27 sys/kern/vfs_wapbl.c: revision 1.28 Turn a KASSERT into a panic. I don't want us to be randomly overwriting memory on non-DIAGNOSTIC kernels if resource estimation fails. Add dealloccnt to list of things to be considered in the stetson-harrison decision making algorithm for flushing a wapbl transaction.
1.3.8.4	16-Feb-2011	bouyer	Pull up following revision(s) (requested by tron in ticket #1535): sys/kern/vfs_wapbl.c: revision 1.39 via patch Add two sysctls one that does verbose transaction logging and a second one that disables flushing the disk cache (which is fast but dangerous for data integrity). From simon a long while back.
1.3.8.3	22-Nov-2010	riz	Pull up following revision(s) (requested by hannken in ticket #1477): sys/kern/vfs_wapbl.c: revision 1.38 Wapbl_register_deallocation(): the taken reader lock is not sufficient to protect wl_dealloc* members. Take the mutex here and change the lock requirements of these fields to "writer lock or mutex". This error led to file system corruption and "freeing free block" panics.
1.3.8.2	13-Sep-2010	snj	branches: 1.3.8.2.2; Apply patch (requested by drochner in ticket #1454): Fix inconsistencies in the wapbl replay process which can lead to a premature abort of the fsck run and possibly leave a corrupted filesystem. Addresses PR bin/43336.
1.3.8.1	24-Feb-2009	snj	branches: 1.3.8.1.2; 1.3.8.1.4; Pull up following revision(s) (requested by ad in ticket #490): sys/kern/vfs_wapbl.c: revision 1.23 sys/miscfs/syncfs/sync_subr.c: revision 1.36 sys/miscfs/syncfs/sync_vnops.c: revision 1.26 sys/ufs/ffs/ffs_alloc.c: revision 1.121 sys/ufs/ffs/ffs_vfsops.c: revision 1.242 sys/ufs/ffs/ffs_vnops.c: revision 1.110 PR kern/39564 wapbl performance issues with disk cache flushing PR kern/40361 WAPBL locking panic in -current PR kern/40470 WAPBL corrupts ext2fs PR kern/40562 busy loop in ffs_sync when unmounting a file system PR kern/40525 panic: ffs_valloc: dup alloc - A fix for an issue that can lead to "ffs_valloc: dup" due to dirty cg buffers being invalidated. Problem discovered and patch by dholland@. - If the syncer fails to lazily sync a vnode due to lock contention, retry 1 second later instead of 30 seconds later. - Flush inode atime updates every ~10 seconds (this makes most sense with logging). Presently they didn't hit the disk for read-only files or devices until the file system was unmounted. It would be better to trickle the updates out but that would require more extensive changes. - Fix issues with file system corruption, busy looping and other nasty problems when logging and non-logging file systems are intermixed, with one being the root file system. - For logging, do not flush metadata on an inode-at-a-time basis if the sync has been requested by ioflush. Previously, we could try hundreds of log sync operations a second due to inode update activity, causing the syncer to fall behind and metadata updates to be serialized across the entire file system. Instead, burst out metadata and log flushes at a minimum interval of every 10 seconds on an active file system (happens more often if the log becomes full). Note this does not change the operation of fsync() etc. - With the flush issue fixed, re-enable concurrent metadata updates in vfs_wapbl.c.
1.3.8.2.2.2	07-Mar-2011	riz	Pull up following revision(s) (requested by bouyer in ticket #1543): sys/kern/vfs_wapbl.c: revision 1.27 sys/kern/vfs_wapbl.c: revision 1.28 Turn a KASSERT into a panic. I don't want us to be randomly overwriting memory on non-DIAGNOSTIC kernels if resource estimation fails. Add dealloccnt to list of things to be considered in the stetson-harrison decision making algorithm for flushing a wapbl transaction.
1.3.8.2.2.1	22-Nov-2010	riz	Pull up following revision(s) (requested by hannken in ticket #1477): sys/kern/vfs_wapbl.c: revision 1.38 Wapbl_register_deallocation(): the taken reader lock is not sufficient to protect wl_dealloc* members. Take the mutex here and change the lock requirements of these fields to "writer lock or mutex". This error led to file system corruption and "freeing free block" panics.
1.3.8.1.4.1	20-May-2011	matt	bring matt-nb5-mips64 up to date with netbsd-5-1-RELEASE (except compat).
1.3.8.1.2.1	22-Nov-2010	riz	Pull up following revision(s) (requested by hannken in ticket #1477): sys/kern/vfs_wapbl.c: revision 1.38 Wapbl_register_deallocation(): the taken reader lock is not sufficient to protect wl_dealloc* members. Take the mutex here and change the lock requirements of these fields to "writer lock or mutex". This error led to file system corruption and "freeing free block" panics.
1.3.6.3	28-Apr-2009	skrll	Sync with HEAD.
1.3.6.2	03-Mar-2009	skrll	Sync with HEAD.
1.3.6.1	19-Jan-2009	skrll	Sync with HEAD.
1.3.4.3	17-Jan-2009	mjf	Sync with HEAD.
1.3.4.2	28-Sep-2008	mjf	Sync with HEAD.
1.3.4.1	11-Aug-2008	mjf	file vfs_wapbl.c was added on branch mjf-devfs2 on 2008-09-28 10:40:54 +0000
1.3.2.2	18-Sep-2008	wrstuden	Sync with wrstuden-revivesa-base-2.
1.3.2.1	11-Aug-2008	wrstuden	file vfs_wapbl.c was added on branch wrstuden-revivesa on 2008-09-18 04:31:45 +0000
1.20.2.2	23-Jul-2009	jym	Sync with HEAD.
1.20.2.1	13-May-2009	jym	Sync with HEAD. Commit is split, to avoid a "too many arguments" protocol error.
1.25.2.6	09-Oct-2010	yamt	sync with head
1.25.2.5	11-Aug-2010	yamt	sync with head.
1.25.2.4	11-Mar-2010	yamt	sync with head
1.25.2.3	18-Jul-2009	yamt	sync with head.
1.25.2.2	04-May-2009	yamt	sync with head.
1.25.2.1	05-Apr-2009	yamt	file vfs_wapbl.c was added on branch yamt-nfs-mp on 2009-05-04 08:13:49 +0000
1.30.2.2	22-Oct-2010	uebayasi	Sync with HEAD (-D20101022).
1.30.2.1	30-Apr-2010	uebayasi	Sync with HEAD.
1.34.2.4	31-May-2011	rmind	sync with head
1.34.2.3	05-Mar-2011	rmind	sync with head
1.34.2.2	30-May-2010	rmind	sync with head
1.34.2.1	16-Mar-2010	rmind	Change struct uvm_object::vmobjlock to be dynamically allocated with mutex_obj_alloc(). It allows us to share the locks among UVM objects.
1.39.4.2	05-Mar-2011	bouyer	Sync with HEAD
1.39.4.1	17-Feb-2011	bouyer	Sync with HEAD
1.39.2.1	06-Jun-2011	jruoho	Sync with HEAD.
1.44.2.1	23-Jun-2011	cherry	Catchup with rmind-uvmplock merge.
1.47.2.4	22-May-2014	yamt	sync with head. for a reference, the tree before this commit was tagged as yamt-pagecache-tag8. this commit was split into small chunks to avoid a limitation of cvs. ("Protocol error: too many arguments")
1.47.2.3	16-Jan-2013	yamt	sync with (a bit old) head
1.47.2.2	23-May-2012	yamt	sync with head.
1.47.2.1	17-Apr-2012	yamt	sync with head
1.48.2.2	02-Jun-2012	mrg	sync to latest -current.
1.48.2.1	18-Feb-2012	mrg	merge to -current.
1.51.2.2	02-Jan-2013	riz	Pull up following revision(s) (requested by hannken in ticket #758): sys/kern/vfs_wapbl.c: revision 1.53 sys/kern/vfs_wapbl.c: revision 1.54 wapbl_biodone: Release the buffer before reclaiming the log. wapbl_flush() may wait for the log to become empty and all buffers should be unbusy before it returns. Try to coalesce writes to the journal in MAXPHYS sized and aligned blocks. Speeds up wapbl_flush() on raid5 by a factor of 3-4. Discussed on tech-kern. Needs pullup to NetBSD-6.
1.51.2.1	07-May-2012	riz	Pull up following revision(s) (requested by chs in ticket #204): sys/fs/sysvbfs/sysvbfs_vnops.c: revision 1.44 sys/ufs/ffs/ffs_vfsops.c: revision 1.277 sys/fs/v7fs/v7fs_vnops.c: revision 1.11 sys/ufs/chfs/chfs_vnops.c: revision 1.7 sys/ufs/ext2fs/ext2fs_readwrite.c: revision 1.61 sys/miscfs/genfs/genfs_io.c: revision 1.54 sys/kern/vfs_wapbl.c: revision 1.52 sys/uvm/uvm_pager.h: revision 1.43 sys/ufs/ffs/ffs_vnops.c: revision 1.121 sys/kern/vfs_subr.c: revision 1.434 sys/fs/msdosfs/msdosfs_vnops.c: revision 1.83 sys/fs/ntfs/ntfs_vnops.c: revision 1.51 sys/fs/udf/udf_subr.c: revision 1.119 sys/miscfs/specfs/spec_vnops.c: revision 1.135 sys/ufs/ext2fs/ext2fs_vnops.c: revision 1.103 sys/fs/udf/udf_vnops.c: revision 1.71 sys/ufs/ufs/ufs_readwrite.c: revision 1.104 change vflushbuf() to take the full FSYNC_* flags. translate FSYNC_LAZY into PGO_LAZY for VOP_PUTPAGES() so that genfs_do_io() can set the appropriate io priority for the I/O. this is the first part of addressing PR 46325. mark all wapbl I/O as BPRIO_TIMECRITICAL. this is the second part of addressing PR 46325.
1.52.2.5	03-Dec-2017	jdolecek	update from HEAD
1.52.2.4	20-Aug-2014	tls	Rebase to HEAD as of a few days ago.
1.52.2.3	25-Feb-2013	tls	resync with head
1.52.2.2	20-Nov-2012	tls	Resync to 2012-11-19 00:00:00 UTC
1.52.2.1	12-Sep-2012	tls	Initial snapshot of work to eliminate 64K MAXPHYS. Basically works for physio (I/O to raw devices); needs more doing to get it going with the filesystems, but it shouldn't damage data. All work's been done on amd64 so far. Not hard to add support to other ports. If others want to pitch in, one very helpful thing would be to sort out when and how IDE disks can do 128K or larger transfers, and adjust the various PCI IDE (or at least ahcisata) drivers and wd.c accordingly -- it would make testing much easier. Another very helpful thing would be to implement a smart minphys() for RAIDframe along the lines detailed in the MAXPHYS-NOTES file.
1.55.2.1	18-May-2014	rmind	sync with head
1.59.4.1	09-Aug-2015	martin	Pull up following revision(s) (requested by mlelstv in ticket #943): sys/kern/vfs_wapbl.c: revision 1.62 Refactor disk address calculation from physical block numbers in the journal into a function. Make that function work correctly with sector sizes != DEV_BSIZE when compiled outside the kernel (i.e. fsck_ffs). Fixes PR bin/45933
1.61.2.6	28-Aug-2017	skrll	Sync with HEAD
1.61.2.5	05-Dec-2016	skrll	Sync with HEAD
1.61.2.4	05-Oct-2016	skrll	Sync with HEAD
1.61.2.3	29-May-2016	skrll	Sync with HEAD
1.61.2.2	27-Dec-2015	skrll	Sync with HEAD (as of 26th Dec)
1.61.2.1	22-Sep-2015	skrll	Sync with HEAD
1.78.2.4	26-Apr-2017	pgoyette	Sync with HEAD
1.78.2.3	20-Mar-2017	pgoyette	Sync with HEAD
1.78.2.2	07-Jan-2017	pgoyette	Sync with HEAD. (Note that most of these changes are simply $NetBSD$ tag issues.)
1.78.2.1	04-Nov-2016	pgoyette	Sync with HEAD
1.86.2.1	21-Apr-2017	bouyer	Sync with HEAD
1.101.4.3	21-Apr-2020	martin	Sync with HEAD
1.101.4.2	08-Apr-2020	martin	Merge changes from current as of 20200406
1.101.4.1	10-Jun-2019	christos	Sync with HEAD
1.101.2.1	26-Dec-2018	pgoyette	Sync with HEAD, resolve a few conflicts
1.106.2.1	20-Apr-2020	bouyer	Sync with HEAD
