History log of /src/sys/ufs/ffs
Revision  Date  Author  Comments
 1.1 12-Jun-1998  cgd Rework the way kernel include files are installed. In the new method,
as with user-land programs, include files are installed by each directory
in the tree that has includes to install. (This allows more flexibility
as to what gets installed, makes 'partial installs' easier, and gives us
more options as to which machines' includes get installed at any given
time.) The old SYS_INCLUDES={symlinks,copies} behaviours are _both_
still supported, though at least one bug in the 'symlinks' case is
fixed by this change. Include files can't be built before installation,
so directories that have includes as targets (e.g. dev/pci) have to move
those targets into a different Makefile.
 1.174 27-Jun-2025  andvar s/quadradically/quadratically/ in comments.
 1.173 13-May-2024  msaitoh branches: 1.173.2;
s/contigous/contiguous/ in comment.
 1.172 07-Jan-2023  chs ufs: fixed signed/unsigned bugs affecting large file systems

Apply these commits from FreeBSD:

commit e870d1e6f97cc73308c11c40684b775bcfa906a2
Author: Kirk McKusick <mckusick@FreeBSD.org>
Date: Wed Feb 10 20:10:35 2010 +0000

This fix corrects a problem in the file system that treats large
inode numbers as negative rather than unsigned. For a default
(16K block) file system, this bug began to show up at a file system
size above about 16Tb.

To fully handle this problem, newfs must be updated to ensure that
it will never create a filesystem with more than 2^32 inodes. That
patch will be forthcoming soon.

Reported by: Scott Burns, John Kilburg, Bruce Evans
Followup by: Jeff Roberson
PR: 133980
MFC after: 2 weeks

commit 81479e688b0f643ffacd3f335b4b4bba460b769d
Author: Kirk McKusick <mckusick@FreeBSD.org>
Date: Thu Feb 11 18:14:53 2010 +0000

One last pass to get all the unsigned comparisons correct.


In addition to the changes from FreeBSD, this commit includes quite a few
related changes to appease -Wsign-compare.
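A user-space sketch of the bug class (helper names are hypothetical, not taken from the kernel): once an inode number exceeds 2^31 - 1, storing it in a signed 32-bit variable makes it compare as negative, so an ordinary range check rejects a valid inode. The conversion to int32_t is implementation-defined in C; mainstream compilers wrap modulo 2^32, which is what this sketch assumes.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Buggy pattern (hypothetical helper): the inode number is a signed int,
 * so any value >= 2^31 looks negative and fails the range check. */
static bool ino_in_range_signed(int32_t ino, int32_t maxino)
{
    return ino >= 0 && ino < maxino;
}

/* Fixed pattern: inode numbers are unsigned, so the full 32-bit range works. */
static bool ino_in_range_unsigned(uint32_t ino, uint32_t maxino)
{
    return ino < maxino;
}
```

With inode 3000000000 and a limit of 4000000000, the unsigned check accepts the inode while the signed check rejects it.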
 1.171 23-Apr-2022  hannken branches: 1.171.4;
Need vnode locked for VOP_FDISCARD().
 1.170 03-Sep-2021  andvar fix typos in comments, mainly s/extention/extension/ and s/sufficent/sufficient/
 1.169 05-Sep-2020  riastradh Round of uvm.h cleanup.

The poorly named uvm.h is generally supposed to be for uvm-internal
users only.

- Narrow it to files that actually need it -- mostly files that need
to query whether curlwp is the pagedaemon, which should maybe be
exposed by an external header.

- Use uvm_extern.h where feasible and uvm_*.h for things not exposed
by it. We should split up uvm_extern.h but this will serve for now
to reduce the uvm.h dependencies.

- Use uvm_stat.h and #ifdef UVMHIST uvm.h for files that use
UVMHIST(ubchist), since ubchist is declared in uvm.h but the
reference evaporates if UVMHIST is not defined, so we reduce header
file dependencies.

- Make uvm_device.h and uvm_swap.h independently includable while
here.

ok chs@
 1.168 26-Jul-2020  chs skip the assertions about page-locking when allocating to the extattr bmap,
since extattrs do not use the page cache.
 1.167 18-Apr-2020  christos Extended attribute support for ffsv2, from FreeBSD.
 1.166 23-Feb-2020  ad branches: 1.166.4;
UVM locking changes, proposed on tech-kern:

- Change the lock on uvm_object, vm_amap and vm_anon to be a RW lock.
- Break v_interlock and vmobjlock apart. v_interlock remains a mutex.
- Do partial PV list locking in the x86 pmap. Others to follow later.
 1.165 18-Feb-2020  riastradh Fix non-DIAGNOSTIC build with UVM_PAGE_TRKOWN.
 1.164 14-Apr-2019  kardel branches: 1.164.4; 1.164.6;
PR/53990, PR/52380, PR/52102: UFS2 cylinder group inode allocation botch

Fix rare allocation botch in ffs_nodealloccg().

Conditions:
a) fewer than
   #_of_initialized_inodes(cg->cg_initediblk)
   - inodes_per_filesystem_block
   inodes are allocated in the cylinder group
b) cg->cg_irotor points to an uninterrupted run of
   allocated inodes in the inode bitmap up to the
   end of dynamically initialized inodes
   (cg->cg_initediblk)

In this case the next inode after this run was returned
without initializing the respective inode block. As the
block is not initialized these inodes could trigger panics
on inode consistency due to old (uninitialized) disk data.

In very rare cases data loss could occur when
the uninitialized inode block is initialized via the
normal mechanism.
Further conditions to occur after the above:
c) no panic
d) no (forced) fsck
e) and more than cg->cg_initediblk - inodes_per_filesystem_block
allocated inodes.

Fix:
Always ensure allocation stays within the initialized inode range,
extending the initialized inode range as needed.

Add KASSERTMSG() safeguards.

ok hannken@
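A heavily simplified model of the invariant the fix establishes: an allocated inode must always lie inside the initialized range, and the range is extended block by block before an inode beyond it is handed out. struct cg_model and cg_alloc_inode() are hypothetical; the real ffs_nodealloccg() works on the on-disk cylinder group, writes zeroed inode blocks and searches the bitmap differently.

```c
#include <assert.h>
#include <stdint.h>

#define CG_MAXINODES 64

/* Hypothetical, simplified cylinder-group model (not the on-disk struct cg). */
struct cg_model {
    uint8_t  used[CG_MAXINODES]; /* 1 = allocated; one byte per inode for clarity */
    uint32_t ninodes;            /* inodes in this group: nonzero, <= CG_MAXINODES,
                                    and a multiple of inopb */
    uint32_t inopb;              /* inodes per filesystem block (nonzero) */
    uint32_t initediblk;         /* inodes initialized on disk so far */
    uint32_t irotor;             /* search starting point */
};

/* Allocate a free inode, never returning one outside the initialized range:
 * if the chosen inode lies beyond it, extend the range block by block first.
 * Returns the inode index, or -1 if the group is full. */
static int cg_alloc_inode(struct cg_model *cg)
{
    for (uint32_t n = 0; n < cg->ninodes; n++) {
        uint32_t i = (cg->irotor + n) % cg->ninodes;
        if (cg->used[i])
            continue;
        while (i >= cg->initediblk)
            cg->initediblk += cg->inopb; /* stands in for writing a zeroed inode block */
        cg->used[i] = 1;
        cg->irotor = i;
        return (int)i;
    }
    return -1;
}
```

With 4 inodes per block, inodes 8-11 in use, cg_initediblk at 12 and the rotor at 8 (conditions a and b above), this model returns inode 12 and first raises the initialized count to 16, rather than handing out an uninitialized inode.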
 1.163 10-Dec-2018  jdolecek put back UFS_WAPBL_JUNLOCK_ASSERT(), the underlying rw_write_held() check
doesn't actually have a race since it checks if the rwlock is held by
current lwp
 1.162 10-Dec-2018  jdolecek make UFS_WAPBL_JLOCK_ASSERT() #ifdef DIAGNOSTIC, same as the underlying
function KASSERT(), so that it actually does something; fix code using
it to actually pass correct params, so that it compiles

remove UFS_WAPBL_JUNLOCK_ASSERT(), as that is inherently racy (it's
okay on those places if the rwlock is held by other lwp); depend
on the RW_ASSERT()/LOCKDEBUG inside rw_enter() to catch the case
with wapbl rwlock held by current lwp
 1.161 03-Sep-2018  riastradh Rename min/max -> uimin/uimax for better honesty.

These functions are defined on unsigned int. The generic name
min/max should not silently truncate to 32 bits on 64-bit systems.
This is purely a name change -- no functional change intended.

HOWEVER! Some subsystems have

#define min(a, b) ((a) < (b) ? (a) : (b))
#define max(a, b) ((a) > (b) ? (a) : (b))

even though our standard name for that is MIN/MAX. Although these
may invite multiple evaluation bugs, these do _not_ cause integer
truncation.

To avoid `fixing' these cases, I first changed the name in libkern,
and then compile-tested every file where min/max occurred in order to
confirm that it failed -- and thus confirm that nothing shadowed
min/max -- before changing it.

I have left a handful of bootloaders that are too annoying to
compile-test, and some dead code:

cobalt ews4800mips hp300 hppa ia64 luna68k vax
acorn32/if_ie.c (not included in any kernels)
macppc/if_gm.c (superseded by gem(4))

It should be easy to fix the fallout once identified -- this way of
doing things fails safe, and the goal here, after all, is to _avoid_
silent integer truncations, not introduce them.

Maybe one day we can reintroduce min/max as type-generic things that
never silently truncate. But we should avoid doing that for a while,
so that existing code has a chance to be detected by the compiler for
conversion to uimin/uimax without changing the semantics until we can
properly audit it all. (Who knows, maybe in some cases integer
truncation is actually intended!)
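A user-space illustration of the truncation the rename is meant to expose, assuming a 32-bit unsigned int and a 64-bit uint64_t. uimin() here is a local stand-in with the same unsigned int signature as the libkern helper, not the kernel header itself.

```c
#include <assert.h>
#include <stdint.h>

/* Local stand-in with the same shape as the libkern helper: both
 * parameters are unsigned int, so wider arguments are converted first. */
static inline unsigned int uimin(unsigned int a, unsigned int b)
{
    return a < b ? a : b;
}

/* Type-generic macro: no truncation, but each argument may be evaluated twice. */
#define MIN(a, b) ((a) < (b) ? (a) : (b))
```

Passing UINT64_C(0x100000005) and 10 to uimin() yields 5, because the first argument is silently cut to its low 32 bits; MIN() on the same values yields 10.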
 1.160 19-Jul-2018  ozaki-r Avoid using magic numbers for arguments of workqueue_create (NFC)
 1.159 07-Dec-2017  chs branches: 1.159.2; 1.159.4;
fix the UVM_PAGE_TRKOWN page-locking assertion at the top of ffs_alloc()
to work right for multi-threaded processes.
 1.158 13-Aug-2017  mlelstv Don't time out the discard work queue here. Either destroying a work queue
with pending work items panics or accessing freed resources from the work
item will crash. The timeout needs to be handled gracefully by the driver
that implements the discard operation.

Fixes parts of PR 50725.
 1.157 12-Jul-2017  hannken When initializing more inodes make sure to write them to disk
before writing the cylinder group with updated cg_initediblk.
 1.156 18-Mar-2017  riastradh branches: 1.156.6;
#if DIAGNOSTIC panic ---> KASSERT
 1.155 01-Mar-2017  hannken Remove now redundant calls to fstrans_start()/fstrans_done().
 1.154 30-Oct-2016  christos branches: 1.154.2;
Tidy up panic messages, no functional change.
 1.153 28-Oct-2016  jdolecek reorganize ffs_truncate()/ffs_indirtrunc() to be able to partially
succeed; change wapbl_register_deallocation() to return EAGAIN
rather than panic when code hits the limit

callers changed to either loop calling ffs_truncate() using new
utility ufs_truncate_retry() if their semantics requires it, or
just ignore the failure; remove ufs_wapbl_truncate()

this fixes possible user-triggerable panic during truncate, and
resolves WAPBL performance issue with truncates of large files

PR kern/47146 and kern/49175
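A hypothetical model of the partial-success contract: each truncation pass frees a bounded number of blocks and reports EAGAIN while work remains, and the caller loops. The names file_model, truncate_pass() and truncate_retry() are illustrative only and do not reflect ufs_truncate_retry()'s real signature; in the kernel, the bounded pass presumably keeps each WAPBL transaction within its deallocation limit, which is not modeled here.

```c
#include <assert.h>
#include <errno.h>
#include <stdint.h>

/* Hypothetical model of a file whose blocks are freed in bounded chunks. */
struct file_model {
    uint64_t nblocks;
};

/* One truncation pass: frees at most `limit` blocks (limit must be > 0).
 * Returns EAGAIN if the file is still longer than `newblocks`, else 0. */
static int truncate_pass(struct file_model *f, uint64_t newblocks, uint64_t limit)
{
    if (f->nblocks <= newblocks)
        return 0;
    uint64_t excess = f->nblocks - newblocks;
    f->nblocks -= excess < limit ? excess : limit;
    return f->nblocks > newblocks ? EAGAIN : 0;
}

/* Caller-side retry loop: keep calling while a pass reports partial progress. */
static int truncate_retry(struct file_model *f, uint64_t newblocks, uint64_t limit,
                          unsigned *passes)
{
    int error;

    *passes = 0;
    do {
        error = truncate_pass(f, newblocks, limit);
        (*passes)++;
    } while (error == EAGAIN);
    return error;
}
```

Truncating a 10-block file to 0 with a limit of 3 blocks per pass succeeds after four passes (3 + 3 + 3 + 1).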
 1.152 25-Sep-2016  jdolecek adjust ffs_realloccg() so that the logic about allocating full
contiguous block for future fragment expansion doesn't need to
UFS_WAPBL_REGISTER_DEALLOCATION() or ffs_blkfree(); the free blocks
are now immediately available for use by the expanding file in further I/O

primary driver is safe removal of the deallocation registration and
hence failure point, but this also fixes degenerate case for wapbl,
and similar also for discard - if the file would be actually expanded
before wapbl commit, or before discard queue would be processed,
the filesystem would not yet see the contiguous free blocks, and
would be forced to allocate another fragment elsewhere
 1.151 12-Aug-2015  riastradh branches: 1.151.2;
Need wapbl transaction around ffs_blkfree_cg. Fixes wapbl+discard.
 1.150 08-Aug-2015  mlelstv don't crash when printing error messages when there are no credentials.
don't abuse the printed uid to log the inode number.

The printing/logging of error messages should be simplified.
 1.149 28-Mar-2015  maxv Remove the 'cred' argument from bread(). Remove a now unused var in
ffs_snapshot.c. Update the man page accordingly.

ok hannken@
 1.148 17-Mar-2015  hannken Change ffs to use vcache_new:
- Change ffs_valloc to return an inode number.
- Remove now obsolete UFS operations UFS_VALLOC and UFS_VFREE.
- Make ufs_makeinode private to ufs_vnops.c and pass vattr instead of mode.
 1.147 08-Sep-2014  joerg branches: 1.147.2;
Prefer cprng_fast32 over random. A good distribution even in the lower
bits beats any minor performance advantage random(9) might have,
especially given the disk IO involved.
 1.146 25-Jul-2014  dholland branches: 1.146.2;
Switch the FFS code for discarding free blocks to use VOP_FDISCARD.
 1.145 12-Nov-2013  dholland branches: 1.145.2;
clarify warning printout
 1.144 28-Oct-2013  bad Pull in fix from FreeBSD ffs_alloc.c r121785:
Consider only cylinder groups with at least 75% of the average free space
per cylinder group and 75% of the average free inodes per cylinder group
as candidates for the creation of a new directory. Avoids excessive I/O
scanning for a suitable cylinder group on relatively full file systems.

Tested by sborril and me.

Pullup: netbsd-6, netbsd-5


Original commit message:

Tweak the calculation of minbfree in ffs_dirpref() so that only
those cylinder groups that have at least 75% of the average free
space per cylinder group for that file system are considered as
candidates for the creation of a new directory. The previous formula
for minbfree would set it to zero if the file system was more than
75% full, which allowed cylinder groups with no free space at all
to be chosen as candidates for directory creation, which resulted
in an expensive search for free blocks for each file that was
subsequently created in that directory.

Modify the calculation of minifree in the same way.

Decrease maxcontigdirs as the file system fills to decrease the
likelihood that a cluster of directories will overflow the available
space in a cylinder group.

Reviewed by: mckusick
Tested by: kmarx@vicor.com
MFC after: 2 weeks
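The two thresholds can be compared directly. min_free_new() follows the rule stated above (75% of the per-group average, but at least 1). min_free_old() is an assumption: one formula with the behavior the original message attributes to the old code (it drops to 0 once the file system is about 75% full), not necessarily the exact historical source.

```c
#include <assert.h>
#include <stdint.h>

/* New threshold, per the commit: 75% of the per-group average, at least 1. */
static int64_t min_free_new(int64_t total_free, int64_t ncg)
{
    int64_t avg = total_free / ncg;
    int64_t m = avg - avg / 4;
    return m < 1 ? 1 : m;
}

/* Assumed old-style threshold: the per-group average minus a quarter of a
 * group's capacity, clamped at 0 (reaches 0 at roughly 75% full). */
static int64_t min_free_old(int64_t total_free, int64_t ncg, int64_t cg_capacity)
{
    int64_t m = total_free / ncg - cg_capacity / 4;
    return m < 0 ? 0 : m;
}
```

With 10 groups of 1000 blocks and 2000 blocks free (80% full), the old-style threshold is 0, so even a group with no free blocks qualifies, while the new threshold is 150.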
 1.143 20-Oct-2013  christos always declare needswap
 1.142 20-Oct-2013  christos always declare needswap
 1.141 19-Oct-2013  martin Eliminate a variable only used in diagnostic kernels
 1.140 30-Sep-2013  hannken Replace macro v_specmountpoint with two functions spec_node_getmountedfs()
and spec_node_setmountedfs() to manage the file system mounted on a device.
Assert the device is a block device.

Welcome to 6.99.24

Discussed on tech-kern@ some time ago.

Reviewed by: David Holland <dholland@netbsd.org>
 1.139 12-Sep-2013  martin #ifdef a variable just like their use
 1.138 23-Jun-2013  dholland branches: 1.138.2;
Stick ffs_ in front of the following macros:
fragstoblks()
blkstofrags()
fragnum()
blknum()

to finish the job of distinguishing them from the lfs versions, which
Christos renamed the other day.

I believe this is the last of the overtly ambiguous exported symbols
from ffs... or at least, the last of the ones that conflicted with lfs.
ffs still pollutes the C namespace very broadly (as does ufs) and this
needs quite a bit more cleanup.

XXX: boo on macros with lowercase names. But I'm not tackling that just yet.
 1.137 23-Jun-2013  dholland Stick ffs_, ext2_, chfs_, filecore_, cd9660_, or mfs_ in front of
the following symbols so as to disambiguate fully. (Christos already
did the lfs ones.)

lblkno
lblktosize
lfragtosize
numfrags
blkroundup
fragroundup
 1.136 23-Jun-2013  dholland fsbtodb() -> FFS_FSBTODB(), EXT2_FSBTODB(), or MFS_FSBTODB()
dbtofsb() -> FFS_DBTOFSB() or EXT2_DBTOFSB()

(Christos already did the lfs ones a few days back)
 1.135 19-Jun-2013  dholland blkoff() -> ffs_blkoff() stragglers
 1.134 19-Jun-2013  dholland Rename ambiguous macros:
MAXDIRSIZE -> UFS_MAXDIRSIZE or LFS_MAXDIRSIZE
NINDIR -> FFS_NINDIR, EXT2_NINDIR, LFS_NINDIR, or MFS_NINDIR
INOPB -> FFS_INOPB, LFS_INOPB
INOPF -> FFS_INOPF, LFS_INOPF
blksize -> ffs_blksize, ext2_blksize, or lfs_blksize
sblksize -> ffs_blksize

These are not the only ambiguously defined filesystem macros, of
course, there's a pile more. I may not have found all the ambiguous
definitions of blksize(), too, as there are a lot of other things
called 'blksize' in the system.
 1.133 22-Jan-2013  dholland Stuff UFS_ in front of a few of ufs's symbols to reduce namespace
pollution. Specifically:
ROOTINO -> UFS_ROOTINO
WINO -> UFS_WINO
NXADDR -> UFS_NXADDR
NDADDR -> UFS_NDADDR
NIADDR -> UFS_NIADDR
MAXSYMLINKLEN -> UFS_MAXSYMLINKLEN
MAXSYMLINKLEN_UFS[12] -> UFS[12]_MAXSYMLINKLEN (for consistency)

Sort out ext2fs's misuse of NDADDR and NIADDR; fortunately, these have
the same values in ext2fs and ffs.

No functional change intended.
 1.132 20-Dec-2012  hannken Change bread() and breadn() to never return a buffer on
error and modify all callers to not brelse() on error.

Welcome to 6.99.16

PR kern/46282 (6.0_BETA crash: msdosfs_bmap -> pcbmap -> bread -> bio_doread)
 1.131 19-Oct-2012  drochner Implement experimental support to pass notifications that a file
was deleted from the filesystem to the disk driver, commonly
known as "discard" or "trim".
fs/driver support is in ffs and ata wd for now.
This is what was posted here:
http://mail-index.netbsd.org/tech-kern/2012/02/28/msg012813.html
with minor cleanup, and the global switch replaced by a mount option.
 1.130 28-Nov-2011  tls branches: 1.130.4; 1.130.8;
Remove arc4random() and arc4randbytes() from the kernel API. Replace
arc4random() hacks in rump with stubs that call the host arc4random() to
get numbers that are hopefully actually random (arc4random() keyed with
stack junk is not). This should fix some of the currently failing anita
tests -- we should no longer generate duplicate "random" MAC addresses in
the test environment.
 1.129 20-Sep-2011  chs branches: 1.129.2;
strengthen the assertions about pages existing during block allocation,
which were incorrectly relaxed last year. add some comments so that
the intent of these is hopefully clearer.

in ufs_balloc_range(), don't free pages or mark them dirty if
allocating their backing store failed. this fixes PR 45369.
 1.128 12-Jun-2011  rmind Welcome to 5.99.53! Merge rmind-uvmplock branch:

- Reorganize locking in UVM and provide extra serialisation for pmap(9).
New lock order: [vmpage-owner-lock] -> pmap-lock.

- Simplify locking in some pmap(9) modules by removing P->V locking.

- Use lock object on vmobjlock (and thus vnode_t::v_interlock) to share
the locks amongst UVM objects where necessary (tmpfs, layerfs, unionfs).

- Rewrite and optimise x86 TLB shootdown code, make it simpler and cleaner.
Add TLBSTATS option for x86 to collect statistics about TLB shootdowns.

- Unify /dev/mem et al in MI code and provide required locking (removes
kernel-lock on some ports). Also, avoid cache-aliasing issues.

Thanks to Andrew Doran and Joerg Sonnenberger, as their initial patches
formed the core changes of this branch.
 1.127 06-Mar-2011  bouyer branches: 1.127.2;
merge the bouyer-quota2 branch. This adds a new on-disk format
to store disk quota usage and limits, integrated with ffs
metadata. Usage is checked by fsck_ffs (no more quotacheck)
and is covered by the WAPBL journal. Enabled with kernel
option QUOTA2 (added where QUOTA was enabled in kernel config files),
turned on with tunefs(8) on a per-filesystem
basis. mount_mfs(8) can also turn quotas on.

See http://mail-index.netbsd.org/tech-kern/2011/02/19/msg010025.html
for details.
 1.126 06-Mar-2011  rmind {ffs_nodealloccg,ext2fs_nodealloccg,ext2fs_mapsearch}: use XOR and ffs()
to find free bits in the inode and block bitmaps, instead of the loop.

Obtained from FreeBSD (changes by jhb).
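A sketch of the XOR-and-ffs() technique on a byte bitmap where a set bit means "in use". XOR with 0xff turns free slots into set bits, and ffs(3) (POSIX find-first-set, declared in <strings.h>, unrelated to the file system's name) returns the 1-based position of the lowest one. find_free_bit() is a hypothetical helper, not the kernel function.

```c
#include <assert.h>
#include <stddef.h>
#include <strings.h> /* ffs() */

/* Returns the index of the first clear bit (free slot) in a bitmap where
 * 1 = in use and bit 0 of byte 0 is slot 0, or -1 if every slot is used. */
static long find_free_bit(const unsigned char *map, size_t nbytes)
{
    for (size_t i = 0; i < nbytes; i++) {
        if (map[i] == 0xff)
            continue;              /* whole byte allocated, skip it */
        int bits = map[i] ^ 0xff;  /* flip: free slots become set bits */
        return (long)(i * 8) + ffs(bits) - 1;
    }
    return -1;
}
```

For the bitmap {0xff, 0x0b}, the first byte is skipped as full; 0x0b ^ 0xff is 0xf4, ffs(0xf4) is 3, so the first free slot is bit 8 + 3 - 1 = 10.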
 1.125 21-Feb-2010  mlelstv branches: 1.125.2; 1.125.4; 1.125.6;
For the UVM_PAGE_TRKOWN test do not require that the relevant pages
must exist.
 1.124 07-May-2009  elad branches: 1.124.2;
Introduce several actions/requests for authorizing file-system related
operations, specifically quota and block allocation from reserved space.

Modify ufs_quotactl() to accommodate passing "mp" earlier by vfs_busy()ing
it a little bit higher.

Mailing list reference:

http://mail-index.netbsd.org/tech-kern/2009/04/26/msg004936.html

Note that the umapfs request mentioned in this thread was NOT added as
there is still on-going discussion regarding the proper implementation.
 1.123 25-Apr-2009  sborrill Fix random 'filesystem full' messages by trapping a couple of 32-bit
overflow areas missed in rev 1.110 and switching cgbase().

Kudos to rump_ffs!
 1.122 22-Feb-2009  ad PR kern/26878 FFSv2 + softdep = livelock (no free ram)
PR kern/16942 panic with softdep and quotas
PR kern/19565 panic: softdep_write_inodeblock: indirect pointer #1 mismatch
PR kern/26274 softdep panic: allocdirect_merge: ...
PR kern/26374 Long delay before non-root users can write to softdep partitions
PR kern/28621 1.6.x "vp != NULL" panic in ffs_softdep.c:4653 while unmounting a softdep (+quota) filesystem
PR kern/29513 FFS+Softdep panic with unfsck-able file-corruption
PR kern/31544 The ffs softdep code appears to fail to write dirty bits to disk
PR kern/31981 stopping scsi disk can cause panic (softdep)
PR kern/32116 kernel panic in softdep (assertion failure)
PR kern/32532 softdep_trackbufs deadlock
PR kern/37191 softdep: locking against myself
PR kern/40474 Kernel panic after remounting raid root with softdep

Retire softdep, pass 2. As discussed and later formally announced on the
mailing lists.
 1.121 22-Feb-2009  ad PR kern/39564 wapbl performance issues with disk cache flushing
PR kern/40361 WAPBL locking panic in -current
PR kern/40470 WAPBL corrupts ext2fs
PR kern/40562 busy loop in ffs_sync when unmounting a file system
PR kern/40525 panic: ffs_valloc: dup alloc

- A fix for an issue that can lead to "ffs_valloc: dup" due to dirty cg
buffers being invalidated. Problem discovered and patch by dholland@.

- If the syncer fails to lazily sync a vnode due to lock contention,
retry 1 second later instead of 30 seconds later.

- Flush inode atime updates every ~10 seconds (this makes most sense with
logging). Presently they didn't hit the disk for read-only files or
devices until the file system was unmounted. It would be better to trickle
the updates out but that would require more extensive changes.

- Fix issues with file system corruption, busy looping and other nasty
problems when logging and non-logging file systems are intermixed,
with one being the root file system.

- For logging, do not flush metadata on an inode-at-a-time basis if the sync
has been requested by ioflush. Previously, we could try hundreds of log
sync operations a second due to inode update activity, causing the syncer
to fall behind and metadata updates to be serialized across the entire
file system. Instead, burst out metadata and log flushes at a minimum
interval of every 10 seconds on an active file system (happens more often
if the log becomes full). Note this does not change the operation of
fsync() etc.

- With the flush issue fixed, re-enable concurrent metadata updates in
vfs_wapbl.c.
 1.120 11-Jan-2009  christos branches: 1.120.2;
merge christos-time_t
 1.119 06-Dec-2008  joerg Split ffs_freefile into a frontend for normal cylinder group and for
snapshot use. Adjust ffs_blkfree_common to get the fs instance passed
in, the original commit didn't account blocks in the snapshots
correctly. Assert that ffs_blkfree is used with the primary fs instance
and that ffs_checkfreefile is only used for snapshots. Move the bdwrite
from ffs_blkfree_common into the caller for symmetry. This creates a
redundant write of unmodified data for ffs_blkfree_snap if a double free
of a block happens.

Reviewed and tested by hannken@.
 1.118 01-Dec-2008  joerg Revert last. Conditionalize variables on FFS_EI.
 1.117 01-Dec-2008  cegger build fix: remove unused variables
 1.116 01-Dec-2008  joerg ffs_blkfree is used in two different ways. The normal usage is to free a
block in the cylinder groups of the filesystem. The other user is the
snapshot code, which wants to modify the copied cylinder groups. Use
different frontends to distinguish the cases in preparation for fine
grained locking for cylinder groups.
 1.115 30-Nov-2008  joerg Split ffs_blkalloc into a frontend that does inode based consistency
checks and a backend that just asserts them. Use the backend in
ffs_wapbl_abort_sync_metadata instead of faking an inode.
 1.114 06-Nov-2008  joerg Remove XXXUBC code for ffs_reallocblks, that has been conditionalized in
2002 and #if 0'ed in 2005. It would need a considerable amount of work
to bring back and obscures the more important block allocation.
 1.113 06-Aug-2008  hannken branches: 1.113.2; 1.113.4;
Do not call UFS_WAPBL_*() when ffs_freefile() is acting on a snapshot.

While here replace the test for VBLK with a convenience variable.
 1.112 31-Jul-2008  hannken Resolve a deadlock when fs_nodealloccg() initializes more inodes on
an UFS2 file system. With the current cylinder group buffer busy it
calls ffs_getblk(). This runs through copy-on-write and may need the
current cylinder group buffer to allocate a new block for the snapshot.

While here write the cylinder group buffer synchronously after
cg_initediblk was changed because fsck_ffs will trust it.

Reviewed by: Jason Thorpe <thorpej@netbsd.org>
 1.111 31-Jul-2008  simonb Merge the simonb-wapbl branch. From the original branch commit:

Add Wasabi System's WAPBL (Write Ahead Physical Block Logging)
journaling code. Originally written by Darrin B. Jewell while
at Wasabi and updated to -current by Antti Kantee, Andy Doran,
Greg Oster and Simon Burge.

OK'd by core@, releng@.
 1.110 11-Jul-2008  simonb Fix potential 32-bit overflow problem in the blockpref code.
mlelstv@ points out FreeBSD fixed the same thing a couple of years
ago - here's the commit message they used on rev 1.127:

Fixes a bug that caused UFS2 filesystems bigger than 2TB to
prematurely report that they were full and/or to panic the kernel
with the message ``ffs_clusteralloc: allocated out of group''.

Submitted by: Henry Whincup <henry@jot.to>
 1.109 04-Jun-2008  ad branches: 1.109.2; 1.109.4;
When setting DONE on the buffer, assert that there are no waiters in
biowait().
 1.108 03-Jun-2008  hannken ufs/ffs: replace calls to getblk() with ffs_getblk(). Now all buffers
have been run through copy-on-write and async mounts work again.

Fixes PR kern/38820

Reviewed by: YAMAMOTO Takashi <yamt@netbsd.org>
 1.107 16-May-2008  hannken Make sure all cached buffers with valid, not yet written data have been
run through copy-on-write. Call fscow_run() with valid data where possible.

The LP_UFSCOW hack is no longer needed to protect ffs_copyonwrite() against
endless recursion.

- Add a flag B_MODIFY to bread(), breada() and breadn(). If set the caller
intends to modify the buffer returned.

- Always run copy-on-write on buffers returned from ffs_balloc().

- Add new function ffs_getblk() that gets a buffer, assigns a new blkno,
may clear the buffer and runs copy-on-write. Process possible errors
from getblk() or fscow_run(). Part of PR kern/38664.

Welcome to 4.99.63

Reviewed by: YAMAMOTO Takashi <yamt@netbsd.org>
 1.106 21-Jan-2008  pooka branches: 1.106.6; 1.106.8; 1.106.10; 1.106.12; 1.106.14;
Sprinkle comments about um_lock status on function entry and exit.
No functional change.
 1.105 02-Jan-2008  ad Merge vmlocking2 to head.
 1.104 01-Nov-2007  hannken branches: 1.104.2; 1.104.4; 1.104.8;
Avoid doing bawrite to initialize inode block while holding cylinder
group block buffer busy. If filesystem has any active snapshots, bawrite
can come back trying to allocate new snapshot data block from the same
cylinder group and cause deadlock.

From FreeBSD Rev. 1.117
 1.103 18-Oct-2007  hannken Ffs_blkfree() and ffs_freefile() take a devvp that may be a regular file whencalled from snapshot creation. Be sure to use the right mount.

Ok: Andrew Doran <ad@netbsd.org>
 1.102 10-Oct-2007  ad branches: 1.102.2;
Merge from vmlocking:

- Split vnode::v_flag into three fields, depending on field locking.
- simple_lock -> kmutex in a few places.
- Fix some simple locking problems.
 1.101 08-Oct-2007  ad Merge ffs locking & brelse changes from the vmlocking branch.
 1.100 09-Aug-2007  hannken branches: 1.100.2; 1.100.4;
Move snapshot per-mount data from struct ufsmount to mount specific data.
No functional changes.

Welcome to 4.99.28 (struct ufsmount changed size)
 1.99 16-Jul-2007  pooka branches: 1.99.2; 1.99.6;
When allocating blocks, check minfree before asking kauth about
suser. The latter has unknown cost and rarely needs to be called.
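The reordering relies on C's short-circuit &&: when the cheap free-space test is false, the authorization call is never evaluated. The names below are hypothetical stand-ins (the kernel uses its free-space macro and kauth(9)); the counter only makes the skipped call observable.

```c
#include <assert.h>
#include <stdbool.h>

/* Counts calls so the short-circuit is observable. */
static int auth_calls;

/* Hypothetical stand-in for the costly privilege check. */
static bool expensive_auth_denied(void)
{
    auth_calls++;
    return true; /* pretend the caller is not privileged */
}

/* Deny only if the reserve would be touched AND the caller lacks privilege.
 * Testing the cheap condition first means the costly check runs only when
 * it can change the outcome. */
static bool alloc_denied(long freespace_after_reserve)
{
    return freespace_after_reserve <= 0 && expensive_auth_denied();
}
```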
 1.98 04-Mar-2007  christos branches: 1.98.2; 1.98.6;
Kill caddr_t; there will be some MI fallout, but it will be fixed shortly.
 1.97 04-Jan-2007  elad branches: 1.97.2;
Consistent usage of KAUTH_GENERIC_ISSUSER.
 1.96 16-Nov-2006  christos __unused removal on arguments; approved by core.
 1.95 15-Oct-2006  yamt ffs_alloc: remove an assertion which is no longer true.
 1.94 12-Oct-2006  christos - sprinkle __unused on function decls.
- fix a couple of unused bugs
- no more -Wno-unused for i386
 1.93 23-Jun-2006  yamt branches: 1.93.4; 1.93.6;
fix a simonb-timecounters regression.
the precision of getnanotime() is not suitable for file timestamps.
esp. when it's nfs-exported.

- introduce vfs_timestamp().
(the name is from freebsd. currently merely a wrapper of nanotime())
- for ufs-like filesystems, use it rather than getnanotime().

XXX check other filesystems.
 1.92 07-Jun-2006  kardel branches: 1.92.2;
merge FreeBSD timecounters from branch simonb-timecounters
- struct timeval time is gone
  time.tv_sec -> time_second
- struct timeval mono_time is gone
  mono_time.tv_sec -> time_uptime
- access to time via
  {get,}{micro,nano,bin}time()
  get* versions are fast but less precise
- support NTP nanokernel implementation (NTP API 4)
- further reading:
  Timecounter Paper: http://phk.freebsd.dk/pubs/timecounter.pdf
  NTP Nanokernel: http://www.eecis.udel.edu/~mills/ntp/html/kern.html
 1.91 14-May-2006  elad branches: 1.91.2;
integrate kauth.
 1.90 23-Dec-2005  yamt branches: 1.90.4; 1.90.6; 1.90.8; 1.90.10; 1.90.12;
prevent in-core vnode being freed from getting new references.
otherwise, once the corresponding bit in the inode bitmap is cleared,
an unrelated inode with the same inode number can be allocated and
ufs_ihashget() picks a stale in-core vnode for it.

PR/32301 by Matthias Scheler.
 1.89 27-Nov-2005  dsl Force some multiplies to give a 64 bit result to avoid dirsize being zero
and causing a divide by zero trap later.
Fixes a panic noted in netbsd-help.
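Assuming, as in ffs_dirpref(), that dirsize is the product of the average file size and the average number of files per directory, a 32-bit multiply can wrap to exactly zero, and a later division by dirsize then traps. The helper names are hypothetical; the 32-bit case uses unsigned arithmetic so the wrap is well defined in C.

```c
#include <assert.h>
#include <stdint.h>

/* 32-bit product: 65536 * 65536 = 2^32 wraps to exactly 0. */
static uint32_t dirsize_32(uint32_t avgfilesize, uint32_t avgfpdir)
{
    return avgfilesize * avgfpdir;
}

/* Forcing a 64-bit multiply keeps the full product. */
static int64_t dirsize_64(int32_t avgfilesize, int32_t avgfpdir)
{
    return (int64_t)avgfilesize * avgfpdir;
}
```

65536 * 65536 is 2^32, which wraps to 0 in 32 bits but is kept intact by the 64-bit multiply.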
 1.88 02-Nov-2005  yamt branches: 1.88.2;
merge yamt-vop branch. remove following VOPs.

VOP_BLKATOFF
VOP_VALLOC
VOP_BALLOC
VOP_REALLOCBLKS
VOP_VFREE
VOP_TRUNCATE
VOP_UPDATE
 1.87 26-Sep-2005  yamt branches: 1.87.2;
always use nanotime rather than time.
it's bad to mix nanotime and time because doing so sometimes
makes timestamps go backwards.
 1.86 19-Aug-2005  christos 64 bit inode changes.
 1.85 15-Jul-2005  thorpej Use ANSI function decls.
 1.84 06-Jun-2005  dbj branches: 1.84.2;
remove (long) cast on bpref, which is daddr_t
 1.83 29-May-2005  christos - sprinkle const
- avoid shadow variables.
 1.82 22-May-2005  hannken ffs/ffs_alloc.c:
- Add a missing ACTIVECG_CLR().

ffs/ffs_snapshot.c:
- Use async/delayed writes for snapshot creation and sync/uncache these buffers
on end. Reduces the time the file system must be suspended.
- Remove um_snaplistsize. Was a duplicate of um_snapblklist[0].
- Byte swap the list of preallocated blocks on read/write instead of access.
- Always keep this list on ip->i_snapblklist so it may be rolled back when the
newest snapshot gets removed. Fixes a rare snapshot corruption when using
more than one snapshot on a file system.

ufs/ufsmount.h:
- Make TAILQ_LAST() possible on member um_snapshots.
- Remove um_snaplistsize. Was a duplicate of um_snapblklist[0].
 1.81 26-Feb-2005  perry branches: 1.81.2;
nuke trailing whitespace
 1.80 15-Dec-2004  mycroft branches: 1.80.2; 1.80.4;
Remove some unnecessary (int32_t) casts that would cause us to screw up the
top bit in block addresses.

Also, change some daddr_t->int32_t casts (mostly as arguments to ufs_rw32(),
where they would get promoted anyway) to u_int32_t.
 1.79 11-Oct-2004  dbj print absolute inode number in debug output when freeing free inode occurs.
previously, the number was relative to the cylinder group, which was confusing.
prefix debug message with "ifree:" so this can be differentiated in bug reports.
 1.78 29-Aug-2004  hannken While creating a snapshot inodes must be freed from the
snapshot, not from the file system.
ffs_freefile() needs explicit "fs" and "devvp" arguments.
 1.77 26-May-2004  hannken Don't use VTOI(vp)->i_flags to test for snapshot devices. Will not work
for non-UFS file systems. Test for VBLK vnode instead.
 1.76 25-May-2004  hannken Add ffs internal snapshots. Written by Marshall Kirk McKusick for FreeBSD.

- Not enabled by default. Needs kernel option FFS_SNAPSHOT.
- Change parameters of ffs_blkfree.
- Let the copy-on-write functions return an error so spec_strategy
may fail if the copy-on-write fails.
- Change genfs_*lock*() to use vp->v_vnlock instead of &vp->v_lock.
- Add flag B_METAONLY to VOP_BALLOC to return indirect block buffer.
- Add a function ffs_checkfreefile needed for snapshot creation.
- Add special handling of snapshot files:
Snapshots may not be opened for writing and the attributes are read-only.
Use the mtime as the time this snapshot was taken.
Deny mtime updates for snapshot files.
- Add function transferlockers to transfer any waiting processes from
one lock to another.
- Add vfsop VFS_SNAPSHOT to take a snapshot and make it accessible through
a vnode.
- Add snapshot support to ls, fsck_ffs and dump.

Welcome to 2.0F.

Approved by: Jason R. Thorpe <thorpej@netbsd.org>
 1.75 18-Apr-2004  dbj when enabling ffs compatibility in ffs_reload, use
sblockloc that superblock was read from
also note XXX that ffs_reload doesn't handle superblock moving
 1.74 13-Jan-2004  soren branches: 1.74.2;
With large average filesizes, it was possible to overflow dirsize to zero,
causing a division by zero in ffs_dirpref().

From Barry Bouwsma of Tiengen.
 1.73 09-Jan-2004  dbj never upgrade the superblock or set FS_FLAGS_UPDATED in fs_old_flags
add compatibility for filesystems created before FFSv2 integration
these patches are from pr port-macppc/23926 and should also fix
problems discussed in pr kern/21404 and pr kern/21283
 1.72 30-Dec-2003  pk Replace the traditional buffer memory management -- based on fixed per buffer
virtual memory reservation and a private pool of memory pages -- by a scheme
based on memory pools.

This allows better utilization of memory because buffers can now be allocated
with a granularity finer than the system's native page size (useful for
filesystems with e.g. 1k or 2k fragment sizes). It also avoids fragmentation
of virtual to physical memory mappings (due to the former fixed virtual
address reservation) resulting in better utilization of MMU resources on some
platforms. Finally, the scheme is more flexible by allowing run-time decisions
on the amount of memory to be used for buffers.

On the other hand, the effectiveness of the LRU queue for buffer recycling
may be somewhat reduced compared to the traditional method since, due to the
nature of the pool based memory allocation, the actual least recently used
buffer may release its memory to a pool different from the one needed by a
newly allocated buffer. However, this effect will kick in only if the
system is under memory pressure.
 1.71 27-Nov-2003  mycroft Remove part of previous -- there is NO reason for directory allocation to use
arc4random().
 1.70 05-Sep-2003  itojun use arc4random instead of random (mask with INT32_MAX to avoid getting
negative numbers unexpectedly).
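A minimal sketch of the masking described above. The random value is passed in as a parameter so the example does not depend on arc4random(3), which is a BSD interface and not ISO C. The helper name is hypothetical.

```c
#include <stdint.h>

/*
 * Turn an unsigned 32-bit random value into a non-negative int32_t.
 * Clearing the top bit keeps the value within [0, INT32_MAX], so the
 * conversion to a signed type can never yield a negative number.
 */
static int32_t
nonneg_random(uint32_t r)
{
	return (int32_t)(r & INT32_MAX);
}
```

For example, nonneg_random(0x80000005) yields 5 instead of a negative value.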
 1.69 07-Aug-2003  agc Move UCB-licensed code from 4-clause to 3-clause licence.

Patches provided by Joel Baker in PR 22364, verified by myself.
 1.68 29-Jun-2003  fvdl branches: 1.68.2;
Back out the lwp/ktrace changes. They contained a lot of collateral damage,
and need to be examined and discussed more.
 1.67 29-Jun-2003  thorpej Undo part of the ktrace/lwp changes. In particular:
* Remove the "lwp *" argument that was added to vget(). Turns out
that nothing actually used it!
* Remove the "lwp *" arguments that were added to VFS_ROOT(), VFS_VGET(),
and VFS_FHTOVP(); all they did was pass it to vget() (which, as noted
above, didn't use it).
* Remove all of the "lwp *" arguments to internal functions that were added
just to appease the above.
 1.66 28-Jun-2003  darrenr Pass lwp pointers throughout the kernel, as required, so that the lwpid can
be inserted into ktrace records. The general change has been to replace
"struct proc *" with "struct lwp *" in various function prototypes, pass
the lwp through and use l_proc to get the process pointer when needed.

Bump the kernel rev up to 1.6V
 1.65 15-May-2003  kristerw The C language does not permit statements of the form
(X ? Y : Z) = 0;
even though gcc handles this by a stupid extension.

Transform these to correct C.

Approved by fvdl.
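In ISO C a conditional expression is not an lvalue, so the usual rewrites select an address and assign through it, or spell the choice out as a branch. The sketch below shows both; the function names are hypothetical and the commit's actual rewrite is not reproduced here.

```c
/* Select the address, then assign through it: valid ISO C. */
static void
clear_by_pointer(int cond, int *y, int *z)
{
	*(cond ? y : z) = 0;
}

/* Equivalent explicit form. */
static void
clear_by_branch(int cond, int *y, int *z)
{
	if (cond)
		*y = 0;
	else
		*z = 0;
}
```

For plain variables Y and Z, the first form becomes `*(X ? &Y : &Z) = 0;`.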
 1.64 04-May-2003  gmcgarry Print pid on error. From Greg A. Woods in PR#17393.
 1.63 17-Apr-2003  fvdl configdirs was changed to an array of u_int8_t, so don't compare values
to 65535.
 1.62 12-Apr-2003  fvdl Use variables for some cg accesses; makes things more readable and more
similar to FreeBSD. No functional change.
 1.61 10-Apr-2003  fvdl Initialize the 'mirror' i_flags field in struct inode to 0.
 1.60 02-Apr-2003  fvdl Add support for UFS2. UFS2 is an enhanced FFS, adding support for
64 bit block pointers, extended attribute storage, and a few
other things.

This commit does not yet include the code to manipulate the extended
storage (for e.g. ACLs), this will be done later.

Originally written by Kirk McKusick and Network Associates Laboratories for
FreeBSD.
 1.59 26-Jan-2003  tsutsui More printf format cleanup to reduce casts.
 1.58 24-Jan-2003  fvdl Bump daddr_t to 64 bits. Replace it with int32_t in all places where
it was used on-disk, so that on-disk formats remain the same.
Remove ufs_daddr_t and ufs_lbn_t for the time being.
 1.57 27-Dec-2002  hannken Clear IN_SPACECOUNTED on (re-)used inodes.
This cures the "unmount pending error:" on softdep umounts.

Approved by: Frank van der Linden <fvdl@netbsd.org>
 1.56 27-Sep-2002  provos remove trailing \n in panic(). approved perry.
 1.55 14-May-2002  matt branches: 1.55.4;
Comment out code that's no longer used.
 1.54 10-Apr-2002  mycroft Use blkstofrags() and fragstoblks(). Use &(NBBY-1) rather than %NBBY.
Switch off of fs_fragshift rather than fs_frag (generates better jump tables).
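A small illustration of why the mask can replace the modulo, assuming NBBY is 8 (as in NetBSD's <sys/param.h>) and the bit index is unsigned. For negative signed operands `%` and `&` give different results, so the substitution relies on the index being non-negative. The helper names are hypothetical.

```c
#ifndef NBBY
#define NBBY 8		/* bits per byte, assumed as in <sys/param.h> */
#endif

/* The two forms agree only when NBBY is a power of two. */
_Static_assert((NBBY & (NBBY - 1)) == 0, "NBBY must be a power of two");

static unsigned
bit_in_byte_mod(unsigned n)
{
	return n % NBBY;
}

static unsigned
bit_in_byte_mask(unsigned n)
{
	return n & (NBBY - 1);	/* no division needed */
}

/* Test bit n of a bitmap, such as a cylinder group's free map. */
static int
bitmap_isset(const unsigned char *map, unsigned n)
{
	return (map[n / NBBY] >> (n & (NBBY - 1))) & 1;
}
```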
 1.53 30-Oct-2001  lukem add __KERNEL_RCSID()
 1.52 19-Sep-2001  lukem branches: 1.52.2;
- ffs_blkpref() changes:
- don't bother updating fs->fs_cgrotor, since it's actually not used in
the kernel. from Manuel Bouyer in [kern/3389]
- when examining cylinder groups from startcg to startcg-1 (wrapping
at fs->fs_ncg), there's no need to check startcg at the end as well
as the start...
- highlight in the struct fs declaration that fs_cgrotor is UNUSED
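The wrap-around scan can be written as a single loop that examines each of the ncg cylinder groups exactly once, starting at startcg. This is an equivalent formulation assumed for illustration, not the ffs_blkpref() code itself; scan_cgs(), the predicate, and struct visit_log are hypothetical.

```c
#include <assert.h>

/* Return the first cg (from startcg, wrapping at ncg) accepted by want(), or -1. */
static int
scan_cgs(int startcg, int ncg, int (*want)(int cg, void *arg), void *arg)
{
	int i, cg;

	for (i = 0, cg = startcg; i < ncg; i++) {
		if (want(cg, arg))
			return cg;
		if (++cg == ncg)
			cg = 0;	/* wrap; startcg is not revisited at the end */
	}
	return -1;
}

/* Example predicate: record each visit and accept only cg == target. */
struct visit_log {
	int target;	/* cg to accept, or -1 to accept none */
	int count[16];	/* visits per cg; ncg must not exceed 16 here */
};

static int
log_and_match(int cg, void *arg)
{
	struct visit_log *vl = arg;

	assert(cg >= 0 && cg < 16);
	vl->count[cg]++;
	return cg == vl->target;
}
```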
 1.51 15-Sep-2001  chs a whole bunch of changes to improve performance and robustness under load:

- remove special treatment of pager_map mappings in pmaps. this is
required now, since I've removed the globals that expose the address range.
pager_map now uses pmap_kenter_pa() instead of pmap_enter(), so there's
no longer any need to special-case it.
- eliminate struct uvm_vnode by moving its fields into struct vnode.
- rewrite the pageout path. the pager is now responsible for handling the
high-level requests instead of only getting control after a bunch of work
has already been done on its behalf. this will allow us to UBCify LFS,
which needs tighter control over its pages than other filesystems do.
writing a page to disk no longer requires making it read-only, which
allows us to write wired pages without causing all kinds of havoc.
- use a new PG_PAGEOUT flag to indicate that a page should be freed
on behalf of the pagedaemon when it's unlocked. this flag is very similar
to PG_RELEASED, but unlike PG_RELEASED, PG_PAGEOUT can be cleared if the
pageout fails due to eg. an indirect-block buffer being locked.
this allows us to remove the "version" field from struct vm_page,
and together with shrinking "loan_count" from 32 bits to 16,
struct vm_page is now 4 bytes smaller.
- no longer use PG_RELEASED for swap-backed pages. if the page is busy
because it's being paged out, we can't release the swap slot to be
reallocated until that write is complete, but unlike with vnodes we
don't keep a count of in-progress writes so there's no good way to
know when the write is done. instead, when we need to free a busy
swap-backed page, just sleep until we can get it busy ourselves.
- implement a fast-path for extending writes which allows us to avoid
zeroing new pages. this substantially reduces cpu usage.
- encapsulate the data used by the genfs code in a struct genfs_node,
which must be the first element of the filesystem-specific vnode data
for filesystems which use genfs_{get,put}pages().
- eliminate many of the UVM pagerops, since they aren't needed anymore
now that the pager "put" operation is a higher-level operation.
- enhance the genfs code to allow NFS to use the genfs_{get,put}pages
instead of a modified copy.
- clean up struct vnode by removing all the fields that used to be used by
the vfs_cluster.c code (which we don't use anymore with UBC).
- remove kmem_object and mb_object since they were useless.
instead of allocating pages to these objects, we now just allocate
pages with no object. such pages are mapped in the kernel until they
are freed, so we can use the mapping to find the page to free it.
this allows us to remove splvm() protection in several places.

The sum of all these changes improves write throughput on my
decstation 5000/200 to within 1% of the rate of NetBSD 1.5
and reduces the elapsed time for "make release" of a NetBSD 1.5
source tree on my 128MB pc to 10% less than a 1.5 kernel took.
 1.50 06-Sep-2001  lukem branches: 1.50.2;
Incorporate the enhanced ffs_dirpref() by Grigoriy Orlov, as found in
FreeBSD (three commits; the initial work, man page updates, and a fix
to ffs_reload()), with the following differences:
- Be consistent between newfs(8) and tunefs(8) as to the options which
set and control the tuning parameters for this work (avgfilesize & avgfpdir)
- Use u_int16_t instead of u_int8_t to keep track of the number of
contiguous directories (suggested by Chuck Silvers)
- Work within our FFS_EI framework
- Ensure that fs->fs_maxclusters and fs->fs_contigdirs don't point to
the same area of memory

The new algorithm has a marked performance increase, especially when
performing tasks such as untarring pkgsrc.tar.gz, etc.

The original FreeBSD commit messages are attached:

=====
mckusick 2001/04/10 01:39:00 PDT
Directory layout preference improvements from Grigoriy Orlov <gluk@ptci.ru>.
His description of the problem and solution follow. My own tests show
speedups on typical filesystem intensive workloads of 5% to 12% which
is very impressive considering the small amount of code change involved.

------

One day I noticed that some file operations run much faster on
small file systems than on big ones. I've looked at the ffs
algorithms, thought about them, and redesigned the dirpref algorithm.

First I want to describe the results of my tests. These results are old
and I have improved the algorithm after these tests were done. Nevertheless
they show how big the performance speedup may be. I have done two file/directory
intensive tests on two OpenBSD systems with old and new dirpref algorithm.
The first test is "tar -xzf ports.tar.gz", the second is "rm -rf ports".
The ports.tar.gz file is the ports collection from the OpenBSD 2.8 release.
It contains 6596 directories and 13868 files. The test systems are:

1. Celeron-450, 128Mb, two IDE drives, the system at wd0, file system for
test is at wd1. Size of test file system is 8 Gb, number of cg=991,
size of cg is 8m, block size = 8k, fragment size = 1k OpenBSD-current
from Dec 2000 with BUFCACHEPERCENT=35

2. PIII-600, 128Mb, two IBM DTLA-307045 IDE drives at i815e, the system
at wd0, file system for test is at wd1. Size of test file system is 40 Gb,
number of cg=5324, size of cg is 8m, block size = 8k, fragment size = 1k
OpenBSD-current from Dec 2000 with BUFCACHEPERCENT=50

You can get more info about the test systems and methods at:
http://www.ptci.ru/gluk/dirpref/old/dirpref.html

Test Results

               tar -xzf ports.tar.gz                  rm -rf ports
mode      old dirpref new dirpref speedup   old dirpref new dirpref speedup
First system
normal            667         472    1.41           477         331    1.44
async             285         144    1.98           130          14    9.29
sync              768         616    1.25           477         334    1.43
softdep           413         252    1.64           241          38    6.34
Second system
normal            329          81    4.06         263.5        93.5    2.81
async             302        25.7   11.75           112        2.26   49.56
sync              281        57.0    4.93           263        90.5     2.9
softdep           341        40.6     8.4           284        4.76   59.66

"old dirpref" and "new dirpref" columns give a test time in seconds.
speedup - speedup factor, i.e. old dirpref / new dirpref.

------

Algorithm description

The old dirpref algorithm is described in comments:

/*
* Find a cylinder to place a directory.
*
* The policy implemented by this algorithm is to select from
* among those cylinder groups with above the average number of
* free inodes, the one with the smallest number of directories.
*/

A new directory is allocated in a different cylinder group than its
parent directory, resulting in a directory tree that is spread across
all the cylinder groups. This spreading out results in non-optimal
access to the directories and files. When we have a small filesystem
it is not a problem, but when the filesystem is big the performance
degradation becomes very apparent.

What do I mean by a big file system?

1. A big filesystem is a filesystem which occupies 20-30 or more percent
of total drive space, i.e. first and last cylinder are physically
located relatively far from each other.
2. It has a relatively large number of cylinder groups, for example
more cylinder groups than 50% of the buffers in the buffer cache.

The first results in long access times, while the second results in
many buffers being used by metadata operations. Such operations use
cylinder group blocks and on-disk inode blocks. The cylinder group
block (fs->fs_cblkno) contains struct cg, inode and block bit maps.
It is 2k in size for the default filesystem parameters. If new and
parent directories are located in different cylinder groups then the
system performs more input/output operations and uses more buffers.
On filesystems with many cylinder groups, lots of cache buffers are
used for metadata operations.

My solution for this problem is very simple. I allocate many directories
in one cylinder group. I also take steps so that the new allocation
method does not cause excessive fragmentation and directory inodes
are not located far from their files' inodes and data.
The algorithm is:
/*
* Find a cylinder group to place a directory.
*
* The policy implemented by this algorithm is to allocate a
* directory inode in the same cylinder group as its parent
* directory, but also to reserve space for its files inodes
* and data. Restrict the number of directories which may be
* allocated one after another in the same cylinder group
* without intervening allocation of files.
*
* If we allocate a first level directory then force allocation
* in another cylinder group.
*/
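The policy in the comment above can be sketched as a search that starts at the parent's cylinder group and takes the first group that still has room reserved for files and has not had too many directories created back to back. This is a hedged illustration, not the kernel's ffs_dirpref(): struct cgsum_sketch, the threshold parameters, and the handling of first-level directories (here simply skipping the parent's group) are assumptions.

```c
#include <assert.h>

/* Hypothetical per-cylinder-group summary. */
struct cgsum_sketch {
	int nifree;	/* free inodes */
	int nbfree;	/* free blocks */
	int contigdirs;	/* dirs created since the last file in this cg */
};

/* Return the preferred cg for a new directory, or -1 if none qualifies. */
static int
dirpref_sketch(const struct cgsum_sketch *cs, int ncg, int parentcg,
    int first_level, int minifree, int minbfree, int maxcontigdirs)
{
	int i, cg;

	assert(ncg > 0 && parentcg >= 0 && parentcg < ncg);
	for (i = 0, cg = parentcg; i < ncg; i++, cg = (cg + 1) % ncg) {
		if (first_level && cg == parentcg)
			continue;	/* force a first-level dir elsewhere */
		if (cs[cg].nifree >= minifree && cs[cg].nbfree >= minbfree &&
		    cs[cg].contigdirs < maxcontigdirs)
			return cg;
	}
	return -1;	/* caller falls back to another policy */
}
```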

My early versions of dirpref gave good results for a wide range of
file operations and different filesystem capacities except one case:
those applications that create their entire directory structure first
and only later fill this structure with files.

My solution for such and similar cases is to limit the number of
directories which may be created one after another in the same cylinder
group without intervening file creations. For this purpose, I allocate
an array of counters at mount time. This array is linked to the superblock
fs->fs_contigdirs[cg]. Each time a directory is created the counter
increases and each time a file is created the counter decreases. A 60Gb
filesystem with 8mb/cg requires 10kb of memory for the counters array.
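The counter scheme can be sketched as one saturating counter per cylinder group, allocated at mount time, incremented when a directory is created and decremented when a file is created. The struct and function names are hypothetical; the 16-bit counter width follows the NetBSD change noted in revision 1.50 above.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>

struct contigdirs_sketch {
	int ncg;
	uint16_t *count;	/* one counter per cylinder group */
};

static int
contigdirs_init(struct contigdirs_sketch *cd, int ncg)
{
	cd->ncg = ncg;
	cd->count = calloc((size_t)ncg, sizeof(*cd->count));
	return cd->count != NULL ? 0 : -1;
}

static void
contigdirs_free(struct contigdirs_sketch *cd)
{
	free(cd->count);
	cd->count = NULL;
}

/* Directory created in cg: lengthen the run, saturating instead of wrapping. */
static void
contigdirs_dir_created(struct contigdirs_sketch *cd, int cg)
{
	assert(cg >= 0 && cg < cd->ncg);
	if (cd->count[cg] < UINT16_MAX)
		cd->count[cg]++;
}

/* File created in cg: shorten the run, never going below zero. */
static void
contigdirs_file_created(struct contigdirs_sketch *cd, int cg)
{
	assert(cg >= 0 && cg < cd->ncg);
	if (cd->count[cg] > 0)
		cd->count[cg]--;
}
```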

The maxcontigdirs is a maximum number of directories which may be created
without an intervening file creation. I found in my tests that the best
performance occurs when I restrict the number of directories in one cylinder
group such that all its files may be located in the same cylinder group.
There may be some deterioration in performance if all the file inodes
are in the same cylinder group as its containing directory, but their
data partially resides in a different cylinder group. The maxcontigdirs
value is calculated to try to prevent this condition. Since there is
no way to know how many files and directories will be allocated later
I added two optimization parameters in superblock/tunefs. They are:

int32_t fs_avgfilesize; /* expected average file size */
int32_t fs_avgfpdir; /* expected # of files per directory */

These parameters have reasonable defaults but may be tweaked for special
uses of a filesystem. They are only necessary in rare cases like better
tuning a filesystem being used to store a squid cache.
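One plausible way to derive maxcontigdirs from these parameters, assumed here rather than taken from the commit, is to divide the free space of an average cylinder group by the space one directory's files are expected to need (fs_avgfilesize * fs_avgfpdir), then clamp the result to [1, 255]. The function name, the avgbfree parameter, and the 255 cap are assumptions of this sketch.

```c
#include <stdint.h>

static int
maxcontigdirs_sketch(uint64_t avgbfree, uint32_t bsize,
    uint32_t avgfilesize, uint32_t avgfpdir)
{
	/* Bytes expected per directory; widened so the product cannot wrap. */
	uint64_t dirsize = (uint64_t)avgfilesize * avgfpdir;
	uint64_t n;

	if (dirsize == 0)
		return 1;	/* no usable estimate: allow a single directory */
	n = avgbfree * bsize / dirsize;	/* dirs whose files fit in free space */
	if (n > 255)
		n = 255;
	if (n == 0)
		n = 1;
	return (int)n;
}
```

For example, 1024 free 8 KB blocks per group with 16 KB files and 64 files per directory gives 8 MB / 1 MB = 8 directories.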

I have been using this algorithm for about 3 months. I have done
a lot of testing on filesystems with different capacities, average
filesize, average number of files per directory, and so on. I think
this algorithm has no negative impact on filesystem performance. It
works better than the default one in all cases. The new dirpref
will greatly improve untarring/removing/copying of big directories,
decrease load on cvs servers and much more. The new dirpref doesn't
speed up a compilation process, but also doesn't slow it down.

Obtained from: Grigoriy Orlov <gluk@ptci.ru>
=====

=====
iedowse 2001/04/23 17:37:17 PDT
Pre-dirpref versions of fsck may zero out the new superblock fields
fs_contigdirs, fs_avgfilesize and fs_avgfpdir. This could cause
panics if these fields were zeroed while a filesystem was mounted
read-only, and then remounted read-write.

Add code to ffs_reload() which copies the fs_contigdirs pointer
from the previous superblock, and reinitialises fs_avgf* if necessary.

Reviewed by: mckusick
=====

=====
nik 2001/04/10 03:36:44 PDT
Add information about the new options to newfs and tunefs which set the
expected average file size and number of files per directory. Could do
with some fleshing out.
=====
 1.49 31-Aug-2001  lukem no need to cast arg to lblktosize() any more
 1.48 30-Aug-2001  lukem be consistent when casting arg to lblktosize() in UVM_PAGE_TRKOWN debug code
 1.47 24-Aug-2001  wiz heirarchy -> hierarchy
 1.46 20-Aug-2001  wiz precede, not preceed.
 1.45 09-Aug-2001  lukem correctly cast arguments to scanc()
 1.44 03-Jun-2001  chs branches: 1.44.4;
fix an error case for quotas.
 1.43 30-May-2001  mrg use _KERNEL_OPT
 1.42 13-Mar-2001  sommerfeld Change ffs_dirpref() to pay attention to the amount of available free
space before deciding which cylinder group should contain a new directory
inode.

Fixes kern/11983; works around some, but not all, of the side effects
of kern/11989.

Tested by me for well over a month on my laptop; preliminary versions of
the fix were tested by Frank van der Linden and Herb Peyerl.
 1.41 05-Feb-2001  chs branches: 1.41.2;
add casts to an assertion in ffs_alloc() so it works with offsets past 4GB.
 1.40 18-Jan-2001  jdolecek constify
 1.39 30-Nov-2000  nathanw Don't set the value of doreallocblks here; it's defined over in vfs_cluster.c
In fact, doreallocblks isn't used here at all. Delete the declaration.
 1.38 30-Nov-2000  jdolecek change vfs.ffs.doreallocblks to 1 by default - this does not have
any bad symptoms any more; the fix for the bug causing problems with this
option was in BSD4.4-Lite2 and pulled in together with softdep changes

See also Keith Smith & Margo Seltzer's paper on the topic at
http://www.eecs.harvard.edu/~keith/papers/realloc.ps.gz
 1.37 27-Nov-2000  chs Initial integration of the Unified Buffer Cache project.
 1.36 28-Jun-2000  mrg remove include of <vm/vm.h> and <uvm/uvm_extern.h>
 1.35 19-May-2000  thorpej branches: 1.35.4;
NULL != 0
 1.34 04-Apr-2000  jdolecek Add a new sysctl variable vfs.ffs.log_changeopt - if this is true,
an optimization strategy change is logged to syslog. Default
is 0 (to not log). This replaces the recent not quite "right"
change to only log the change if kernel is compiled with DEBUG.
 1.33 30-Mar-2000  augustss Remove register declarations.
 1.32 29-Mar-2000  jdolecek Log the optimization changes only if DEBUG. Fixes kern/9697
 1.31 14-Feb-2000  fvdl Fixes to the softdep code from Ethan Solomita <ethan@geocast.com>.
* Fix buffer ordering when it has dependencies.
* Alleviate memory problems.
* Deal with some recursive vnode locks (sigh).
* Fix other bugs.
 1.30 15-Nov-1999  fvdl Add Kirk McKusick's soft updates code to the trunk. Not enabled by
default, as the copyright on the main file (ffs_softdep.c) is such
that it has been put into gnusrc. options SOFTDEP will pull this
in. This code also contains the trickle syncer.

Bump version number to 1.4O
 1.29 24-Mar-1999  mrg branches: 1.29.4; 1.29.8; 1.29.10; 1.29.14;
completely remove Mach VM support. all that is left is the
header files, as UVM still uses (most of) these.
 1.28 05-Mar-1999  mycroft Pass null pointers to VOP_UPDATE rather than having all the callers fetch the
current time themselves.
 1.27 12-Nov-1998  thorpej defopt FFS_EI
 1.26 18-Aug-1998  thorpej branches: 1.26.2;
Back out part of last change (uninitialized work-around).
 1.25 18-Aug-1998  thorpej Add some braces to make egcs happy (ambiguous else warning). Also,
deal with bogus uninitialized warning (__noreturn__ related)
 1.24 09-Aug-1998  perry bzero->memset, bcopy->memcpy, bcmp->memcmp
 1.23 28-Jul-1998  drochner The fragtbl[], inside[] and around[] variables are needed by "fsck",
so we can't put them inside "#ifdef _KERNEL".
Put declarations inside .c files where needed to preserve namespace.
 1.22 09-Jun-1998  scottr Protect various config(8)-generated files from inclusion while
building LKMs. Fixes PR 5557.
 1.21 08-Jun-1998  scottr Use the newly-defined opt_quota.h.
 1.20 19-Mar-1998  ross Fix a 64-bit pointer/int warning.
 1.19 18-Mar-1998  bouyer Add support for reading/writing FFS in non-native byte order, conditioned
to "options FFS_EI". The superblock and inodes (without blk addr) are
byteswapped at disk read/write time; other metadata are byteswapped
when used (as they are accessed directly in the buffer cache).
This required the addition of a "um_flags" field to struct ufsmount.
ffs_bswap.c contains superblock and inode byteswap routines also used
by userland utilities.
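A minimal sketch of "byteswapped when used" for a 32-bit metadata field read straight out of a buffer. The flag and helper names are hypothetical stand-ins, not the actual FFS_EI macros, and the swap is written out by hand to stay within ISO C.

```c
#include <stdint.h>

/* Hypothetical mount flag: on-disk byte order differs from the CPU's. */
#define UM_NEEDSWAP_SKETCH	0x1u

static uint32_t
bswap32_sketch(uint32_t x)
{
	return ((x & 0x000000ffu) << 24) | ((x & 0x0000ff00u) << 8) |
	    ((x & 0x00ff0000u) >> 8) | ((x & 0xff000000u) >> 24);
}

/* Convert a field to host order only if this mount needs swapping. */
static uint32_t
read_field32(uint32_t ondisk, unsigned um_flags)
{
	return (um_flags & UM_NEEDSWAP_SKETCH) ? bswap32_sketch(ondisk) : ondisk;
}
```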
 1.18 01-Mar-1998  fvdl Merge with Lite2 + local changes
 1.17 10-Feb-1998  mrg - add defopt's for UVM, UVMHIST and PMAP_NEW.
- remove unnecessary UVMHIST_DECL's.
 1.16 05-Feb-1998  mrg initial import of the new virtual memory system, UVM, into -current.

UVM was written by chuck cranor <chuck@maria.wustl.edu>, with some
minor portions derived from the old Mach code. i provided some help
getting swap and paging working, and other bug fixes/ideas. chuck
silvers <chuq@chuq.com> also provided some other fixes.

this is the rest of the MI portion changes.

this will be KNF'd shortly. :-)
 1.15 11-Jun-1997  bouyer Add support for ext2fs, this needed a few modifications to ufs/ufs/inode.h:
- added a "union inode_ext" to struct inode, for the per-fs extensions.
For now only ext2fs uses it.
- i_din is now an union:
union {
struct dinode ffs_din; /* 128 bytes of the on-disk dinode. */
struct ext2fs_dinode e2fs_din; /* 128 bytes of the on-disk dinode. */
} i_din
Added a lot of #define i_ffs_* and i_e2fs_* to access the fields.
- Added two macros: FFS_ITIMES and EXT2FS_ITIMES. ITIMES calls the right
macro, depending on the type of the inode. ITIMES is used where necessary,
FFS_ITIMES and EXT2FS_ITIMES in other places.
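A reduced sketch of the union-plus-accessor-macro pattern described above. The structures are simplified stand-ins with only a mode and a size field each; they are not the real 128-byte dinodes, and the field names are assumptions.

```c
#include <stdint.h>

/* Simplified stand-ins for the two on-disk inode layouts. */
struct ffs_dinode_sketch {
	uint16_t di_mode;
	uint64_t di_size;
};

struct e2fs_dinode_sketch {
	uint16_t e2di_mode;
	uint32_t e2di_size;
};

/* The in-core inode holds whichever on-disk copy its file system uses. */
struct inode_sketch {
	union {
		struct ffs_dinode_sketch ffs_din;
		struct e2fs_dinode_sketch e2fs_din;
	} i_din;
};

/* Accessor macros hide the union path from code that uses the fields. */
#define i_ffs_mode	i_din.ffs_din.di_mode
#define i_ffs_size	i_din.ffs_din.di_size
#define i_e2fs_mode	i_din.e2fs_din.e2di_mode
#define i_e2fs_size	i_din.e2fs_din.e2di_size
```

With these, `ip->i_ffs_mode` expands to `ip->i_din.ffs_din.di_mode`.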
 1.14 10-Mar-1997  mycroft Just increment the generation count. Using the time is bogus and defeats
fsirand(8).
 1.13 12-Oct-1996  christos branches: 1.13.6;
revert previous kprintf changes
 1.12 10-Oct-1996  christos printf -> kprintf, sprintf -> ksprintf
 1.11 11-May-1996  mycroft Change VOP_UPDATE() semantics:
* Make 2nd and 3rd args timespecs, not timevals.
* Consistently pass a Boolean as the 4th arg (except in LFS).
Also, fix ffs_update() and lfs_update() to actually change the nsec fields.
 1.10 17-Mar-1996  christos Fix printf format strings
 1.9 09-Feb-1996  christos ffs prototypes
 1.8 19-Jul-1995  cgd don't just throw away updates to the cylinder group bitmaps, actually
write them to disk! From Keith Smith at Harvard, via Kirk McKusick.
fixes the occasional `blkfree: freeing free block' that has been seen
when cluster reallocation code is enabled.
 1.7 24-Mar-1995  cgd explicitly cast &time to (struct timeval *) when passing it to VOP_UPDATE.
new prototypes and picky compilers make a volatile mess.
 1.6 16-Dec-1994  mycroft Ignore rotational optimization if nrpos == 1, as suggested by Stefan Esser.
 1.5 14-Dec-1994  mycroft Sync with CSRG.
 1.4 20-Oct-1994  cgd update for new syscall args description mechanism, and deal safely
with wider types.
 1.3 04-Jul-1994  mycroft Do the doasyncfree conditionalization better.
 1.2 29-Jun-1994  cgd New RCS ID's, take two. they're more aesthetically pleasant, and use 'NetBSD'
 1.1 08-Jun-1994  mycroft branches: 1.1.1;
Update to 4.4-Lite fs code, with local changes.
 1.1.1.3 01-Mar-1998  fvdl Import some files that were changed after Lite2
 1.1.1.2 01-Mar-1998  fvdl Import 4.4BSD-Lite2
 1.1.1.1 01-Mar-1998  fvdl Import 4.4BSD-Lite for reference
 1.13.6.1 12-Mar-1997  is Merge in changes from Trunk
 1.26.2.4 30-May-1999  chs uvm_vnp_setpageblknos() is gone, and some misc cleanup.
 1.26.2.3 09-Apr-1999  chs in ffs_reallocg(), don't dereference bpp if it's NULL.
 1.26.2.2 25-Feb-1999  chs replace uvm_vnp_relocate() with uvm_vnp_setpageblknos().
 1.26.2.1 09-Nov-1998  chs initial snapshot. lots left to do.
 1.29.14.2 27-Dec-1999  wrstuden Pull up to last week's -current.
 1.29.14.1 21-Dec-1999  wrstuden Initial commit of recent changes to make DEV_BSIZE go away.

Runs on i386, needs work on other arch's. Main kernel routines should be
fine, but a number of the stand programs need help.

cd, fd, ccd, wd, and sd have been updated. sd has been tested with non-512
byte block devices. vnd, raidframe, and lfs need work.

Non 2**n block support is automatic for LKM's and conditional for kernels
on "options NON_PO2_BLOCKS".
 1.29.10.1 19-Oct-1999  fvdl Bring in Kirk McKusick's FFS softdep code on a branch.
 1.29.8.4 27-Mar-2001  bouyer Sync with HEAD.
 1.29.8.3 11-Feb-2001  bouyer Sync with HEAD.
 1.29.8.2 08-Dec-2000  bouyer Sync with HEAD.
 1.29.8.1 20-Nov-2000  bouyer Update thorpej_scsipi to -current as of a month ago
 1.29.4.2 11-Jul-1999  chs remove uvm_vnp_uncache(), it's no longer needed.
 1.29.4.1 07-Jun-1999  chs merge everything from chs-ubc branch.
 1.35.4.4 25-Nov-2001  he Pull up revision 1.52 (requested by lukem):
Mark fs_cgrotor as unused.
 1.35.4.3 25-Nov-2001  he Pull up revision 1.50 (requested by lukem):
Pull in enhanced ffs_dirpref() algorithm, which provides a
substantial performance improvement through better locality
between parent/child directories and their files, and by easing
the pressure on the buffer cache for metadata operations.
 1.35.4.2 25-Nov-2001  he Pull up revision 1.45 (requested by lukem):
Fix scanc() arguments.
 1.35.4.1 25-Nov-2001  he Pull up revision 1.42 (requested by lukem):
Change ffs_dirpref() to be less pathological.
 1.41.2.12 29-Dec-2002  thorpej Sync with HEAD.
 1.41.2.11 18-Oct-2002  nathanw Catch up to -current.
 1.41.2.10 15-Jul-2002  nathanw Revert to curproc.
 1.41.2.9 24-Jun-2002  nathanw Curproc->curlwp renaming.

Change uses of "curproc->l_proc" back to "curproc", which is more like the
original use. Bare uses of "curproc" are now "curlwp".

"curproc" is now #defined in proc.h as ((curlwp) ? (curlwp)->l_proc : NULL)
so that it is always safe to reference curproc (*de*referencing curproc
is another story, but that's always been true).
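A minimal sketch of that null-safe macro pattern. The structures and the curlwp_sketch variable are hypothetical stand-ins for the kernel's struct lwp, struct proc, and curlwp.

```c
#include <stddef.h>

struct proc_sketch { int p_pid; };
struct lwp_sketch { struct proc_sketch *l_proc; };

/* NULL when no LWP is running. */
static struct lwp_sketch *curlwp_sketch;

/* Safe to evaluate when curlwp_sketch is NULL; dereferencing the result is not. */
#define curproc_sketch ((curlwp_sketch) ? (curlwp_sketch)->l_proc : NULL)
```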
 1.41.2.8 20-Jun-2002  nathanw Catch up to -current.
 1.41.2.7 17-Apr-2002  nathanw Catch up to -current.
 1.41.2.6 14-Nov-2001  nathanw Catch up to -current.
 1.41.2.5 21-Sep-2001  nathanw Catch up to -current.
 1.41.2.4 24-Aug-2001  nathanw Catch up with -current.
 1.41.2.3 21-Jun-2001  nathanw Catch up to -current.
 1.41.2.2 09-Apr-2001  nathanw Catch up with -current.
 1.41.2.1 05-Mar-2001  nathanw Initial commit of scheduler activations and lightweight process support.
 1.44.4.5 10-Oct-2002  jdolecek sync kqueue with -current; this includes merge of gehenna-devsw branch,
merge of i386 MP branch, and part of autoconf rototil work
 1.44.4.4 23-Jun-2002  jdolecek catch up with -current on kqueue branch
 1.44.4.3 10-Jan-2002  thorpej Sync kqueue branch with -current.
 1.44.4.2 13-Sep-2001  thorpej Update the kqueue branch to HEAD.
 1.44.4.1 25-Aug-2001  thorpej Merge Aug 24 -current into the kqueue branch.
 1.50.2.1 01-Oct-2001  fvdl Catch up with -current.
 1.52.2.1 12-Nov-2001  thorpej Sync the thorpej-mips-cache branch with -current.
 1.55.4.1 05-Jan-2003  jmc Pull up revisions 1.56-1.57 (requested by hannken in ticket #1049)
Clear IN_SPACECOUNTED on (re-)used inodes.
This cures the "unmount pending error:" on softdep umounts.
 1.68.2.11 11-Dec-2005  christos Sync with head.
 1.68.2.10 10-Nov-2005  skrll Sync with HEAD. Here we go again...
 1.68.2.9 04-Mar-2005  skrll Sync with HEAD.

Hi Perry!
 1.68.2.8 18-Dec-2004  skrll Sync with HEAD.
 1.68.2.7 19-Oct-2004  skrll Sync with HEAD
 1.68.2.6 21-Sep-2004  skrll Fix the sync with head I botched.
 1.68.2.5 18-Sep-2004  skrll Sync with HEAD.
 1.68.2.4 03-Sep-2004  skrll Sync with HEAD
 1.68.2.3 24-Aug-2004  skrll Undo part of the ktrace/lwp changes. In particular:
* Remove the "lwp *" argument that was added to vget(). Turns out
that nothing actually used it!
* Remove the "lwp *" arguments that were added to VFS_ROOT(), VFS_VGET(),
and VFS_FHTOVP(); all they did was pass it to vget() (which, as noted
above, didn't use it).
* Remove all of the "lwp *" arguments to internal functions that were added
just to appease the above.
 1.68.2.2 03-Aug-2004  skrll Sync with HEAD
 1.68.2.1 02-Jul-2003  darrenr Apply the aborted ktrace-lwp changes to a specific branch. This is just for
others to review, I'm concerned that patch fuzziness may have resulted in some
errant code being generated but I'll look at that later by comparing the diff
from the base to the branch with the file I attempt to apply to it. This will,
at the very least, put the changes in a better context for others to review
them and attempt to tinker with removing passing of 'struct lwp' through
the kernel.
 1.74.2.1 27-Apr-2004  jdc Pull up revision 1.75 (requested by dbj in ticket #185)

Fix problems related to superblock upgrade issues which may be
experienced by -current users from 2003.
 1.80.4.1 19-Mar-2005  yamt sync with head. xen and whitespace. xen part is not finished.
 1.80.2.1 29-Apr-2005  kent sync with -current
 1.81.2.1 28-May-2005  tron Pull up revision 1.82 (requested by hannken in ticket #334):
ffs/ffs_alloc.c:
- Add a missing ACTIVECG_CLR().
ffs/ffs_snapshot.c:
- Use async/delayed writes for snapshot creation and sync/uncache these buffers
on end. Reduces the time the file system must be suspended.
- Remove um_snaplistsize. Was a duplicate of um_snapblklist[0].
- Byte swap the list of preallocated blocks on read/write instead of access.
- Always keep this list on ip->i_snapblklist so it may be rolled back when the
newest snapshot gets removed. Fixes a rare snapshot corruption when using
more than one snapshot on a file system.
ufs/ufsmount.h:
- Make TAILQ_LAST() possible on member um_snapshots.
- Remove um_snaplistsize. Was a duplicate of um_snapblklist[0].
 1.84.2.8 04-Feb-2008  yamt sync with head.
 1.84.2.7 21-Jan-2008  yamt sync with head
 1.84.2.6 15-Nov-2007  yamt sync with head.
 1.84.2.5 27-Oct-2007  yamt sync with head.
 1.84.2.4 03-Sep-2007  yamt sync with head.
 1.84.2.3 26-Feb-2007  yamt sync with head.
 1.84.2.2 30-Dec-2006  yamt sync with head.
 1.84.2.1 21-Jun-2006  yamt sync with head.
 1.87.2.2 29-Oct-2005  yamt use ffs_* directly rather than via ufs_ops.
suggested by Chuck Silvers.
 1.87.2.1 20-Oct-2005  yamt adapt ufs.
 1.88.2.1 29-Nov-2005  yamt sync with head.
 1.90.12.1 24-May-2006  tron Merge 2006-05-24 NetBSD-current into the "peter-altq" branch.
 1.90.10.2 06-May-2006  christos - Move kauth_cred_t declaration to <sys/types.h>
- Cleanup struct ucred; forward declarations that are unused.
- Don't include <sys/kauth.h> in any header, but include it in the c files
that need it.

Approved by core.
 1.90.10.1 08-Mar-2006  elad Adapt to kernel authorization KPI.
 1.90.8.2 26-Jun-2006  yamt sync with head.
 1.90.8.1 24-May-2006  yamt sync with head.
 1.90.6.2 01-Jun-2006  kardel Sync with head.
 1.90.6.1 04-Feb-2006  simonb Adapt for timecounters: mostly use get*time() and use "time_second"
instead of "time.tv_sec".
 1.90.4.1 09-Sep-2006  rpaulo sync with head
 1.91.2.1 19-Jun-2006  chap Sync with head.
 1.92.2.1 13-Jul-2006  gdamore Merge from HEAD.
 1.93.6.2 10-Dec-2006  yamt sync with head.
 1.93.6.1 22-Oct-2006  yamt sync with head
 1.93.4.2 12-Jan-2007  ad Sync with head.
 1.93.4.1 18-Nov-2006  ad Sync with head.
 1.97.2.1 12-Mar-2007  rmind Sync with HEAD.
 1.98.6.1 19-Mar-2007  reinoud Move the structure `cluster_save' to the dead ufs/ffs code that was using
it solely.

Preserved just in case the code is resurrected one day.
 1.98.2.8 23-Oct-2007  ad Sync with head.
 1.98.2.7 24-Aug-2007  ad Sync with buffer cache locking changes. See buf.h/vfs_bio.c for details.
Some minor portions are incomplete and need to be verified as a whole.
 1.98.2.6 20-Aug-2007  ad Sync with HEAD.
 1.98.2.5 17-Jun-2007  ad - Increase the number of thread priorities from 128 to 256. How the space
is set up is to be revisited.
- Implement soft interrupts as kernel threads. A generic implementation
is provided, with hooks for fast-path MD code that can run the interrupt
threads over the top of other threads executing in the kernel.
- Split vnode::v_flag into three fields, depending on how the flag is
locked (by the interlock, by the vnode lock, by the file system).
- Miscellaneous locking fixes and improvements.
 1.98.2.4 13-May-2007  ad - Pass the error number and residual count to biodone(), and let it handle
setting error indicators. Prepare to eliminate B_ERROR.
- Add a flag argument to brelse() to be set into the buf's flags, instead
of doing it directly. Typically used to set B_INVAL.
- Add a "struct cpu_info *" argument to kthread_create(), to be used to
create bound threads. Change "bool mpsafe" to "int flags".
- Allow exit of LWPs in the IDL state when (l != curlwp).
- More locking fixes & conversion to the new API.
 1.98.2.3 06-May-2007  ad ffs_blkfree: don't leak ump->um_lock.
 1.98.2.2 13-Apr-2007  ad Put a per-mount lock around ffs shared data structures, excluding softdep
and quotas. Strategy lifted from FreeBSD.
 1.98.2.1 13-Mar-2007  ad Pull in the initial set of changes for the vmlocking branch.
 1.99.6.3 04-Nov-2007  jmcneill Sync with HEAD.
 1.99.6.2 26-Oct-2007  joerg Sync with HEAD.

Follow the merge of pmap.c on i386 and amd64 and move
pmap_init_tmp_pgtbl into arch/x86/x86/pmap.c. Modify the ACPI wakeup
code to restore CR4 before jumping back into kernel space as the large
page option might cover that.
 1.99.6.1 16-Aug-2007  jmcneill Sync with HEAD.
 1.99.2.1 15-Aug-2007  skrll Sync with HEAD.
 1.100.4.1 14-Oct-2007  yamt sync with head.
 1.100.2.3 23-Mar-2008  matt sync with HEAD
 1.100.2.2 09-Jan-2008  matt sync with HEAD
 1.100.2.1 06-Nov-2007  matt sync with HEAD
 1.102.2.2 13-Nov-2007  bouyer Sync with HEAD
 1.102.2.1 25-Oct-2007  bouyer Sync with HEAD.
 1.104.8.2 23-Jan-2008  bouyer Sync with HEAD.
 1.104.8.1 02-Jan-2008  bouyer Sync with HEAD
 1.104.4.1 04-Dec-2007  ad Pull the vmlocking changes into a new branch.
 1.104.2.1 18-Feb-2008  mjf Sync with HEAD.
 1.106.14.2 18-Sep-2008  wrstuden Sync with wrstuden-revivesa-base-2.
 1.106.14.1 23-Jun-2008  wrstuden Sync w/ -current. 34 merge conflicts to follow.
 1.106.12.3 11-Mar-2010  yamt sync with head
 1.106.12.2 16-May-2009  yamt sync with head
 1.106.12.1 04-May-2009  yamt sync with head.
 1.106.10.3 17-Jun-2008  yamt sync with head.
 1.106.10.2 04-Jun-2008  yamt sync with head
 1.106.10.1 18-May-2008  yamt sync with head.
 1.106.8.8 04-Jan-2009  christos fix diagnostic printfs.
 1.106.8.7 30-Dec-2008  christos fix dev_t printfs
 1.106.8.6 28-Dec-2008  christos deal with new printfs format inconsistencies.
 1.106.8.5 27-Dec-2008  christos merge with head.
 1.106.8.4 09-Nov-2008  christos merge with head.
 1.106.8.3 01-Nov-2008  christos catch up with changes in head.
 1.106.8.2 01-Nov-2008  christos Sync with head.
 1.106.8.1 29-Mar-2008  christos Welcome to the time_t=long long dev_t=uint64_t branch.
 1.106.6.4 17-Jan-2009  mjf Sync with HEAD.
 1.106.6.3 28-Sep-2008  mjf Sync with HEAD.
 1.106.6.2 05-Jun-2008  mjf Sync with HEAD.

Also fix build.
 1.106.6.1 02-Jun-2008  mjf Sync with HEAD.
 1.109.4.2 13-Dec-2008  haad Update haad-dm branch to haad-dm-base2.
 1.109.4.1 19-Oct-2008  haad Sync with HEAD.
 1.109.2.4 28-Jul-2008  simonb Add support for creating a WAPBL log in the filesystem. Will
create an in-filesystem log on first "mount -o log" if one doesn't
exist, and will then continue to use same log in the future. See
(soon to be added) wapbl(4) for more info.

Adds a new B_CONTIG low-level allocation flag that uses hints in
"struct ffs_inode_ext" to lay out an ffs file's data contiguously.

Thanks to Greg Oster for helping with the design of this and to
Antti Kantee for code review and suggestions.
 1.109.2.3 18-Jul-2008  simonb Sync with head.
 1.109.2.2 12-Jun-2008  martin License police
 1.109.2.1 10-Jun-2008  simonb Initial commit of Wasabi Systems' WAPBL (Write Ahead Physical Block
Logging) journaling code. Originally written by Darrin B. Jewell
while at Wasabi and updated to -current by Antti Kantee, Andy Doran,
Greg Oster and Simon Burge.

Still a number of issues - look in doc/BRANCHES for "simonb-wapbl"
for more info.
 1.113.4.3 29-Oct-2013  sborrill Pull up the following revision(s) (requested by bad in ticket #1888):
sys/ufs/ffs/ffs_alloc.c: revision 1.144 via patch

Pull in fix from FreeBSD ffs_alloc.c r121785:
Consider only cylinder groups with at least 75% of the average free space
per cylinder group and 75% of the average free inodes per cylinder group
as candidates for the creation of a new directory. Avoids excessive I/O
scanning for a suitable cylinder group on relatively full file systems.
 1.113.4.2 07-May-2009  snj Pull up following revision(s) (requested by sborrill in ticket #726):
sys/ufs/ffs/ffs_alloc.c: revision 1.123 via patch
Fix random 'filesystem full' messages by trapping a couple of 32-bit
overflow areas missed in rev 1.110 and switching cgbase().
Kudos to rump_ffs!
 1.113.4.1 24-Feb-2009  snj branches: 1.113.4.1.2;
Pull up following revision(s) (requested by ad in ticket #490):
sys/kern/vfs_wapbl.c: revision 1.23
sys/miscfs/syncfs/sync_subr.c: revision 1.36
sys/miscfs/syncfs/sync_vnops.c: revision 1.26
sys/ufs/ffs/ffs_alloc.c: revision 1.121
sys/ufs/ffs/ffs_vfsops.c: revision 1.242
sys/ufs/ffs/ffs_vnops.c: revision 1.110
PR kern/39564 wapbl performance issues with disk cache flushing
PR kern/40361 WAPBL locking panic in -current
PR kern/40470 WAPBL corrupts ext2fs
PR kern/40562 busy loop in ffs_sync when unmounting a file system
PR kern/40525 panic: ffs_valloc: dup alloc
- A fix for an issue that can lead to "ffs_valloc: dup" due to dirty cg
buffers being invalidated. Problem discovered and patch by dholland@.
- If the syncer fails to lazily sync a vnode due to lock contention,
retry 1 second later instead of 30 seconds later.
- Flush inode atime updates every ~10 seconds (this makes the most sense with
logging). Previously, they did not hit the disk for read-only files or
devices until the file system was unmounted. It would be better to trickle
the updates out, but that would require more extensive changes.
- Fix issues with file system corruption, busy looping and other nasty
problems when logging and non-logging file systems are intermixed,
with one being the root file system.
- For logging, do not flush metadata on an inode-at-a-time basis if the sync
has been requested by ioflush. Previously, we could try hundreds of log
sync operations a second due to inode update activity, causing the syncer
to fall behind and metadata updates to be serialized across the entire
file system. Instead, burst out metadata and log flushes at a minimum
interval of every 10 seconds on an active file system (happens more often
if the log becomes full). Note this does not change the operation of
fsync() etc.
- With the flush issue fixed, re-enable concurrent metadata updates in
vfs_wapbl.c.
 1.113.4.1.2.1 07-May-2009  snj branches: 1.113.4.1.2.1.2;
Pull up following revision(s) (requested by sborrill in ticket #726):
sys/ufs/ffs/ffs_alloc.c: revision 1.123 via patch
Fix random 'filesystem full' messages by trapping a couple of 32-bit
overflow areas missed in rev 1.110 and switching cgbase().
Kudos to rump_ffs!
 1.113.4.1.2.1.2.1 21-Apr-2010  matt sync to netbsd-5
 1.113.2.3 28-Apr-2009  skrll Sync with HEAD.
 1.113.2.2 03-Mar-2009  skrll Sync with HEAD.
 1.113.2.1 19-Jan-2009  skrll Sync with HEAD.
 1.120.2.1 13-May-2009  jym Sync with HEAD.

Commit is split, to avoid a "too many arguments" protocol error.
 1.124.2.1 30-Apr-2010  uebayasi Sync with HEAD.
 1.125.6.1 20-Jan-2011  bouyer Snapshot of work in progress on a modernised disk quota system:
- new quotactl syscall (versioned for backward compat), which takes
as parameters a path to a mount point and a prop_dictionary
(in plistref format) describing commands and arguments.
For each command, status and data are returned as a prop_dictionary.
Quota command features will be added to take advantage of this,
exporting quota data or getting quota commands as plists.

- new on-disk format storage (all 64-bit wide), integrated into metadata for
ffs (and playing nicely with wapbl).
Quotas are enabled on an ffs filesystem via superblock flags.
tunefs(8) can enable or disable quotas.
On a quota-enabled filesystem, fsck_ffs(8) will track per-uid/gid
block and inode usage, and will check and update quotas in Pass 6.
Quota usage and limits are stored in unlinked files (one for users,
one for groups); fsck_ffs(8) will create the files if needed, or
free them if needed. This means that after enabling or disabling
quotas on a filesystem, a fsck_ffs(8) run is required.
quotacheck(8) is no longer needed; after an unclean shutdown,
fsck or journal replay will take care of fixing quotas.
newfs(8) can create a ready-to-mount quota-enabled filesystem
(superblock flags are set and quota inodes are created).
Other new features or semantic changes:
- default quota data, applied to users or groups which don't already
have a quota entry
- per-user/group grace time (instead of a filesystem-wide one)
- 0 really means "nothing allowed at all", not "no limit".
If you want "no limit", set the limit to UQUAD_MAX (tools will
understand "unlimited" and "-")

A quota file is structured as follows:
it starts with a header, containing a few per-filesystem values
and the default quota limits.
Quota entries are linked together as a simple list; each entry has a
pointer (as an offset within the file) to the next.
The header has a pointer to a list of free quota entries, and
a hash table of in-use entries. The size of the hash table depends
on the filesystem block size (header+hash table should fit in the
first block). The file is not sparse and is a multiple of the
filesystem block size (when the free quota entry list is empty, a new
filesystem block is allocated). Quota entries do not cross
filesystem block boundaries.

In memory, the kernel keeps a cache of recently used quota entries
as a reference to the block number and offset within the block.
The quota entry itself is kept in the buf cache.

fsck_ffs(8), tunefs(8) and newfs(8) support is complete (with
related atf tests :)
The kernel can update disk usage and report it via quotactl(2).

Todo: enforce quota limits (limits are not checked by kernel yet)
update repquota, edquota and rpc.rquotad to the new world
implement compat_50_quotactl ioctl.
update quotactl(2) man page

fsck_ffs required fixes so that allocating new blocks or inodes will
properly update the superblock and cg summaries. This was not an issue up
to now because the superblock and cg summary checks happened last, but now
allocations or frees can happen in pass 6.
 1.125.4.1 06-Jun-2011  jruoho Sync with HEAD.
 1.125.2.2 21-Apr-2011  rmind sync with head
 1.125.2.1 16-Mar-2010  rmind Change struct uvm_object::vmobjlock to be dynamically allocated with
mutex_obj_alloc(). It allows us to share the locks among UVM objects.
 1.127.2.1 23-Jun-2011  cherry Catchup with rmind-uvmplock merge.
 1.129.2.4 22-May-2014  yamt sync with head.

for a reference, the tree before this commit was tagged
as yamt-pagecache-tag8.

this commit was split into small chunks to avoid
a limitation of cvs. ("Protocol error: too many arguments")
 1.129.2.3 23-Jan-2013  yamt sync with head
 1.129.2.2 30-Oct-2012  yamt sync with head
 1.129.2.1 17-Apr-2012  yamt sync with head
 1.130.8.5 03-Dec-2017  jdolecek update from HEAD
 1.130.8.4 20-Aug-2014  tls Rebase to HEAD as of a few days ago.
 1.130.8.3 23-Jun-2013  tls resync from head
 1.130.8.2 25-Feb-2013  tls resync with head
 1.130.8.1 20-Nov-2012  tls Resync to 2012-11-19 00:00:00 UTC
 1.130.4.1 29-Oct-2013  sborrill Pull up the following revision(s) (requested by bad in ticket #978):
sys/ufs/ffs/ffs_alloc.c: revision 1.144

Pull in fix from FreeBSD ffs_alloc.c r121785:
Consider only cylinder groups with at least 75% of the average free space
per cylinder group and 75% of the average free inodes per cylinder group
as candidates for the creation of a new directory. Avoids excessive I/O
scanning for a suitable cylinder group on relatively full file systems.
 1.138.2.1 18-May-2014  rmind sync with head
 1.145.2.1 10-Aug-2014  tls Rebase.
 1.146.2.2 29-May-2019  martin Pull up following revision(s) (requested by kardel in ticket #1697):

sys/ufs/ffs/ffs_alloc.c: revision 1.164

PR/53990, PR/52380, PR/52102: UFS2 cylinder group inode allocation botch

Fix rare allocation botch in ffs_nodealloccg().

Conditions:
a) less than
#_of_initialized_inodes(cg->cg_initediblk)
- inodes_per_filesystem_block
are allocated in the cylinder group
b) cg->cg_irotor points to an uninterrupted run of
allocated inodes in the inode bitmap up to the
end of dynamically initialized inodes
(cg->cg_initediblk)

In this case the next inode after this run was returned
without initializing the respective inode block. As the
block is not initialized these inodes could trigger panics
on inode consistency due to old (uninitialized) disk data.

In very rare cases data loss could occur when
the uninitialized inode block is initialized via the
normal mechanism.

Further conditions to occur after the above:
c) no panic
d) no (forced) fsck
e) and more than cg->cg_initediblk - inodes_per_filesystem_block
allocated inodes.

Fix:

Always ensure that allocation happens within the initialized inode range,
extending the initialized inode range as needed.

Add KASSERTMSG() safeguards.

ok hannken@
 1.146.2.1 14-Aug-2015  msaitoh branches: 1.146.2.1.2; 1.146.2.1.6;
Pull up following revision(s) (requested by riastradh in ticket #949):
sys/ufs/ffs/ffs_alloc.c: revision 1.151
Need wapbl transaction around ffs_blkfree_cg. Fixes wapbl+discard.
 1.146.2.1.6.1 29-May-2019  martin Pull up following revision(s) (requested by kardel in ticket #1697):

sys/ufs/ffs/ffs_alloc.c: revision 1.164

PR/53990, PR/52380, PR/52102: UFS2 cylinder group inode allocation botch

Fix rare allocation botch in ffs_nodealloccg().

Conditions:
a) less than
#_of_initialized_inodes(cg->cg_initediblk)
- inodes_per_filesystem_block
are allocated in the cylinder group
b) cg->cg_irotor points to an uninterrupted run of
allocated inodes in the inode bitmap up to the
end of dynamically initialized inodes
(cg->cg_initediblk)

In this case the next inode after this run was returned
without initializing the respective inode block. As the
block is not initialized these inodes could trigger panics
on inode consistency due to old (uninitialized) disk data.

In very rare cases data loss could occur when
the uninitialized inode block is initialized via the
normal mechanism.

Further conditions to occur after the above:
c) no panic
d) no (forced) fsck
e) and more than cg->cg_initediblk - inodes_per_filesystem_block
allocated inodes.

Fix:

Always ensure that allocation happens within the initialized inode range,
extending the initialized inode range as needed.

Add KASSERTMSG() safeguards.

ok hannken@
 1.146.2.1.2.1 29-May-2019  martin Pull up following revision(s) (requested by kardel in ticket #1697):

sys/ufs/ffs/ffs_alloc.c: revision 1.164

PR/53990, PR/52380, PR/52102: UFS2 cylinder group inode allocation botch

Fix rare allocation botch in ffs_nodealloccg().

Conditions:
a) less than
#_of_initialized_inodes(cg->cg_initediblk)
- inodes_per_filesystem_block
are allocated in the cylinder group
b) cg->cg_irotor points to an uninterrupted run of
allocated inodes in the inode bitmap up to the
end of dynamically initialized inodes
(cg->cg_initediblk)

In this case the next inode after this run was returned
without initializing the respective inode block. As the
block is not initialized these inodes could trigger panics
on inode consistency due to old (uninitialized) disk data.

In very rare cases data loss could occur when
the uninitialized inode block is initialized via the
normal mechanism.

Further conditions to occur after the above:
c) no panic
d) no (forced) fsck
e) and more than cg->cg_initediblk - inodes_per_filesystem_block
allocated inodes.

Fix:

Always ensure that allocation happens within the initialized inode range,
extending the initialized inode range as needed.

Add KASSERTMSG() safeguards.

ok hannken@
 1.147.2.5 28-Aug-2017  skrll Sync with HEAD
 1.147.2.4 05-Dec-2016  skrll Sync with HEAD
 1.147.2.3 05-Oct-2016  skrll Sync with HEAD
 1.147.2.2 22-Sep-2015  skrll Sync with HEAD
 1.147.2.1 06-Apr-2015  skrll Sync with HEAD
 1.151.2.2 20-Mar-2017  pgoyette Sync with HEAD
 1.151.2.1 04-Nov-2016  pgoyette Sync with HEAD
 1.154.2.1 21-Apr-2017  bouyer Sync with HEAD
 1.156.6.2 29-May-2019  martin Pull up following revision(s) (requested by kardel in ticket #1272):

sys/ufs/ffs/ffs_alloc.c: revision 1.164

PR/53990, PR/52380, PR/52102: UFS2 cylinder group inode allocation botch

Fix rare allocation botch in ffs_nodealloccg().

Conditions:
a) less than
#_of_initialized_inodes(cg->cg_initediblk)
- inodes_per_filesystem_block
are allocated in the cylinder group
b) cg->cg_irotor points to an uninterrupted run of
allocated inodes in the inode bitmap up to the
end of dynamically initialized inodes
(cg->cg_initediblk)

In this case the next inode after this run was returned
without initializing the respective inode block. As the
block is not initialized these inodes could trigger panics
on inode consistency due to old (uninitialized) disk data.

In very rare cases data loss could occur when
the uninitialized inode block is initialized via the
normal mechanism.

Further conditions to occur after the above:
c) no panic
d) no (forced) fsck
e) and more than cg->cg_initediblk - inodes_per_filesystem_block
allocated inodes.

Fix:

Always ensure that allocation happens within the initialized inode range,
extending the initialized inode range as needed.

Add KASSERTMSG() safeguards.

ok hannken@
 1.156.6.1 24-Jul-2017  snj Pull up following revision(s) (requested by hannken in ticket #129):
sys/ufs/ffs/ffs_alloc.c: revision 1.157
When initializing more inodes make sure to write them to disk
before writing the cylinder group with updated cg_initediblk.
 1.159.4.3 21-Apr-2020  martin Sync with HEAD
 1.159.4.2 08-Apr-2020  martin Merge changes from current as of 20200406
 1.159.4.1 10-Jun-2019  christos Sync with HEAD
 1.159.2.2 06-Sep-2018  pgoyette Sync with HEAD

Resolve a couple of conflicts (result of the uimin/uimax changes)
 1.159.2.1 28-Jul-2018  pgoyette Sync with HEAD
 1.164.6.1 29-Feb-2020  ad Sync with head.
 1.164.4.1 21-Mar-2020  martin Pull up following revision(s) (requested by riastradh in ticket #790):

sys/ufs/ffs/ffs_alloc.c: revision 1.165

Fix non-DIAGNOSTIC build with UVM_PAGE_TRKOWN.
 1.166.4.1 20-Apr-2020  bouyer Sync with HEAD
 1.171.4.1 13-May-2023  martin Pull up following revision(s) (requested by chs in ticket #160):

usr.sbin/makefs/ffs/ffs_alloc.c: revision 1.31
sbin/tunefs/tunefs.c: revision 1.58
sbin/fsck_ffs/setup.c: revision 1.105
sbin/fsck_ffs/pass5.c: revision 1.56
usr.sbin/makefs/ffs.c: revision 1.74
usr.sbin/makefs/ffs/mkfs.c: revision 1.42
usr.sbin/makefs/Makefile: revision 1.40
sys/ufs/ffs/fs.h: revision 1.71
sbin/fsdb/fsdb.c: revision 1.54
sbin/resize_ffs/resize_ffs.c: revision 1.58
sbin/fsck_ffs/pass4.c: revision 1.29
usr.sbin/makefs/ffs/ffs_extern.h: revision 1.9
sbin/newfs/mkfs.c: revision 1.133
sys/ufs/ffs/ffs_alloc.c: revision 1.172
sbin/fsck_ffs/pass1b.c: revision 1.24
usr.sbin/dumpfs/dumpfs.c: revision 1.68
sys/ufs/ffs/ffs_extern.h: revision 1.88
usr.sbin/quotacheck/quotacheck.c: revision 1.51
sys/ufs/ffs/ffs_subr.c: revision 1.54
sbin/fsck_ffs/main.c: revision 1.91
sbin/fsck_ffs/pass1.c: revision 1.63

ufs: fixed signed/unsigned bugs affecting large file systems

Apply these commits from FreeBSD:
commit e870d1e6f97cc73308c11c40684b775bcfa906a2
Author: Kirk McKusick <mckusick@FreeBSD.org>
Date: Wed Feb 10 20:10:35 2010 +0000
This fix corrects a problem in the file system that treats large
inode numbers as negative rather than unsigned. For a default
(16K block) file system, this bug began to show up at a file system
size above about 16TB.
To fully handle this problem, newfs must be updated to ensure that
it will never create a filesystem with more than 2^32 inodes. That
patch will be forthcoming soon.
Reported by: Scott Burns, John Kilburg, Bruce Evans
Followup by: Jeff Roberson
PR: 133980
MFC after: 2 weeks

commit 81479e688b0f643ffacd3f335b4b4bba460b769d
Author: Kirk McKusick <mckusick@FreeBSD.org>
Date: Thu Feb 11 18:14:53 2010 +0000
One last pass to get all the unsigned comparisons correct.

In addition to the changes from FreeBSD, this commit includes quite a few
related changes to appease -Wsign-compare.
 1.173.2.1 02-Aug-2025  perseant Sync with HEAD
 1.15 15-Feb-2015  maxv Revert a change in my previous commit that broke the checksum calculation.
Noted by dholland@
 1.14 14-Feb-2015  maxv ffs_appleufs_validate():
- remove superfluous printfs
- ensure ul_namelen!=0, otherwise the kernel accesses ul_name[-1] and
overwrites the previous field in the structure.
 1.13 14-Feb-2015  maxv KNF. No functional change.
 1.12 19-Nov-2011  tls branches: 1.12.8; 1.12.26;
First step of random number subsystem rework described in
<20111022023242.BA26F14A158@mail.netbsd.org>. This change includes
the following:

An initial cleanup and minor reorganization of the entropy pool
code in sys/dev/rnd.c and sys/dev/rndpool.c. Several bugs are
fixed. Some effort is made to accumulate entropy more quickly at
boot time.

A generic interface, "rndsink", is added, for stream generators to
request that they be re-keyed with good quality entropy from the pool
as soon as it is available.

The arc4random()/arc4randbytes() implementation in libkern is
adjusted to use the rndsink interface for rekeying, which helps
address the problem of low-quality keys at boot time.

An implementation of the FIPS 140-2 statistical tests for random
number generator quality is provided (libkern/rngtest.c). This
is based on Greg Rose's implementation from Qualcomm.

A new random stream generator, nist_ctr_drbg, is provided. It is
based on an implementation of the NIST SP800-90 CTR_DRBG by
Henric Jungheim. This generator uses AES in a modified counter
mode to generate a backtracking-resistant random stream.

An abstraction layer, "cprng", is provided for in-kernel consumers
of randomness. The arc4random/arc4randbytes API is deprecated for
in-kernel use. It is replaced by "cprng_strong". The current
cprng_fast implementation wraps the existing arc4random
implementation. The current cprng_strong implementation wraps the
new CTR_DRBG implementation. Both interfaces are rekeyed from
the entropy pool automatically at intervals justifiable from best
current cryptographic practice.

In some quick tests, cprng_fast() is about the same speed as
the old arc4randbytes(), and cprng_strong() is about 20% faster
than rnd_extract_data(). Performance is expected to improve.

The AES code in src/crypto/rijndael is no longer an optional
kernel component, as it is required by cprng_strong, which is
not an optional kernel component.

The entropy pool output is subjected to the rngtest tests at
startup time; if it fails, the system will reboot. There is
approximately a 3/10000 chance of a false positive from these
tests. Entropy pool _input_ from hardware random numbers is
subjected to the rngtest tests at attach time, as well as the
FIPS continuous-output test, to detect bad or stuck hardware
RNGs; if any are detected, they are detached, but the system
continues to run.

A problem with rndctl(8) is fixed -- data structures with
pointers in arrays are no longer passed to userspace (this
was not a security problem, but rather a major issue for
compat32). A new kernel will require a new rndctl.

The sysctl kern.arandom() and kern.urandom() nodes are hooked
up to the new generators, but the /dev/*random pseudodevices
are not, yet.

Manual pages for the new kernel interfaces are forthcoming.
 1.11 22-Jun-2011  mrg branches: 1.11.2;
fix an off-by-one array overflow found by GCC 4.5.3.
 1.10 24-Apr-2010  dbj switch from 4 clause to 2 clause BSD license.
 1.9 11-Jun-2006  kardel branches: 1.9.60; 1.9.82; 1.9.84;
PR 33697: complete timecounter conversion
 1.8 11-Dec-2005  christos branches: 1.8.4; 1.8.8; 1.8.14;
merge ktrace-lwp.
 1.7 15-Jul-2005  thorpej Use ANSI function decls.
 1.6 26-Feb-2005  perry branches: 1.6.4;
nuke trailing whitespace
 1.5 02-Jan-2004  dbj branches: 1.5.8; 1.5.10;
explicitly pad struct appleufslabel and use __attribute__((__packed__))
since apple put the 64 bit uuid field on a 4 byte boundary
 1.4 02-Jan-2004  dbj add uuid field to apple ufs volume label
 1.3 13-Oct-2003  thorpej Whitespace nits.
 1.2 02-Nov-2002  dbj branches: 1.2.6;
use be32toh instead of ntohl, etc.
 1.1 28-Sep-2002  dbj branches: 1.1.2; 1.1.4;
Add support for the Apple UFS variation on ffs
This is the bulk of PR #17345

The general approach is to use a run-time determinable value
for DIRBLKSIZ. Additional allowances are included for using
MAXSYMLINKLEN with FS_42INODEFMT and a shift in the cylinder group
cluster summary count array. Support is added for managing
the Apple UFS volume label.
 1.1.4.3 11-Nov-2002  nathanw Catch up to -current
 1.1.4.2 18-Oct-2002  nathanw Catch up to -current.
 1.1.4.1 28-Sep-2002  nathanw file ffs_appleufs.c was added on branch nathanw_sa on 2002-10-18 02:45:48 +0000
 1.1.2.2 10-Oct-2002  jdolecek sync kqueue with -current; this includes merge of gehenna-devsw branch,
merge of i386 MP branch, and part of autoconf rototil work
 1.1.2.1 28-Sep-2002  jdolecek file ffs_appleufs.c was added on branch kqueue on 2002-10-10 18:44:52 +0000
 1.2.6.5 10-Nov-2005  skrll Sync with HEAD. Here we go again...
 1.2.6.4 04-Mar-2005  skrll Sync with HEAD.

Hi Perry!
 1.2.6.3 21-Sep-2004  skrll Fix the sync with head I botched.
 1.2.6.2 18-Sep-2004  skrll Sync with HEAD.
 1.2.6.1 03-Aug-2004  skrll Sync with HEAD
 1.5.10.1 19-Mar-2005  yamt sync with head. xen and whitespace. xen part is not finished.
 1.5.8.1 29-Apr-2005  kent sync with -current
 1.6.4.1 21-Jun-2006  yamt sync with head.
 1.8.14.1 19-Jun-2006  chap Sync with head.
 1.8.8.1 26-Jun-2006  yamt sync with head.
 1.8.4.1 09-Sep-2006  rpaulo sync with head
 1.9.84.1 30-May-2010  rmind sync with head
 1.9.82.1 30-Apr-2010  uebayasi Sync with HEAD.
 1.9.60.1 11-Aug-2010  yamt sync with head.
 1.11.2.1 17-Apr-2012  yamt sync with head
 1.12.26.1 06-Apr-2015  skrll Sync with HEAD
 1.12.8.1 03-Dec-2017  jdolecek update from HEAD
 1.66 17-Nov-2022  chs Restore backward compatibility of UFS2 with previous NetBSD releases by
disabling support in UFS2 for extended attributes (including ACLs).
Add a new variant of UFS2 called "UFS2ea" that does support extended attributes.
Add new fsck_ffs operations "-c ea" and "-c no-ea" to convert file systems
from UFS2 to UFS2ea and vice-versa (both of which delete all existing extended
attributes in the process).
 1.65 05-Sep-2020  riastradh Round of uvm.h cleanup.

The poorly named uvm.h is generally supposed to be for uvm-internal
users only.

- Narrow it to files that actually need it -- mostly files that need
to query whether curlwp is the pagedaemon, which should maybe be
exposed by an external header.

- Use uvm_extern.h where feasible and uvm_*.h for things not exposed
by it. We should split up uvm_extern.h but this will serve for now
to reduce the uvm.h dependencies.

- Use uvm_stat.h and #ifdef UVMHIST uvm.h for files that use
UVMHIST(ubchist), since ubchist is declared in uvm.h but the
reference evaporates if UVMHIST is not defined, so we reduce header
file dependencies.

- Make uvm_device.h and uvm_swap.h independently includable while
here.

ok chs@
 1.64 18-Apr-2020  christos Extended attribute support for ffsv2, from FreeBSD.
 1.63 28-Oct-2017  pgoyette branches: 1.63.4; 1.63.14;
Update the kernhist(9) kernel history code to address issues identified
in PR kern/52639, as well as some general cleaning-up...

(As proposed on tech-kern@ with additional changes and enhancements.)

Details of changes:

* All history arguments are now stored as uintmax_t values[1], both in
the kernel and in the structures used for exporting the history data
to userland via sysctl(9). This avoids problems on some architectures
where passing a 64-bit (or larger) value to printf(3) can cause it to
process the value as multiple arguments. (This can be particularly
problematic when printf()'s format string is not a literal, since in
that case the compiler cannot know how large each argument should be.)

* Update the data structures used for exporting kernel history data to
include a version number as well as the length of history arguments.

* All [2] existing users of kernhist(9) have had their format strings
updated. Each format specifier now includes an explicit length
modifier 'j' to refer to numeric values of the size of uintmax_t.

* All [2] existing users of kernhist(9) have had their format strings
updated to replace uses of "%p" with "%#jx", and the pointer
arguments are now cast to (uintptr_t) before being subsequently cast
to (uintmax_t). This is needed to avoid compiler warnings about
casting "pointer to integer of a different size."

* All [2] existing users of kernhist(9) have had instances of "%s" or
"%c" format strings replaced with numeric formats; several instances
of mismatches between format strings and argument lists have been fixed.

* vmstat(1) has been modified to handle the new size of arguments in the
history data as exported by sysctl(9).

* vmstat(1) now provides a warning message if the history requested with
the -u option does not exist (previously, this condition was silently
ignored, with only a single blank line being printed).

* vmstat(1) now checks the version and argument length included in the
data exported via sysctl(9) and exits if they do not match the values
with which vmstat was built.

* The kernhist(9) man-page has been updated to note the additional
requirements imposed on the format strings, along with several other
minor changes and enhancements.

[1] It would have been possible to use an explicit length (for example,
uint64_t) for the history arguments. But that would require another
"rototill" of all the users in the future when we add support for an
architecture that supports a larger size. Also, the printf(3) format
specifiers for explicitly-sized values, such as "%"PRIu64, are much
more verbose (and less aesthetically appealing, IMHO) than simply
using "%ju".

[2] I've tried very hard to find "all [the] existing users of kernhist(9)"
but it is possible that I've missed some of them. I would be glad to
update any stragglers that anyone identifies.
 1.62 25-Sep-2016  jdolecek branches: 1.62.8;
fix typo in #ifdef notyet part
 1.61 28-Mar-2015  maxv branches: 1.61.2;
Remove the 'cred' argument from bread(). Remove a now unused var in
ffs_snapshot.c. Update the man page accordingly.

ok hannken@
 1.60 20-Oct-2013  htodd branches: 1.60.6;
Define needswap where needed.
 1.59 23-Jun-2013  dholland branches: 1.59.2;
Stick ffs_, ext2_, chfs_, filecore_, cd9660_, or mfs_ in front of
the following symbols so as to disambiguate fully. (Christos already
did the lfs ones.)

lblkno
lblktosize
lfragtosize
numfrags
blkroundup
fragroundup
 1.58 23-Jun-2013  dholland fsbtodb() -> FFS_FSBTODB(), EXT2_FSBTODB(), or MFS_FSBTODB()
dbtofsb() -> FFS_DBTOFSB() or EXT2_DBTOFSB()

(Christos already did the lfs ones a few days back)
 1.57 19-Jun-2013  dholland Rename ambiguous macros:
MAXDIRSIZE -> UFS_MAXDIRSIZE or LFS_MAXDIRSIZE
NINDIR -> FFS_NINDIR, EXT2_NINDIR, LFS_NINDIR, or MFS_NINDIR
INOPB -> FFS_INOPB, LFS_INOPB
INOPF -> FFS_INOPF, LFS_INOPF
blksize -> ffs_blksize, ext2_blksize, or lfs_blksize
sblksize -> ffs_blksize

These are not the only ambiguously defined filesystem macros, of
course; there's a pile more. I may not have found all the ambiguous
definitions of blksize(), too, as there are a lot of other things
called 'blksize' in the system.
 1.56 22-Jan-2013  dholland Stuff UFS_ in front of a few of ufs's symbols to reduce namespace
pollution. Specifically:
ROOTINO -> UFS_ROOTINO
WINO -> UFS_WINO
NXADDR -> UFS_NXADDR
NDADDR -> UFS_NDADDR
NIADDR -> UFS_NIADDR
MAXSYMLINKLEN -> UFS_MAXSYMLINKLEN
MAXSYMLINKLEN_UFS[12] -> UFS[12]_MAXSYMLINKLEN (for consistency)

Sort out ext2fs's misuse of NDADDR and NIADDR; fortunately, these have
the same values in ext2fs and ffs.

No functional change intended.
 1.55 20-Dec-2012  hannken Change bread() and breadn() to never return a buffer on
error and modify all callers to not brelse() on error.

Welcome to 6.99.16

PR kern/46282 (6.0_BETA crash: msdosfs_bmap -> pcbmap -> bread -> bio_doread)
 1.54 23-Apr-2011  hannken branches: 1.54.4; 1.54.14;
Try to keep snapshot indirect blocks contiguous.

This speeds up snapshot creation by a factor of ~3 and reduces
the file system suspension time by a factor of ~5.
 1.53 06-Mar-2011  bouyer merge the bouyer-quota2 branch. This adds a new on-disk format
to store disk quota usage and limits, integrated with ffs
metadata. Usage is checked by fsck_ffs (no more quotacheck)
and is covered by the WAPBL journal. Enabled with kernel
option QUOTA2 (added where QUOTA was enabled in kernel config files),
turned on with tunefs(8) on a per-filesystem
basis. mount_mfs(8) can also turn quotas on.

See http://mail-index.netbsd.org/tech-kern/2011/02/19/msg010025.html
for details.
 1.52 22-Feb-2009  ad branches: 1.52.4; 1.52.6; 1.52.8;
PR kern/26878 FFSv2 + softdep = livelock (no free ram)
PR kern/16942 panic with softdep and quotas
PR kern/19565 panic: softdep_write_inodeblock: indirect pointer #1 mismatch
PR kern/26274 softdep panic: allocdirect_merge: ...
PR kern/26374 Long delay before non-root users can write to softdep partitions
PR kern/28621 1.6.x "vp != NULL" panic in ffs_softdep.c:4653 while unmounting a softdep (+quota) filesystem
PR kern/29513 FFS+Softdep panic with unfsck-able file-corruption
PR kern/31544 The ffs softdep code appears to fail to write dirty bits to disk
PR kern/31981 stopping scsi disk can cause panic (softdep)
PR kern/32116 kernel panic in softdep (assertion failure)
PR kern/32532 softdep_trackbufs deadlock
PR kern/37191 softdep: locking against myself
PR kern/40474 Kernel panic after remounting raid root with softdep

Retire softdep, pass 2. As discussed and later formally announced on the
mailing lists.
 1.51 31-Jul-2008  simonb branches: 1.51.2; 1.51.4; 1.51.8;
Merge the simonb-wapbl branch. From the original branch commit:

Add Wasabi System's WAPBL (Write Ahead Physical Block Logging)
journaling code. Originally written by Darrin B. Jewell while
at Wasabi and updated to -current by Antti Kantee, Andy Doran,
Greg Oster and Simon Burge.

OK'd by core@, releng@.
 1.50 03-Jun-2008  hannken branches: 1.50.2; 1.50.4;
ufs/ffs: replace calls to getblk() with ffs_getblk(). Now all buffers
have been run through copy-on-write and async mounts work again.

Fixes PR kern/38820

Reviewed by: YAMAMOTO Takashi <yamt@netbsd.org>
 1.49 16-May-2008  hannken Make sure all cached buffers with valid, not yet written data have been
run through copy-on-write. Call fscow_run() with valid data where possible.

The LP_UFSCOW hack is no longer needed to protect ffs_copyonwrite() against
endless recursion.

- Add a flag B_MODIFY to bread(), breada() and breadn(). If set the caller
intends to modify the buffer returned.

- Always run copy-on-write on buffers returned from ffs_balloc().

- Add new function ffs_getblk() that gets a buffer, assigns a new blkno,
may clear the buffer and runs copy-on-write. Process possible errors
from getblk() or fscow_run(). Part of PR kern/38664.

Welcome to 4.99.63

Reviewed by: YAMAMOTO Takashi <yamt@netbsd.org>
 1.48 02-Jan-2008  ad branches: 1.48.6; 1.48.8; 1.48.10; 1.48.12;
Merge vmlocking2 to head.
 1.47 08-Dec-2007  ad branches: 1.47.4;
Add some comments.
 1.46 08-Oct-2007  ad branches: 1.46.4; 1.46.6;
Merge ffs locking & brelse changes from the vmlocking branch.
 1.45 29-Jun-2007  pooka branches: 1.45.6; 1.45.8; 1.45.10;
remove redundant KASSERTs
 1.44 29-Jan-2007  hubertf branches: 1.44.6; 1.44.8; 1.44.10;
Remove more duplicate headers.
Patch by Slava Semushin <slava.semushin@gmail.com>

Again, this was tested by comparing obj files from a pristine and a patched
source tree against an i386/ALL kernel, and also for src/sbin/fsck_ffs,
src/sbin/fsdb and src/usr.sbin/makefs. Only changes in assert() line numbers
were detected in 'objdump -d' output.
 1.43 14-May-2006  elad branches: 1.43.8;
integrate kauth.
 1.42 15-Apr-2006  christos Coverity CID 2858: Avoid NULL deref.
 1.41 23-Mar-2006  hannken ffs_balloc*(): Add an assertion for "bpp != NULL" if B_METAONLY is set.

From Coverity CIDs 1170..1173
 1.40 11-Dec-2005  christos branches: 1.40.4; 1.40.6; 1.40.8; 1.40.10; 1.40.12;
merge ktrace-lwp.
 1.39 02-Nov-2005  yamt merge yamt-vop branch. remove following VOPs.

VOP_BLKATOFF
VOP_VALLOC
VOP_BALLOC
VOP_REALLOCBLKS
VOP_VFREE
VOP_TRUNCATE
VOP_UPDATE
 1.38 15-Jul-2005  thorpej branches: 1.38.2;
Use ANSI function decls.
 1.37 15-Dec-2004  mycroft branches: 1.37.10;
Remove some unnecessary (int32_t) casts that would cause us to screw up the
top bit in block addresses.

Also, change some daddr_t->int32_t casts (mostly as arguments to ufs_rw32(),
where they would get promoted anyway) to u_int32_t.
 1.36 14-Aug-2004  mycroft In the indirect block unwind case, we only need to do the synchronous writes
of the inode in the softdep case. XXX This is really a deficiency in softdep.
 1.35 25-May-2004  hannken Add ffs internal snapshots. Written by Marshall Kirk McKusick for FreeBSD.

- Not enabled by default. Needs kernel option FFS_SNAPSHOT.
- Change parameters of ffs_blkfree.
- Let the copy-on-write functions return an error so spec_strategy
may fail if the copy-on-write fails.
- Change genfs_*lock*() to use vp->v_vnlock instead of &vp->v_lock.
- Add flag B_METAONLY to VOP_BALLOC to return indirect block buffer.
- Add a function ffs_checkfreefile needed for snapshot creation.
- Add special handling of snapshot files:
Snapshots may not be opened for writing and the attributes are read-only.
Use the mtime as the time this snapshot was taken.
Deny mtime updates for snapshot files.
- Add function transferlockers to transfer any waiting processes from
one lock to another.
- Add vfsop VFS_SNAPSHOT to take a snapshot and make it accessible through
a vnode.
- Add snapshot support to ls, fsck_ffs and dump.

Welcome to 2.0F.

Approved by: Jason R. Thorpe <thorpej@netbsd.org>
 1.34 07-Aug-2003  agc Move UCB-licensed code from 4-clause to 3-clause licence.

Patches provided by Joel Baker in PR 22364, verified by myself.
 1.33 02-Apr-2003  fvdl branches: 1.33.2;
Add support for UFS2. UFS2 is an enhanced FFS, adding support for
64 bit block pointers, extended attribute storage, and a few
other things.

This commit does not yet include the code to manipulate the extended
storage (for e.g. ACLs), this will be done later.

Originally written by Kirk McKusick and Network Associates Laboratories for
FreeBSD.
 1.32 15-Mar-2003  kristerw ffs_gop_alloc() is not used any more. Remove it.

OK:ed by Konrad Schroder.
 1.31 24-Jan-2003  fvdl Bump daddr_t to 64 bits. Replace it with int32_t in all places where
it was used on-disk, so that on-disk formats remain the same.
Remove ufs_daddr_t and ufs_lbn_t for the time being.
 1.30 05-Jun-2002  chs get the units right when computing a blkno in the ENOSPC path
for allocations involving indirect blocks.
spotted by Trevin Beattie <trevin@xmission.com>.
 1.29 08-Nov-2001  chs branches: 1.29.8; 1.29.10;
the previous fix (in rev. 1.26) for hangs when the filesystem is full
was wrong, so fix it right this time. undo the previous change and
instead, replace the troublesome VOP_FSYNC()s with code that just flushes
the particular indirect blocks that we allocated. this resolves the
softdeps for those blocks. then we can change the pointer for
the first indirect block we allocated to zero, write that, and finally
invalidate all the indirect blocks we've touched. also, wait until
after we finish all this before freeing any blocks we allocated.
fixes PRs 14413 and 14423.
 1.28 30-Oct-2001  lukem add __KERNEL_RCSID()
 1.27 30-Sep-2001  chs branches: 1.27.2;
in ffs_balloc(), clean up page cache state to avoid hangs when we
get ENOSPC. as a result of this, we now skip some of the normal cleanup
in ufs_balloc_range() in the error case.
 1.26 15-Sep-2001  chs a whole bunch of changes to improve performance and robustness under load:

- remove special treatment of pager_map mappings in pmaps. this is
required now, since I've removed the globals that expose the address range.
pager_map now uses pmap_kenter_pa() instead of pmap_enter(), so there's
no longer any need to special-case it.
- eliminate struct uvm_vnode by moving its fields into struct vnode.
- rewrite the pageout path. the pager is now responsible for handling the
high-level requests instead of only getting control after a bunch of work
has already been done on its behalf. this will allow us to UBCify LFS,
which needs tighter control over its pages than other filesystems do.
writing a page to disk no longer requires making it read-only, which
allows us to write wired pages without causing all kinds of havoc.
- use a new PG_PAGEOUT flag to indicate that a page should be freed
on behalf of the pagedaemon when it's unlocked. this flag is very similar
to PG_RELEASED, but unlike PG_RELEASED, PG_PAGEOUT can be cleared if the
pageout fails due to e.g. an indirect-block buffer being locked.
this allows us to remove the "version" field from struct vm_page,
and together with shrinking "loan_count" from 32 bits to 16,
struct vm_page is now 4 bytes smaller.
- no longer use PG_RELEASED for swap-backed pages. if the page is busy
because it's being paged out, we can't release the swap slot to be
reallocated until that write is complete, but unlike with vnodes we
don't keep a count of in-progress writes so there's no good way to
know when the write is done. instead, when we need to free a busy
swap-backed page, just sleep until we can get it busy ourselves.
- implement a fast-path for extending writes which allows us to avoid
zeroing new pages. this substantially reduces cpu usage.
- encapsulate the data used by the genfs code in a struct genfs_node,
which must be the first element of the filesystem-specific vnode data
for filesystems which use genfs_{get,put}pages().
- eliminate many of the UVM pagerops, since they aren't needed anymore
now that the pager "put" operation is a higher-level operation.
- enhance the genfs code to allow NFS to use the genfs_{get,put}pages
instead of a modified copy.
- clean up struct vnode by removing all the fields that used to be used by
the vfs_cluster.c code (which we don't use anymore with UBC).
- remove kmem_object and mb_object since they were useless.
instead of allocating pages to these objects, we now just allocate
pages with no object. such pages are mapped in the kernel until they
are freed, so we can use the mapping to find the page to free it.
this allows us to remove splvm() protection in several places.

The sum of all these changes improves write throughput on my
decstation 5000/200 to within 1% of the rate of NetBSD 1.5
and reduces the elapsed time for "make release" of a NetBSD 1.5
source tree on my 128MB pc to 10% less than a 1.5 kernel took.
 1.25 08-Aug-2001  lukem branches: 1.25.2;
get argument name correct in comment describing vop_balloc_args
 1.24 30-May-2001  mrg branches: 1.24.4;
use _KERNEL_OPT
 1.23 27-Nov-2000  chs branches: 1.23.2;
Initial integration of the Unified Buffer Cache project.
 1.22 19-Sep-2000  fvdl Adapt for VOP_FSYNC parameter change.

Implement range fsync for FFS. Note: not yet implemented for the
SOFTDEP case.
 1.21 28-Jun-2000  mrg remove include of <vm/vm.h> and <uvm/uvm_extern.h>
 1.20 29-May-2000  mycroft branches: 1.20.2;
MNT_WAIT -> FSYNC_WAIT
 1.19 28-May-2000  mycroft DTRT when unwinding multiple levels.
 1.18 28-May-2000  mycroft When unwinding a failed allocation, make sure to nuke the unwound block from
the vnode's block list. This fixes `itrunc3' panics (at least in some cases;
further testing is needed) and prevents further lossage later on.
 1.17 25-Feb-2000  fvdl branches: 1.17.2;
Fix a bug introduced in Lite2 with block allocation and full disk
conditions. Reported by Ian Dowse <iedowse@maths.tcd.ie>, based
on patch in FreeBSD reviewed by Kirk McKusick.
 1.16 14-Feb-2000  fvdl Fixes to the softdep code from Ethan Solomita <ethan@geocast.com>.
* Fix buffer ordering when it has dependencies.
* Alleviate memory problems.
* Deal with some recursive vnode locks (sigh).
* Fix other bugs.
 1.15 15-Nov-1999  fvdl Add Kirk McKusick's soft updates code to the trunk. Not enabled by
default, as the copyright on the main file (ffs_softdep.c) is such
that it has been put into gnusrc. options SOFTDEP will pull this
in. This code also contains the trickle syncer.

Bump version number to 1.4O
 1.14 24-Mar-1999  mrg branches: 1.14.4; 1.14.8; 1.14.10; 1.14.14;
completely remove Mach VM support. all that is left are the
header files, as UVM still uses (most of) them.
 1.13 27-Oct-1998  mycroft branches: 1.13.2;
Do not corrupt file flags when file system is full!
 1.12 13-Jun-1998  kleink KNF, mostly of FFS_EI changes.
 1.11 09-Jun-1998  scottr Protect various config(8)-generated files from inclusion while
building LKMs. Fixes PR 5557.
 1.10 08-Jun-1998  scottr Use the newly-defined opt_quota.h.
 1.9 18-Mar-1998  bouyer Add support for reading/writing FFS in non-native byte order, conditioned
to "options FFS_EI". The superblock and inodes (without blk addr) are
byteswapped at disk read/write time; other metadata are byteswapped
when used (as they are accessed directly in the buffer cache).
This required the addition of a "um_flags" field to struct ufsmount.
ffs_bswap.c contains superblock and inode byteswap routines also used
by userland utilities.
 1.8 01-Mar-1998  fvdl Merge with Lite2 + local changes
 1.7 10-Feb-1998  mrg - add defopt's for UVM, UVMHIST and PMAP_NEW.
- remove unnecessary UVMHIST_DECL's.
 1.6 05-Feb-1998  mrg initial import of the new virtual memory system, UVM, into -current.

UVM was written by chuck cranor <chuck@maria.wustl.edu>, with some
minor portions derived from the old Mach code. i provided some help
getting swap and paging working, and other bug fixes/ideas. chuck
silvers <chuq@chuq.com> also provided some other fixes.

this is the rest of the MI portion changes.

this will be KNF'd shortly. :-)
 1.5 04-Jul-1997  drochner Don't cast 64bit (off_t) file sizes to vm_offset_t (32bit on many
architectures), truncate them intelligently instead.
The truncation is centralized in vnode_pager.c.
This prevents wrap-around effects when parts of large (>2^32 byte) files
are mmapped.
Don't allow mmap above the numerical range of vm_offset_t.
This is considered a temporary solution until the vm system handles the
object sizes/offsets more cleanly.
 1.4 11-Jun-1997  bouyer Add support for ext2fs, this needed a few modifications to ufs/ufs/inode.h:
- added a "union inode_ext" to struct inode, for the per-fs extensions.
For now only ext2fs uses it.
- i_din is now an union:
union {
struct dinode ffs_din; /* 128 bytes of the on-disk dinode. */
struct ext2fs_dinode e2fs_din; /* 128 bytes of the on-disk dinode. */
} i_din
Added a lot of #define i_ffs_* and i_e2fs_* to access the fields.
- Added two macros: FFS_ITIMES and EXT2FS_ITIMES. ITIMES calls the right
macro, depending on the type of the inode. ITIMES is used where necessary,
FFS_ITIMES and EXT2FS_ITIMES in other places.
 1.3 09-Feb-1996  christos ffs prototypes
 1.2 29-Jun-1994  cgd New RCS ID's, take two. they're more aesthetically pleasant, and use 'NetBSD'
 1.1 08-Jun-1994  mycroft branches: 1.1.1;
Update to 4.4-Lite fs code, with local changes.
 1.1.1.2 01-Mar-1998  fvdl Import 4.4BSD-Lite2
 1.1.1.1 01-Mar-1998  fvdl Import 4.4BSD-Lite for reference
 1.13.2.5 30-May-1999  chs in ffs_balloc(), remove the "alloced" flag I added. with the demise
of the vm_page blkno field this is no longer useful.
also be sure to return the blkno in all cases.
in ffs_balloc_range(), uvm_vnp_setpageblknos() is gone.
 1.13.2.4 29-Apr-1999  chs catch another case in ffs_balloc() where we need to set the aux return info.
adjust the file size in ffs_balloc_range() instead of ffs_write(),
the allocator routines need to have current info.
 1.13.2.3 09-Apr-1999  chs undo combining of two cases that were actually different.
 1.13.2.2 25-Feb-1999  chs add some args to ffs_balloc() to allow it to return the
physical blkno of the requested block and whether or not
the block was allocated by the current call.
move ffs_mballoc() here from ufs_readwrite.c and rename it
to ffs_balloc_range().
 1.13.2.1 09-Nov-1998  chs initial snapshot. lots left to do.
 1.14.14.2 27-Dec-1999  wrstuden Pull up to last week's -current.
 1.14.14.1 21-Dec-1999  wrstuden Initial commit of recent changes to make DEV_BSIZE go away.

Runs on i386, needs work on other arch's. Main kernel routines should be
fine, but a number of the stand programs need help.

cd, fd, ccd, wd, and sd have been updated. sd has been tested with non-512
byte block devices. vnd, raidframe, and lfs need work.

Non 2**n block support is automatic for LKM's and conditional for kernels
on "options NON_PO2_BLOCKS".
 1.14.10.1 19-Oct-1999  fvdl Bring in Kirk McKusick's FFS softdep code on a branch.
 1.14.8.2 08-Dec-2000  bouyer Sync with HEAD.
 1.14.8.1 20-Nov-2000  bouyer Update thorpej_scsipi to -current as of a month ago
 1.14.4.6 06-Aug-1999  chs avoid setting u_size lower in ffs_balloc(), otherwise we'll end up
PG_RELEASEing pages we have busy in ufs_balloc_range().
 1.14.4.5 31-Jul-1999  chs adapt to new VOP_BALLOC() interface.
 1.14.4.4 11-Jul-1999  chs no need to call uvm_vnp_zerorange() in ffs_balloc() anymore,
it's handled differently now.
 1.14.4.3 06-Jul-1999  chs avoid creating pages beyond EOF.
 1.14.4.2 04-Jul-1999  chs convert ffs_balloc() to a VOP interface.
rename ffs_balloc_range() to ufs_balloc_range() in ufs_inode.c.
 1.14.4.1 07-Jun-1999  chs merge everything from chs-ubc branch.
 1.17.2.1 22-Jun-2000  minoura Sync w/ netbsd-1-5-base.
 1.20.2.1 14-Dec-2000  he Pull up revision 1.22 (requested by fvdl):
Improve NFS performance, possibly with as much as 100% in
throughput. Please note: this implies a kernel interface change,
VOP_FSYNC gains two arguments.
 1.23.2.7 20-Jun-2002  nathanw Catch up to -current.
 1.23.2.6 14-Nov-2001  nathanw Catch up to -current.
 1.23.2.5 08-Oct-2001  nathanw Catch up to -current.
 1.23.2.4 21-Sep-2001  nathanw Catch up to -current.
 1.23.2.3 24-Aug-2001  nathanw Catch up with -current.
 1.23.2.2 21-Jun-2001  nathanw Catch up to -current.
 1.23.2.1 05-Mar-2001  nathanw Initial commit of scheduler activations and lightweight process support.
 1.24.4.3 23-Jun-2002  jdolecek catch up with -current on kqueue branch
 1.24.4.2 10-Jan-2002  thorpej Sync kqueue branch with -current.
 1.24.4.1 25-Aug-2001  thorpej Merge Aug 24 -current into the kqueue branch.
 1.25.2.1 01-Oct-2001  fvdl Catch up with -current.
 1.27.2.1 12-Nov-2001  thorpej Sync the thorpej-mips-cache branch with -current.
 1.29.10.1 05-Jun-2002  lukem Pull up revision 1.30 (requested by chuq in ticket #171):
get the units right when computing a blkno in the ENOSPC path
for allocations involving indirect blocks.
spotted by Trevin Beattie <trevin@xmission.com>.
 1.29.8.1 20-Jun-2002  gehenna catch up with -current.
 1.33.2.6 10-Nov-2005  skrll Sync with HEAD. Here we go again...
 1.33.2.5 18-Dec-2004  skrll Sync with HEAD.
 1.33.2.4 21-Sep-2004  skrll Fix the sync with head I botched.
 1.33.2.3 18-Sep-2004  skrll Sync with HEAD.
 1.33.2.2 25-Aug-2004  skrll Sync with HEAD.
 1.33.2.1 03-Aug-2004  skrll Sync with HEAD
 1.37.10.5 21-Jan-2008  yamt sync with head
 1.37.10.4 27-Oct-2007  yamt sync with head.
 1.37.10.3 03-Sep-2007  yamt sync with head.
 1.37.10.2 26-Feb-2007  yamt sync with head.
 1.37.10.1 21-Jun-2006  yamt sync with head.
 1.38.2.2 29-Oct-2005  yamt use ffs_* directly rather than via ufs_ops.
suggested by Chuck Silvers.
 1.38.2.1 20-Oct-2005  yamt adapt ufs.
 1.40.12.2 24-May-2006  tron Merge 2006-05-24 NetBSD-current into the "peter-altq" branch.
 1.40.12.1 28-Mar-2006  tron Merge 2006-03-28 NetBSD-current into the "peter-altq" branch.
 1.40.10.3 06-May-2006  christos - Move kauth_cred_t declaration to <sys/types.h>
- Cleanup struct ucred; forward declarations that are unused.
- Don't include <sys/kauth.h> in any header, but include it in the c files
that need it.

Approved by core.
 1.40.10.2 19-Apr-2006  elad sync with head.
 1.40.10.1 08-Mar-2006  elad Adapt to kernel authorization KPI.
 1.40.8.2 24-May-2006  yamt sync with head.
 1.40.8.1 01-Apr-2006  yamt sync with head.
 1.40.6.2 01-Jun-2006  kardel Sync with head.
 1.40.6.1 22-Apr-2006  simonb Sync with head.
 1.40.4.1 09-Sep-2006  rpaulo sync with head
 1.43.8.1 01-Feb-2007  ad Sync with head.
 1.44.10.1 09-Dec-2007  reinoud Pullup to HEAD
 1.44.8.1 11-Jul-2007  mjf Sync with head.
 1.44.6.6 24-Oct-2007  ad Comment out 'fix' for allocation failure with softdep. It would hang
because we can try to flush pages that we hold busy. Instead it now
crashes (matching what happens on HEAD).
 1.44.6.5 16-Sep-2007  ad - Checkpoint work in progress on the vnode lifecycle and reference counting
stuff. This makes it work properly without kernel_lock and fixes a few
quite old bugs. See vfs_subr.c 1.283.2.17 for details.

- Fix some problems with softdep. Unfortunately our softdep code appears
to have some longstanding bugs that cause it to fail under stress testing.
 1.44.6.4 24-Aug-2007  ad Sync with buffer cache locking changes. See buf.h/vfs_bio.c for details.
Some minor portions are incomplete and need to be verified as a whole.
 1.44.6.3 15-Jul-2007  ad Sync with head.
 1.44.6.2 13-May-2007  ad - Pass the error number and residual count to biodone(), and let it handle
setting error indicators. Prepare to eliminate B_ERROR.
- Add a flag argument to brelse() to be set into the buf's flags, instead
of doing it directly. Typically used to set B_INVAL.
- Add a "struct cpu_info *" argument to kthread_create(), to be used to
create bound threads. Change "bool mpsafe" to "int flags".
- Allow exit of LWPs in the IDL state when (l != curlwp).
- More locking fixes & conversion to the new API.
 1.44.6.1 13-Apr-2007  ad Put a per-mount lock around ffs shared data structures, excluding softdep
and quotas. Strategy lifted from FreeBSD.
 1.45.10.1 14-Oct-2007  yamt sync with head.
 1.45.8.2 09-Jan-2008  matt sync with HEAD
 1.45.8.1 06-Nov-2007  matt sync with HEAD
 1.45.6.2 09-Dec-2007  jmcneill Sync with HEAD.
 1.45.6.1 26-Oct-2007  joerg Sync with HEAD.

Follow the merge of pmap.c on i386 and amd64 and move
pmap_init_tmp_pgtbl into arch/x86/x86/pmap.c. Modify the ACPI wakeup
code to restore CR4 before jumping back into kernel space as the large
page option might cover that.
 1.46.6.2 08-Dec-2007  ad Sync with head.
 1.46.6.1 04-Dec-2007  ad Pull the vmlocking changes into a new branch.
 1.46.4.1 18-Feb-2008  mjf Sync with HEAD.
 1.47.4.1 02-Jan-2008  bouyer Sync with HEAD
 1.48.12.2 18-Sep-2008  wrstuden Sync with wrstuden-revivesa-base-2.
 1.48.12.1 23-Jun-2008  wrstuden Sync w/ -current. 34 merge conflicts to follow.
 1.48.10.1 04-May-2009  yamt sync with head.
 1.48.8.2 04-Jun-2008  yamt sync with head
 1.48.8.1 18-May-2008  yamt sync with head.
 1.48.6.3 28-Sep-2008  mjf Sync with HEAD.
 1.48.6.2 05-Jun-2008  mjf Sync with HEAD.

Also fix build.
 1.48.6.1 02-Jun-2008  mjf Sync with HEAD.
 1.50.4.1 19-Oct-2008  haad Sync with HEAD.
 1.50.2.1 28-Jul-2008  simonb Add support for creating a WAPBL log in the filesystem. Will
create an in-filesystem log on first "mount -o log" if one doesn't
exist, and will then continue to use same log in the future. See
(soon to be added) wapbl(4) for more info.

Adds a new B_CONTIG low-level allocation flag that uses hints in
"struct ffs_inode_ext" to lay out an ffs file's data contiguously.

Thanks to Greg Oster for helping with the design of this and to
Antti Kantee for code review and suggestions.
 1.51.8.1 13-May-2009  jym Sync with HEAD.

Commit is split, to avoid a "too many arguments" protocol error.
 1.51.4.1 18-Jun-2011  bouyer Pull up following revision(s) (requested by hannken in ticket #1627):
sys/kern/vfs_wapbl.c: revisions 1.41-1.42
sbin/dump/snapshot.c: revisions 1.6 (patch)
share/man/man4/fss.4: revisions 1.15 (patch)
sys/dev/fss.c: revisions 1.73 (patch)
sys/dev/fssvar.h: revisions 1.25
usr.sbin/fssconfig/fssconfig.c: revisions 1.7
sys/ufs/ffs/ffs_balloc.c: revisions 1.54
sys/ufs/ffs/ffs_snapshot.c: revisions 1.90, 1.98, 1.100-1.101, 1.103-1.110, 1.111, 1.112-1.115 (patch)

- Try to keep snapshot indirect blocks contiguous. This speeds up snapshot
creation by a factor of ~3 and reduces the file system suspension time by
a factor of ~5.

- Refine the scope of WAPBL transactions and the limit for deallocations in
one transaction so we should no longer get a "wapbl_flush: current
transaction too big to flush" panic when creating or removing snapshots
on larger logging disks.

- fss(4): Allow FSSIOCSET to set the initial flags. Add a new flag
"FSS_UNLINK_ON_CREATE" to unlink the backing store before the snapshot
gets created. With this change dump(8) no longer dumps the zero-sized,
but named snapshot it is working on.
 1.51.2.1 03-Mar-2009  skrll Sync with HEAD.
 1.52.8.1 20-Jan-2011  bouyer Snapshot of work in progress on a modernised disk quota system:
- new quotactl syscall (versioned for backward compat), which takes
as parameter a path to a mount point, and a prop_dictionary
(in plistref format) describing commands and arguments.
For each command, status and data are returned as a prop_dictionary.
quota commands features will be added to take advantage of this,
exporting quota data or getting quota commands as plists.

- new on-disk format storage (all 64bit wide), integrated with metadata for
ffs (and playing nicely with wapbl).
Quotas are enabled on a ffs filesystem via superblock flags.
tunefs(8) can enable or disable quotas.
On a quota-enabled filesystem, fsck_ffs(8) will track per-uid/gid
block and inode usages, and will check and update quotas in Pass 6.
quota usage and limits are stored in unlinked files (one for users,
one for groups); fsck_ffs(8) will create the files if needed, or
free them if needed. This means that after enabling or disabling
quotas on a filesystem, a fsck_ffs(8) run is required.
quotacheck(8) is not needed any more; after an unclean shutdown,
fsck or journal replay will take care of fixing quotas.
newfs(8) can create a ready-to-mount quota-enabled filesystem
(superblock flags are set and quota inodes are created).
Other new features or semantic changes:
- default quota data, applied to users or groups which don't already
have a quota entry
- per-user/group grace time (instead of a filesystem global one)
- 0 really means "nothing allowed at all", not "no limit".
If you want "no limit", set the limit to UQUAD_MAX (tools will
understand "unlimited" and "-")

A quota file is structured as follows:
it starts with a header, containing a few per-filesystem values,
and the default quota limits.
Quota entries are linked together as a simple list, each entry has a
pointer (as an offset within the file) to the next.
The header has a pointer to a list of free quota entries, and
a hash table of in-use entries. The size of the hash table depends
on the filesystem block size (header+hash table should fit in the
first block). The file is not sparse and is a multiple of
filesystem block size (when the free quota entry list is empty a new
filesystem block is allocated). Quota entries do not cross
filesystem block boundaries.

In memory, the kernel keeps a cache of recently used quota entries
as a reference to the block number, and offset within the block.
The quota entry itself is kept in the buf cache.

fsck_ffs(8), tunefs(8) and newfs(8) support is complete (with
related atf tests :)
The kernel can update disk usage and report it via quotactl(2).

Todo: enforce quota limits (limits are not checked by kernel yet)
update repquota, edquota and rpc.rquotad to the new world
implement compat_50_quotactl ioctl.
update quotactl(2) man page

fsck_ffs required fixes so that allocating new blocks or inodes will
properly update the superblock and cg summaries. This was not an issue up
to now because the superblock and cg summary checks happened last, but now
allocations or frees can happen in pass 6.
 1.52.6.1 06-Jun-2011  jruoho Sync with HEAD.
 1.52.4.2 31-May-2011  rmind sync with head
 1.52.4.1 21-Apr-2011  rmind sync with head
 1.54.14.4 03-Dec-2017  jdolecek update from HEAD
 1.54.14.3 20-Aug-2014  tls Rebase to HEAD as of a few days ago.
 1.54.14.2 23-Jun-2013  tls resync from head
 1.54.14.1 25-Feb-2013  tls resync with head
 1.54.4.2 22-May-2014  yamt sync with head.

for a reference, the tree before this commit was tagged
as yamt-pagecache-tag8.

this commit was split into small chunks to avoid
a limitation of cvs. ("Protocol error: too many arguments")
 1.54.4.1 23-Jan-2013  yamt sync with head
 1.59.2.1 18-May-2014  rmind sync with head
 1.60.6.2 05-Oct-2016  skrll Sync with HEAD
 1.60.6.1 06-Apr-2015  skrll Sync with HEAD
 1.61.2.1 04-Nov-2016  pgoyette Sync with HEAD
 1.62.8.1 02-Nov-2017  snj Pull up following revision(s) (requested by pgoyette in ticket #335):
share/man/man9/kernhist.9: 1.5-1.8
sys/arch/acorn26/acorn26/pmap.c: 1.39
sys/arch/arm/arm32/fault.c: 1.105 via patch
sys/arch/arm/arm32/pmap.c: 1.350, 1.359
sys/arch/arm/broadcom/bcm2835_bsc.c: 1.7
sys/arch/arm/omap/if_cpsw.c: 1.20
sys/arch/arm/omap/tiotg.c: 1.7
sys/arch/evbarm/conf/RPI2_INSTALL: 1.3
sys/dev/ic/sl811hs.c: 1.98
sys/dev/usb/ehci.c: 1.256
sys/dev/usb/if_axe.c: 1.83
sys/dev/usb/motg.c: 1.18
sys/dev/usb/ohci.c: 1.274
sys/dev/usb/ucom.c: 1.119
sys/dev/usb/uhci.c: 1.277
sys/dev/usb/uhub.c: 1.137
sys/dev/usb/umass.c: 1.160-1.162
sys/dev/usb/umass_quirks.c: 1.100
sys/dev/usb/umass_scsipi.c: 1.55
sys/dev/usb/usb.c: 1.168
sys/dev/usb/usb_mem.c: 1.70
sys/dev/usb/usb_subr.c: 1.221
sys/dev/usb/usbdi.c: 1.175
sys/dev/usb/usbdi_util.c: 1.67-1.70
sys/dev/usb/usbroothub.c: 1.3
sys/dev/usb/xhci.c: 1.75
sys/external/bsd/drm2/dist/drm/i915/i915_gem.c: 1.34
sys/kern/kern_history.c: 1.15
sys/kern/kern_xxx.c: 1.74
sys/kern/vfs_bio.c: 1.275-1.276
sys/miscfs/genfs/genfs_io.c: 1.71
sys/sys/kernhist.h: 1.21
sys/ufs/ffs/ffs_balloc.c: 1.63
sys/ufs/lfs/lfs_vfsops.c: 1.361
sys/ufs/lfs/ulfs_inode.c: 1.21
sys/ufs/lfs/ulfs_vnops.c: 1.52
sys/ufs/ufs/ufs_inode.c: 1.102
sys/ufs/ufs/ufs_vnops.c: 1.239
sys/uvm/pmap/pmap.c: 1.37-1.39
sys/uvm/pmap/pmap_tlb.c: 1.22
sys/uvm/uvm_amap.c: 1.108
sys/uvm/uvm_anon.c: 1.64
sys/uvm/uvm_aobj.c: 1.126
sys/uvm/uvm_bio.c: 1.91
sys/uvm/uvm_device.c: 1.66
sys/uvm/uvm_fault.c: 1.201
sys/uvm/uvm_km.c: 1.144
sys/uvm/uvm_loan.c: 1.85
sys/uvm/uvm_map.c: 1.353
sys/uvm/uvm_page.c: 1.194
sys/uvm/uvm_pager.c: 1.111
sys/uvm/uvm_pdaemon.c: 1.109
sys/uvm/uvm_swap.c: 1.175
sys/uvm/uvm_vnode.c: 1.103
usr.bin/vmstat/vmstat.c: 1.219
Reorder to test for null before null deref in debug code
--
Reorder to test for null before null deref in debug code
--
KNF
--
No need for '\n' in UVMHIST_LOG
--
normalise a BIOHIST log message
--
Update the kernhist(9) kernel history code to address issues identified
in PR kern/52639, as well as some general cleaning-up...
(As proposed on tech-kern@ with additional changes and enhancements.)
Details of changes:
* All history arguments are now stored as uintmax_t values[1], both in
the kernel and in the structures used for exporting the history data
to userland via sysctl(9). This avoids problems on some architectures
where passing a 64-bit (or larger) value to printf(3) can cause it to
process the value as multiple arguments. (This can be particularly
problematic when printf()'s format string is not a literal, since in
that case the compiler cannot know how large each argument should be.)
* Update the data structures used for exporting kernel history data to
include a version number as well as the length of history arguments.
* All [2] existing users of kernhist(9) have had their format strings
updated. Each format specifier now includes an explicit length
modifier 'j' to refer to numeric values of the size of uintmax_t.
* All [2] existing users of kernhist(9) have had their format strings
updated to replace uses of "%p" with "%#jx", and the pointer
arguments are now cast to (uintptr_t) before being subsequently cast
to (uintmax_t). This is needed to avoid compiler warnings about
casting "pointer to integer of a different size."
* All [2] existing users of kernhist(9) have had instances of "%s" or
"%c" format strings replaced with numeric formats; several instances
of mis-match between format string and argument list have been fixed.
* vmstat(1) has been modified to handle the new size of arguments in the
history data as exported by sysctl(9).
* vmstat(1) now provides a warning message if the history requested with
the -u option does not exist (previously, this condition was silently
ignored, with only a single blank line being printed).
* vmstat(1) now checks the version and argument length included in the
data exported via sysctl(9) and exits if they do not match the values
with which vmstat was built.
* The kernhist(9) man-page has been updated to note the additional
requirements imposed on the format strings, along with several other
minor changes and enhancements.
[1] It would have been possible to use an explicit length (for example,
uint64_t) for the history arguments. But that would require another
"rototill" of all the users in the future when we add support for an
architecture that supports a larger size. Also, the printf(3) format
specifiers for explicitly-sized values, such as "%"PRIu64, are much
more verbose (and less aesthetically appealing, IMHO) than simply
using "%ju".
[2] I've tried very hard to find "all [the] existing users of
kernhist(9)" but it is possible that I've missed some of them. I would
be glad to update any stragglers that anyone identifies.
--
For some reason this single kernel seems to have outgrown its declared
size as a result of the kernhist(9) changes. Bump the size.
XXX The amount of increase may be excessive - anyone with more detailed
XXX knowledge please feel free to further adjust the value appropriately.
--
Missed one cast of pointer --> uintptr_t in previous kernhist(9) commit
--
And yet another one. :(
--
Use correct mark-up for NetBSD version.
--
More improvements in grammar and readability.
--
Remove a stray '"' (obvious typo) and add a couple of casts that are
probably needed.
--
And replace an instance of "%p" conversion with "%#jx"
--
Whitespace fix. Give Bl tag table a width. Fix Xr.
 1.63.14.1 20-Apr-2020  bouyer Sync with HEAD
 1.63.4.1 21-Apr-2020  martin Sync with HEAD
 1.40 09-Feb-2017  kre Sprinkle in a pinch of const, not too much, just enough
to add a little strength without affecting the overall balance...
 1.39 20-May-2015  riastradh branches: 1.39.2; 1.39.4;
memcpy di_extb/db/ib separately. Noted by Coverity, CID 974636.
 1.38 20-May-2015  riastradh Don't (harmlessly) overrun di_db array; copy di_ib separately.

Noted by Coverity, CID 974635.

While here, simplify size calculation for memcpy.
 1.37 09-Jun-2013  dholland branches: 1.37.8; 1.37.10;
Remove lfs-only inumber field (and its supporting union) from struct
ufs1_dinode.
 1.36 22-Jan-2013  dholland Stuff UFS_ in front of a few of ufs's symbols to reduce namespace
pollution. Specifically:
ROOTINO -> UFS_ROOTINO
WINO -> UFS_WINO
NXADDR -> UFS_NXADDR
NDADDR -> UFS_NDADDR
NIADDR -> UFS_NIADDR
MAXSYMLINKLEN -> UFS_MAXSYMLINKLEN
MAXSYMLINKLEN_UFS[12] -> UFS[12]_MAXSYMLINKLEN (for consistency)

Sort out ext2fs's misuse of NDADDR and NIADDR; fortunately, these have
the same values in ext2fs and ffs.

No functional change intended.
 1.35 06-Mar-2011  bouyer branches: 1.35.4; 1.35.14;
merge the bouyer-quota2 branch. This adds a new on-disk format
to store disk quota usage and limits, integrated with ffs
metadata. Usage is checked by fsck_ffs (no more quotacheck)
and is covered by the WAPBL journal. Enabled with kernel
option QUOTA2 (added where QUOTA was enabled in kernel config files),
turned on with tunefs(8) on a per-filesystem
basis. mount_mfs(8) can also turn quotas on.

See http://mail-index.netbsd.org/tech-kern/2011/02/19/msg010025.html
for details.
 1.34 19-Oct-2009  bouyer branches: 1.34.4; 1.34.6; 1.34.8;
Remove clauses 3 & 4 from my licence. Lots of thanks to Soren Jacobsen
for the boring work!
 1.33 18-Jan-2009  lukem fix -Wsign-compare issues
 1.32 11-Dec-2005  christos branches: 1.32.74; 1.32.84;
merge ktrace-lwp.
 1.31 03-Jun-2005  dbj the cluster summary must be swapped even for ufs2
 1.30 02-Jun-2005  is fix copy/paste/don'tupdate bug (fix from PR 22232 by Robert Elz).
 1.29 26-Feb-2005  perry branches: 1.29.2;
nuke trailing whitespace
 1.28 25-May-2004  hannken branches: 1.28.4; 1.28.6;
Add ffs internal snapshots. Written by Marshall Kirk McKusick for FreeBSD.

- Not enabled by default. Needs kernel option FFS_SNAPSHOT.
- Change parameters of ffs_blkfree.
- Let the copy-on-write functions return an error so spec_strategy
may fail if the copy-on-write fails.
- Change genfs_*lock*() to use vp->v_vnlock instead of &vp->v_lock.
- Add flag B_METAONLY to VOP_BALLOC to return indirect block buffer.
- Add a function ffs_checkfreefile needed for snapshot creation.
- Add special handling of snapshot files:
Snapshots may not be opened for writing and the attributes are read-only.
Use the mtime as the time this snapshot was taken.
Deny mtime updates for snapshot files.
- Add function transferlockers to transfer any waiting processes from
one lock to another.
- Add vfsop VFS_SNAPSHOT to take a snapshot and make it accessible through
a vnode.
- Add snapshot support to ls, fsck_ffs and dump.

Welcome to 2.0F.

Approved by: Jason R. Thorpe <thorpej@netbsd.org>
 1.27 31-Dec-2003  dbj branches: 1.27.4;
remove incorrect XXX comments I introduced a couple of days ago
 1.26 31-Dec-2003  dbj remove unused cs_numclusters field from struct csum_total
this avoids a potential future bug if it is ever used.
before this fix, fsck_ffs would check and fix this field to be zero
 1.25 31-Dec-2003  dbj reorder ffs_sb_swap to reflect actual order in superblock
add comments regarding historical field overlap
no functional change
 1.24 31-Dec-2003  dbj add fs_flags to ffs_sb_swap
 1.23 30-Dec-2003  dbj fix bugs in ffs_cg_swap for FS_42POSTBLFMT
 1.22 27-Oct-2003  lukem Overhaul how `build.sh tools' are used:

* Rename "config.h" to "nbtool_config.h" and
HAVE_CONFIG_H to HAVE_NBTOOL_CONFIG_H.
This makes it more obvious in the source when we're using
tools/compat/config.h versus "standard autoconf" config.h

* Consistently move the inclusion of nbtool_config.h to before
<sys/cdefs.h> so that the former can provide __RCSID() (et al),
and there's no need to protect those macros any more.

These changes should make it easier to "tool-ify" a program by adding:
#if HAVE_NBTOOL_CONFIG_H
#include "nbtool_config.h"
#endif
to the top of the source files (for the general case).
 1.21 05-Oct-2003  bouyer Remove references to University of California from my copyright notices.
 1.20 16-Apr-2003  yamt branches: 1.20.2;
use bswap32 and bswap64 correctly.
(fs_pendingblocks and fs_pendinginodes)
 1.19 11-Apr-2003  enami Make ffs_cg_swap() work even if the same chunk is passed as new and old cg.
This is necessary to prevent newfs from dumping core when it is asked to
create a UFS1 file system of non-native endian.
 1.18 02-Apr-2003  fvdl Add support for UFS2. UFS2 is an enhanced FFS, adding support for
64 bit block pointers, extended attribute storage, and a few
other things.

This commit does not yet include the code to manipulate the extended
storage (for e.g. ACLs), this will be done later.

Originally written by Kirk McKusick and Network Associates Laboratories for
FreeBSD.
 1.17 31-Jan-2002  tv These sources are pulled into makefs(8), so we need config.h and protection
for __KERNEL_RCSID().
 1.16 18-Dec-2001  fvdl Bring over fixes from FreeBSD that weren't incorporated yet, mainly
from Kirk McKusick. They implement taking pending block/inode frees
into account for the sake of correct statfs() numbers, and adding
a new softdep type (newdirblk) to correctly handle newly allocated
directory blocks.

Minor additional changes: 1) swap the newly introduced fs_pendinginodes
and fs_pendingblocks fields in ffs_sb_swap, and 2) declare lkt_held
in the debug version of the softdep lock structure volatile, as it
can be modified from interrupt context #ifdef DEBUG.
 1.15 30-Oct-2001  lukem add __KERNEL_RCSID()
 1.14 29-Oct-2001  lukem ffs_sb_swap() fixes:
- calculate the offset and length of the postbl before byteswapping.
problem noted by der Mouse.
- use offsetof() to determine # of fields to calculate in initial
loop, rather than hard-coding in `52 fields'
- improve comments.
 1.13 06-Sep-2001  lukem branches: 1.13.4;
Incorporate the enhanced ffs_dirpref() by Grigoriy Orlov, as found in
FreeBSD (three commits; the initial work, man page updates, and a fix
to ffs_reload()), with the following differences:
- Be consistent between newfs(8) and tunefs(8) as to the options which
set and control the tuning parameters for this work (avgfilesize & avgfpdir)
- Use u_int16_t instead of u_int8_t to keep track of the number of
contiguous directories (suggested by Chuck Silvers)
- Work within our FFS_EI framework
- Ensure that fs->fs_maxclusters and fs->fs_contigdirs don't point to
the same area of memory

The new algorithm has a marked performance increase, especially when
performing tasks such as untarring pkgsrc.tar.gz, etc.

The original FreeBSD commit messages are attached:

=====
mckusick 2001/04/10 01:39:00 PDT
Directory layout preference improvements from Grigoriy Orlov <gluk@ptci.ru>.
His description of the problem and solution follow. My own tests show
speedups on typical filesystem intensive workloads of 5% to 12% which
is very impressive considering the small amount of code change involved.

------

One day I noticed that some file operations run much faster on
small file systems than on big ones. I've looked at the ffs
algorithms, thought about them, and redesigned the dirpref algorithm.

First I want to describe the results of my tests. These results are old
and I have improved the algorithm after these tests were done. Nevertheless
they show how big the performance speedup may be. I have done two file/directory
intensive tests on two OpenBSD systems with the old and new dirpref algorithms.
The first test is "tar -xzf ports.tar.gz", the second is "rm -rf ports".
The ports.tar.gz file is the ports collection from the OpenBSD 2.8 release.
It contains 6596 directories and 13868 files. The test systems are:

1. Celeron-450, 128Mb, two IDE drives, the system at wd0, file system for
test is at wd1. Size of test file system is 8 Gb, number of cg=991,
size of cg is 8m, block size = 8k, fragment size = 1k OpenBSD-current
from Dec 2000 with BUFCACHEPERCENT=35

2. PIII-600, 128Mb, two IBM DTLA-307045 IDE drives at i815e, the system
at wd0, file system for test is at wd1. Size of test file system is 40 Gb,
number of cg=5324, size of cg is 8m, block size = 8k, fragment size = 1k
OpenBSD-current from Dec 2000 with BUFCACHEPERCENT=50

You can get more info about the test systems and methods at:
http://www.ptci.ru/gluk/dirpref/old/dirpref.html

Test Results

         tar -xzf ports.tar.gz               rm -rf ports
mode     old dirpref  new dirpref  speedup   old dirpref  new dirpref  speedup
First system
normal           667          472     1.41           477          331     1.44
async            285          144     1.98           130           14     9.29
sync             768          616     1.25           477          334     1.43
softdep          413          252     1.64           241           38     6.34
Second system
normal           329           81     4.06         263.5         93.5     2.81
async            302         25.7    11.75           112         2.26    49.56
sync             281         57.0     4.93           263         90.5      2.9
softdep          341         40.6      8.4           284         4.76    59.66

"old dirpref" and "new dirpref" columns give the test time in seconds.
speedup - the speedup factor, i.e. old dirpref / new dirpref.

------

Algorithm description

The old dirpref algorithm is described in comments:

/*
* Find a cylinder to place a directory.
*
* The policy implemented by this algorithm is to select from
* among those cylinder groups with above the average number of
* free inodes, the one with the smallest number of directories.
*/

A new directory is allocated in a different cylinder group from its
parent directory, resulting in a directory tree that is spread across
all the cylinder groups. This spreading out results in non-optimal
access to the directories and files. When we have a small filesystem
it is not a problem, but when the filesystem is big the performance
degradation becomes very apparent.

What do I mean by a big file system?

1. A big filesystem is a filesystem which occupies 20-30 or more percent
of total drive space, i.e. the first and last cylinders are physically
located relatively far from each other.
2. It has a relatively large number of cylinder groups, for example
more cylinder groups than 50% of the buffers in the buffer cache.

The first results in long access times, while the second results in
many buffers being used by metadata operations. Such operations use
cylinder group blocks and on-disk inode blocks. The cylinder group
block (fs->fs_cblkno) contains struct cg, inode and block bit maps.
It is 2k in size for the default filesystem parameters. If new and
parent directories are located in different cylinder groups then the
system performs more input/output operations and uses more buffers.
On filesystems with many cylinder groups, lots of cache buffers are
used for metadata operations.

My solution for this problem is very simple. I allocate many directories
in one cylinder group. I also take some measures so that the new allocation
method does not cause excessive fragmentation and directory inodes are
not located far from their files' inodes and data.
The algorithm is:
/*
* Find a cylinder group to place a directory.
*
* The policy implemented by this algorithm is to allocate a
* directory inode in the same cylinder group as its parent
* directory, but also to reserve space for its files inodes
* and data. Restrict the number of directories which may be
* allocated one after another in the same cylinder group
* without intervening allocation of files.
*
* If we allocate a first level directory then force allocation
* in another cylinder group.
*/

My early versions of dirpref gave me good results for a wide range of
file operations and different filesystem capacities except one case:
those applications that create their entire directory structure first
and only later fill this structure with files.

My solution for such and similar cases is to limit the number of
directories which may be created one after another in the same cylinder
group without intervening file creations. For this purpose, I allocate
an array of counters at mount time. This array is linked to the superblock
fs->fs_contigdirs[cg]. Each time a directory is created the counter
increases and each time a file is created the counter decreases. A 60Gb
filesystem with 8mb/cg requires 10kb of memory for the counters array.

The maxcontigdirs is a maximum number of directories which may be created
without an intervening file creation. I found in my tests that the best
performance occurs when I restrict the number of directories in one cylinder
group such that all its files may be located in the same cylinder group.
There may be some deterioration in performance if all the file inodes
are in the same cylinder group as its containing directory, but their
data partially resides in a different cylinder group. The maxcontigdirs
value is calculated to try to prevent this condition. Since there is
no way to know how many files and directories will be allocated later
I added two optimization parameters in superblock/tunefs. They are:

int32_t fs_avgfilesize; /* expected average file size */
int32_t fs_avgfpdir; /* expected # of files per directory */

These parameters have reasonable defaults but may be tweaked for special
uses of a filesystem. They are only necessary in rare cases like better
tuning a filesystem being used to store a squid cache.

I have been using this algorithm for about 3 months. I have done
a lot of testing on filesystems with different capacities, average
filesize, average number of files per directory, and so on. I think
this algorithm has no negative impact on filesystem performance. It
works better than the default one in all cases. The new dirpref
will greatly improve untarring/removing/copying of big directories,
decrease load on cvs servers and much more. The new dirpref doesn't
speed up a compilation process, but also doesn't slow it down.

Obtained from: Grigoriy Orlov <gluk@ptci.ru>
=====

=====
iedowse 2001/04/23 17:37:17 PDT
Pre-dirpref versions of fsck may zero out the new superblock fields
fs_contigdirs, fs_avgfilesize and fs_avgfpdir. This could cause
panics if these fields were zeroed while a filesystem was mounted
read-only, and then remounted read-write.

Add code to ffs_reload() which copies the fs_contigdirs pointer
from the previous superblock, and reinitialises fs_avgf* if necessary.

Reviewed by: mckusick
=====

=====
nik 2001/04/10 03:36:44 PDT
Add information about the new options to newfs and tunefs which set the
expected average file size and number of files per directory. Could do
with some fleshing out.
=====
 1.12 03-Sep-2001  lukem deprecate fs_fscktime; we never used it.

in an effort to maintain compatibility with freebsd/openbsd/whatever,
i'm attempting to get the superblock format in sync, and freebsd uses
the int32_t at this position for `fs_pendinginodes'.

if we ever decide to implement fscktime functionality, we'll:
a) make sure to liaise with the other projects to reserve the same
spare field
b) actually implement the code this time ...

(this is also preparing us for other changes, like the new dirpref code)
 1.11 17-Aug-2001  lukem remove third argument (`int ns') from ffs_sb_swap(), and let ffs_sb_swap()
determine the endianness of the `struct fs *o' superblock from o->fs_magic
and set needswap as necessary, rather than trusting the caller to get
it right. invariably, almost every caller of ffs_sb_swap() was calling it
with ns set to the wrong value for ns anyway!
ansi KNF ffs_bswap.c declarations whilst here.

this fixes all sorts of problems when trying to use other-endian file systems,
notably the kernel trying to access memory *way* off, possibly corrupting or
panicking, and userland programs SEGVing and/or corrupting things (e.g,
"fsck_ffs -B" to swap a file system endianness).

whilst the previous rev of ffs_bswap.c (1.10, 2000/12/23) made this problem
worse, i suspect that the problem was always there and previous versions
just happened not to trash things at the wrong time.

FFS_EI should now be a lot more stable.
 1.10 23-Dec-2000  enami branches: 1.10.2; 1.10.6;
- 16 * 8 != 168
- offset should be endian independent.
 1.9 23-Dec-2000  enami Cosmetic changes
 1.8 15-May-2000  bouyer branches: 1.8.4;
Sync copyright notice.
 1.7 18-Jan-2000  bouyer Handle pre-FS_42POSTBLFMT. I now can mount an Ultrix file system on my
sparc without panic.
 1.6 14-Sep-1999  thorpej branches: 1.6.2; 1.6.8;
Need <string.h> for memcpy(3) prototype if building from userland.
 1.5 09-Aug-1998  perry branches: 1.5.6;
bzero->memset, bcopy->memcpy, bcmp->memcmp
 1.4 13-Jun-1998  kleink KNF, mostly of FFS_EI changes.
 1.3 10-Jun-1998  kleink KNF: only include one of <sys/{param,types}.h>, not both.
 1.2 08-Jun-1998  ragge Wrong include file order; caused compile error on vax.
 1.1 18-Mar-1998  bouyer Add support for reading/writing FFS in non-native byte order, conditioned
to "options FFS_EI". The superblock and inodes (without blk addr) are
byteswapped at disk read/write time; other metadata are byteswapped
when used (as they are accessed directly in the buffer cache).
This required the addition of a "um_flags" field to struct ufsmount.
ffs_bswap.c contains superblock and inode byteswap routines also used
by userland utilities.
 1.5.6.1 18-Jan-2000  he Pull up revision 1.7 (requested by bouyer):
Properly handle pre-FS_42POSTBLFMT file systems (e.g. Ultrix) in
the endian-independent file system code.
 1.6.8.1 21-Dec-1999  wrstuden Initial commit of recent changes to make DEV_BSIZE go away.

Runs on i386, needs work on other arch's. Main kernel routines should be
fine, but a number of the stand programs need help.

cd, fd, ccd, wd, and sd have been updated. sd has been tested with non-512
byte block devices. vnd, raidframe, and lfs need work.

Non 2**n block support is automatic for LKM's and conditional for kernels
on "options NON_PO2_BLOCKS".
 1.6.2.2 05-Jan-2001  bouyer Sync with HEAD
 1.6.2.1 20-Nov-2000  bouyer Update thorpej_scsipi to -current as of a month ago
 1.8.4.5 25-Nov-2001  he Pull up revision 1.14 (requested by lukem):
A few ffs_sb_swap() fixes.
 1.8.4.4 25-Nov-2001  he Pull up revision 1.13 (requested by lukem):
Pull in enhanced ffs_dirpref() algorithm, which provides a
substantial performance improvement through better locality
between parent/child directories and their files, and by easing
the pressure on the buffer cache for metadata operations.
 1.8.4.3 25-Nov-2001  he Pull up revision 1.12 (requested by lukem):
Deprecate unused fs_fscktime.
 1.8.4.2 25-Nov-2001  he Pull up revision 1.11 (requested by lukem):
Call ffs_sb_swap() with the correct arguments. Fixes problems
with using other-endian file systems.
 1.8.4.1 25-Nov-2001  he Pull up revisions 1.9-1.10 (requested by lukem):
Offset should be endian independent. Some cosmetic changes.
 1.10.6.4 11-Feb-2002  jdolecek Sync w/ -current.
 1.10.6.3 10-Jan-2002  thorpej Sync kqueue branch with -current.
 1.10.6.2 13-Sep-2001  thorpej Update the kqueue branch to HEAD.
 1.10.6.1 25-Aug-2001  thorpej Merge Aug 24 -current into the kqueue branch.
 1.10.2.5 28-Feb-2002  nathanw Catch up to -current.
 1.10.2.4 08-Jan-2002  nathanw Catch up to -current.
 1.10.2.3 14-Nov-2001  nathanw Catch up to -current.
 1.10.2.2 21-Sep-2001  nathanw Catch up to -current.
 1.10.2.1 24-Aug-2001  nathanw Catch up with -current.
 1.13.4.1 12-Nov-2001  thorpej Sync the thorpej-mips-cache branch with -current.
 1.20.2.5 10-Nov-2005  skrll Sync with HEAD. Here we go again...
 1.20.2.4 04-Mar-2005  skrll Sync with HEAD.

Hi Perry!
 1.20.2.3 21-Sep-2004  skrll Fix the sync with head I botched.
 1.20.2.2 18-Sep-2004  skrll Sync with HEAD.
 1.20.2.1 03-Aug-2004  skrll Sync with HEAD
 1.27.4.1 02-Jun-2005  riz Pull up revision 1.30 (requested by is in ticket #1973):
fix copy/paste/don'tupdate bug (fix from PR 22232 by Robert Elz).
 1.28.6.1 19-Mar-2005  yamt sync with head. xen and whitespace. xen part is not finished.
 1.28.4.1 29-Apr-2005  kent sync with -current
 1.29.2.1 02-Jun-2005  tron Pull up revision 1.30 (requested by is in ticket #385):
fix copy/paste/don'tupdate bug (fix from PR 22232 by Robert Elz).
 1.32.84.1 19-Jan-2009  skrll Sync with HEAD.
 1.32.74.2 11-Mar-2010  yamt sync with head
 1.32.74.1 04-May-2009  yamt sync with head.
 1.34.8.1 20-Jan-2011  bouyer Snapshot of work in progress on a modernised disk quota system:
- new quotactl syscall (versioned for backward compat), which takes
as parameter a path to a mount point, and a prop_dictionary
(in plistref format) describing commands and arguments.
For each command, status and data are returned as a prop_dictionary.
quota commands features will be added to take advantage of this,
exporting quota data or getting quota commands as plists.

- new on-disk format storage (all 64-bit wide), integrated into metadata for
ffs (and playing nicely with wapbl).
Quotas are enabled on a ffs filesystem via superblock flags.
tunefs(8) can enable or disable quotas.
On a quota-enabled filesystem, fsck_ffs(8) will track per-uid/gid
block and inode usages, and will check and update quotas in Pass 6.
quota usage and limits are stored in unlinked files (one for users,
one for groups); fsck_ffs(8) will create the files if needed, or
free them if needed. This means that after enabling or disabling
quotas on a filesystem, a fsck_ffs(8) run is required.
quotacheck(8) is not needed any more: after an unclean shutdown,
fsck or journal replay will take care of fixing quotas.
newfs(8) can create a ready-to-mount quota-enabled filesystem
(superblock flags are set and quota inodes are created).
Other new features or semantic changes:
- default quota data, applied to users or groups which don't already
have a quota entry
- per-user/group grace time (instead of a filesystem global one)
- 0 really means "nothing allowed at all", not "no limit".
If you want "no limit", set the limit to UQUAD_MAX (tools will
understand "unlimited" and "-")

A quota file is structured as follows:
it starts with a header, containing a few per-filesystem values,
and the default quota limits.
Quota entries are linked together as a simple list, each entry has a
pointer (as an offset within the file) to the next.
The header has a pointer to a list of free quota entries, and
a hash table of in-use entries. The size of the hash table depends
on the filesystem block size (header+hash table should fit in the
first block). The file is not sparse and is a multiple of
filesystem block size (when the free quota entry list is empty a new
filesystem block is allocated). Quota entries do not cross
filesystem block boundaries.

In memory, the kernel keeps a cache of recently used quota entries
as a reference to the block number, and offset within the block.
The quota entry itself is kept in the buf cache.

fsck_ffs(8), tunefs(8) and newfs(8) support is complete (with
related atf tests :)
The kernel can update disk usage and report it via quotactl(2).

Todo: enforce quota limits (limits are not checked by the kernel yet)
update repquota, edquota and rpc.rquotad to the new world
implement compat_50_quotactl ioctl.
update quotactl(2) man page

fsck_ffs required fixes so that allocating new blocks or inodes will
properly update the superblock and cg summaries. This was not an issue up
to now because superblock and cg summaries check happened last, but now
allocations or frees can happen in pass 6.
 1.34.6.1 06-Jun-2011  jruoho Sync with HEAD.
 1.34.4.1 21-Apr-2011  rmind sync with head
 1.35.14.3 03-Dec-2017  jdolecek update from HEAD
 1.35.14.2 23-Jun-2013  tls resync from head
 1.35.14.1 25-Feb-2013  tls resync with head
 1.35.4.2 22-May-2014  yamt sync with head.

for a reference, the tree before this commit was tagged
as yamt-pagecache-tag8.

this commit was split into small chunks to avoid
a limitation of cvs. ("Protocol error: too many arguments")
 1.35.4.1 23-Jan-2013  yamt sync with head
 1.37.10.2 28-Aug-2017  skrll Sync with HEAD
 1.37.10.1 06-Jun-2015  skrll Sync with HEAD
 1.37.8.1 04-Nov-2015  riz Pull up following revision(s) (requested by riastradh in ticket #896):
sys/ufs/ffs/ffs_bswap.c: revision 1.38
sys/ufs/ffs/ffs_bswap.c: revision 1.39
Don't (harmlessly) overrun di_db array; copy di_ib separately.
Noted by Coverity, CID 974635.
While here, simplify size calculation for memcpy.
memcpy di_extb/db/ib separately. Noted by Coverity, CID 974636.
 1.39.4.1 21-Apr-2017  bouyer Sync with HEAD
 1.39.2.1 20-Mar-2017  pgoyette Sync with HEAD
 1.10 28-Nov-2022  chs the UFS_EXTATTR option was supposed to affect only UFS1 file systems,
but when the UFS2 extattr code was merged, the UFS_EXTATTR option was
mistakenly changed to affect UFS2 file systems as well. this commit
changes UFS_EXTATTR back to affecting only UFS1 file systems as originally
intended. in UFS2 (or rather UFS2ea in NetBSD), extattrs are a
native feature and are always supported.
 1.9 17-Nov-2022  chs Restore backward compatibility of UFS2 with previous NetBSD releases by
disabling support in UFS2 for extended attributes (including ACLs).
Add a new variant of UFS2 called "UFS2ea" that does support extended attributes.
Add new fsck_ffs operations "-c ea" and "-c no-ea" to convert file systems
from UFS2 to UFS2ea and vice-versa (both of which delete all existing extended
attributes in the process).
 1.8 14-Dec-2021  chs ffs: fix the creation of device nodes on file systems with ACLs enabled.
 1.7 05-Sep-2020  riastradh Round of uvm.h cleanup.

The poorly named uvm.h is generally supposed to be for uvm-internal
users only.

- Narrow it to files that actually need it -- mostly files that need
to query whether curlwp is the pagedaemon, which should maybe be
exposed by an external header.

- Use uvm_extern.h where feasible and uvm_*.h for things not exposed
by it. We should split up uvm_extern.h but this will serve for now
to reduce the uvm.h dependencies.

- Use uvm_stat.h and #ifdef UVMHIST uvm.h for files that use
UVMHIST(ubchist), since ubchist is declared in uvm.h but the
reference evaporates if UVMHIST is not defined, so we reduce header
file dependencies.

- Make uvm_device.h and uvm_swap.h independently includable while
here.

ok chs@
 1.6 20-May-2020  christos remove accmode_t typedef (not needed, breaks llvm) from maxv@
 1.5 16-May-2020  christos Add ACL support for FFS. From FreeBSD.
 1.4 02-May-2020  christos Remove the unlock/relock hack by using IO_EXT to indicate that we are already
holding the lock.
 1.3 20-Apr-2020  christos branches: 1.3.2;
- Allow root to set system attributes, samba does this
- Fix locking issue, perhaps we should use our own mutex; does not seem worth
it for this simple case.
 1.2 19-Apr-2020  christos branches: 1.2.2;
- add locking
- wrap wapbl around truncating, ffs_extwrite does it on its own.
 1.1 18-Apr-2020  christos Extended attribute support for ffsv2, from FreeBSD.
 1.2.2.3 25-Apr-2020  bouyer Sync with bouyer-xenpvh-base2 (HEAD)
 1.2.2.2 20-Apr-2020  bouyer Sync with HEAD
 1.2.2.1 19-Apr-2020  bouyer file ffs_extattr.c was added on branch bouyer-xenpvh on 2020-04-20 11:29:14 +0000
 1.3.2.2 21-Apr-2020  martin Sync with HEAD
 1.3.2.1 20-Apr-2020  martin file ffs_extattr.c was added on branch phil-wifi on 2020-04-21 18:42:45 +0000
 1.88 07-Jan-2023  chs ufs: fixed signed/unsigned bugs affecting large file systems

Apply these commits from FreeBSD:

commit e870d1e6f97cc73308c11c40684b775bcfa906a2
Author: Kirk McKusick <mckusick@FreeBSD.org>
Date: Wed Feb 10 20:10:35 2010 +0000

This fix corrects a problem in the file system that treats large
inode numbers as negative rather than unsigned. For a default
(16K block) file system, this bug began to show up at a file system
size above about 16Tb.

To fully handle this problem, newfs must be updated to ensure that
it will never create a filesystem with more than 2^32 inodes. That
patch will be forthcoming soon.

Reported by: Scott Burns, John Kilburg, Bruce Evans
Followup by: Jeff Roberson
PR: 133980
MFC after: 2 weeks

commit 81479e688b0f643ffacd3f335b4b4bba460b769d
Author: Kirk McKusick <mckusick@FreeBSD.org>
Date: Thu Feb 11 18:14:53 2010 +0000

One last pass to get all the unsigned comparisons correct.


In addition to the changes from FreeBSD, this commit includes quite a few
related changes to appease -Wsign-compare.
 1.87 28-Nov-2022  chs branches: 1.87.2;
the UFS_EXTATTR option was supposed to affect only UFS1 file systems,
but when the UFS2 extattr code was merged, the UFS_EXTATTR option was
mistakenly changed to affect UFS2 file systems as well. this commit
changes UFS_EXTATTR back to affecting only UFS1 file systems as originally
intended. in UFS2 (or rather UFS2ea in NetBSD), extattrs are a
native feature and are always supported.
 1.86 18-Apr-2020  christos Extended attribute support for ffsv2, from FreeBSD.
 1.85 22-Aug-2018  msaitoh branches: 1.85.10;
- Cleanup for dynamic sysctl:
- Remove unused *_NAMES macros for sysctl.
- Remove unused *_MAXID for sysctls.
- Move CTL_MACHDEP sysctl definitions for m68k into m68k/include/cpu.h and
use them on all m68k machines.
 1.84 09-Feb-2017  kre branches: 1.84.12; 1.84.14;

Sprinkle in a pinch of const, not too much, just enough
to add a little strength without affecting the overall balance...
 1.83 01-Oct-2016  jdolecek branches: 1.83.2;
allocate wapbl dealloc registration structures via pool, so that there is more
flexibility with limit handling
 1.82 27-Mar-2015  riastradh branches: 1.82.2;
Disentangle buffer-cached I/O from page-cached I/O in UFS.

Page-cached I/O is used for regular files, and is initiated by VFS
users such as userland and NFS.

Buffer-cached I/O is used for directories and symlinks, and is issued
only internally by UFS.

New UFS routine ufs_bufio replaces vn_rdwr for internal use.
ufs_bufio is implemented by new UFS operations uo_bufrd/uo_bufwr,
which sit in ufs_readwrite.c alongside the VOP_READ/VOP_WRITE
implementations.

I preserved the code as much as possible and will leave further
simplification for future commits. I kept the ulfs_readwrite.c
copypasta close to ufs_readwrite.c in case we ever want to merge them
back; likewise ext2fs_readwrite.c.

No externally visible semantic change. All atf fs tests still pass.
 1.81 17-Mar-2015  hannken Change ffs to use vcache_new:
- Change ffs_valloc to return an inode number.
- Remove now obsolete UFS operations UFS_VALLOC and UFS_VFREE.
- Make ufs_makeinode private to ufs_vnops.c and pass vattr instead of mode.
 1.80 16-Jun-2013  hannken branches: 1.80.10;
Add a UFS_SNAPGONE() ufs op replacing the calls
to ffs_snapgone() in ufs_lookup.c.

Ok: David Holland <dholland@netbsd.org>

Welcome to 6.99.22
 1.79 19-Oct-2012  drochner Implement experimental support to pass notifications that a file
was deleted from the filesystem to the disk driver, commonly
known as "discard" or "trim".
fs/driver support is in ffs and ata wd for now.
This is what was posted here:
http://mail-index.netbsd.org/tech-kern/2012/02/28/msg012813.html
with minor cleanup, and the global switch replaced by a mount option.
 1.78 17-Jun-2011  manu branches: 1.78.2; 1.78.12;
Add mount -o extattr option to enable extended attributes (currently only
for UFS1).
Remove kernel option for EA backing store autocreation and do it by
default. Add a sysctl so that autocreated attribute size can be modified.
 1.77 27-Apr-2011  hannken branches: 1.77.2;
Cleanup ffs fsync and make devices on wapbl enabled file systems work here:

- Replace the ugly sync loop in ffs_full_fsync() and ffs_vfs_fsync() with
vflushbuf(). This loop is a relic of softdeps and not needed anymore.

- Add ffs_spec_fsync() for device nodes on ffs file systems that calls
spec_fsync() like all other file systems do and then updates the ctime.

Discussed on tech-kern.

Should fix PRs:
PR #41192 wapbl diagnostic panic during cgdconfig
PR #41977 kernel diagnostic assertion "rw_lock_held(&wl->wl_rwlock)" failed
PR #42149 wapbl locking panic if watching DVD
PR #42551 Lockdebug assert in wapbl when running zpool
 1.76 06-Mar-2011  bouyer merge the bouyer-quota2 branch. This adds a new on-disk format
to store disk quota usage and limits, integrated with ffs
metadata. Usage is checked by fsck_ffs (no more quotacheck)
and is covered by the WAPBL journal. Enabled with kernel
option QUOTA2 (added where QUOTA was enabled in kernel config files),
turned on with tunefs(8) on a per-filesystem
basis. mount_mfs(8) can also turn quotas on.

See http://mail-index.netbsd.org/tech-kern/2011/02/19/msg010025.html
for details.
 1.75 22-Feb-2009  ad branches: 1.75.4; 1.75.6; 1.75.8;
PR kern/26878 FFSv2 + softdep = livelock (no free ram)
PR kern/16942 panic with softdep and quotas
PR kern/19565 panic: softdep_write_inodeblock: indirect pointer #1 mismatch
PR kern/26274 softdep panic: allocdirect_merge: ...
PR kern/26374 Long delay before non-root users can write to softdep partitions
PR kern/28621 1.6.x "vp != NULL" panic in ffs_softdep.c:4653 while unmounting a softdep (+quota) filesystem
PR kern/29513 FFS+Softdep panic with unfsck-able file-corruption
PR kern/31544 The ffs softdep code appears to fail to write dirty bits to disk
PR kern/31981 stopping scsi disk can cause panic (softdep)
PR kern/32116 kernel panic in softdep (assertion failure)
PR kern/32532 softdep_trackbufs deadlock
PR kern/37191 softdep: locking against myself
PR kern/40474 Kernel panic after remounting raid root with softdep

Retire softdep, pass 2. As discussed and later formally announced on the
mailing lists.
 1.74 06-Dec-2008  joerg branches: 1.74.4;
Split ffs_freefile into a frontend for normal cylinder group and for
snapshot use. Adjust ffs_blkfree_common to get the fs instance passed
in, the original commit didn't account blocks in the snapshots
correctly. Assert that ffs_blkfree is used with the primary fs instance
and that ffs_checkfreefile is only used for snapshots. Move the bdwrite
from ffs_blkfree_common into the caller for symmetry. This creates a
redundant write of unmodified data for ffs_blkfree_snap if a double free
of a block happens.

Reviewed and tested by hannken@.
 1.73 01-Dec-2008  joerg ffs_blkfree is used in two different ways. The normal usage is to free a
block in the cylinder groups of the filesystem. The other user is the
snapshot code, which wants to modify the copied cylinder groups. Use
different frontends to distinguish the cases in preparation for fine
grained locking for cylinder groups.
 1.72 30-Nov-2008  joerg Split ffs_blkalloc into a frontend that does inode based consistency
checks and a backend that just asserts them. Use the backend in
ffs_wapbl_abort_sync_metadata instead of faking an inode.
 1.71 06-Nov-2008  joerg Remove XXXUBC code for ffs_reallocblks, that has been conditionalized in
2002 and #if 0'ed in 2005. It would need a considerable amount of work
to bring back and obscures the more important block allocation.
 1.70 10-Oct-2008  hannken branches: 1.70.2;
Break a deadlock where one thread has a wapbl transaction, calls VOP_GETPAGES
and wants to busy a page while another thread calls VOP_PUTPAGES on the same
vnode, takes pages busy and wants to start a wapbl transaction.

Reviewed by: Jason Thorpe <thorpej@netbsd.org>
 1.69 22-Aug-2008  hannken Add snapshot support for logging ffs file systems.

- Add UFS_WAPBL_BEGIN() / UFS_WAPBL_END() where needed.

- Expunge WAPBL log inodes from snapshots.

- ffs_copyonwrite() and ffs_snapblkfree() must run inside a WAPBL transaction.

- Add ffs_gop_write() as a wrapper around genfs_gop_write() that makes sure
genfs_gop_write() is always called inside a WAPBL transaction.

- Add VOP_PUTPAGES() flag PGO_JOURNALLOCKED to tag calls to VOP_PUTPAGES()
inside a WAPBL transaction.

Reviewed by: Simon Burge <simonb@netbsd.org>, Greg Oster <oster@netbsd.org>

PGO_JOURNALLOCKED / ffs_gop_write() part presented on tech-kern@.
 1.68 12-Aug-2008  hannken Deny read/write access to snapshot vnodes. We use fss(4) to read from
snapshots. With this policy in place:

- Separate the snapshot vnode lock from the snapshot common lock.
Snapshots no longer need recursive vnode locks.

- Use a mutex (si_snaplock) to serialize creation, deletion, reading and
writing of snapshots.

- Move ffs_read() for snapshots into ffs_snapshot.c.

Reviewed by: Jason Thorpe <thorpej@netbsd.org>

While here change ffs_copyonwrite() to fail requests from pagedaemon that need
to copy-on-write.
 1.67 31-Jul-2008  simonb Merge the simonb-wapbl branch. From the original branch commit:

Add Wasabi System's WAPBL (Write Ahead Physical Block Logging)
journaling code. Originally written by Darrin B. Jewell while
at Wasabi and updated to -current by Antti Kantee, Andy Doran,
Greg Oster and Simon Burge.

OK'd by core@, releng@.
 1.66 28-Jun-2008  rumble branches: 1.66.2;
Create sysctl entries during module initialisation and destroy them
appropriately.

Many of these file systems are now ready for modularisation.
 1.65 03-Jun-2008  hannken branches: 1.65.2;
ufs/ffs: replace calls to getblk() with ffs_getblk(). Now all buffers
have been run through copy-on-write and async mounts work again.

Fixes PR kern/38820

Reviewed by: YAMAMOTO Takashi <yamt@netbsd.org>
 1.64 17-Apr-2008  hannken branches: 1.64.2; 1.64.4; 1.64.6;
Replace get/setspecific with a void pointer in struct ufsmount. Use explicit
initialization/finalization of snapshot private data on creation/deletion
of struct ufsmount.
Snapshot mounts can no longer fail silently because kmem_alloc() fails.

Welcome to 4.99.60

Ok: Andrew Doran <ad@netbsd.org>
 1.63 03-Jan-2008  ad branches: 1.63.6;
Use pool_cache.
 1.62 02-Jan-2008  ad Merge vmlocking2 to head.
 1.61 08-Dec-2007  pooka branches: 1.61.4;
Remove cn_lwp from struct componentname. curlwp should be used
from now on. The NDINIT() macro no longer takes the lwp parameter and
associates the credentials of the calling thread with the namei
structure.
 1.60 09-Aug-2007  hannken branches: 1.60.2; 1.60.8; 1.60.10;
Move the fstrans-aware lock vnops from ufs to ffs. Other ufs file systems
do not need them.

Ride on 4.99.28
 1.59 09-Aug-2007  hannken Move snapshot per-mount data from struct ufsmount to mount specific data.
No functional changes.

Welcome to 4.99.28 (struct ufsmount changed size)
 1.58 31-Jul-2007  pooka branches: 1.58.2; 1.58.4;
* nuke the nameidata parameter from VFS_MOUNT(). Nobody on tech-kern
knew what it was supposed to be used for and wrstuden gave a go-ahead
* while rototilling, convert file systems which went easily to
use VFS_PROTOS() instead of manually prototyping the methods
 1.57 12-Jul-2007  dsl branches: 1.57.2;
Change the VFS_MOUNT() interface so that the 'data' buffer passed to the
fs code is a kernel buffer, and pass through the length of the buffer as well.
Since the length of the userspace buffer isn't (yet) passed through the mount
system call, add a field to the vfsops structure containing the default length.
Split sys_mount() for calls from compat code.
Ride one of the recent kernel version changes - old fs LKMs will load, but
sys_mount() will reject any attempt to use them.
 1.56 07-Jun-2007  yamt remove a duplicated definition of FFS_ITIMES.
 1.55 19-Jan-2007  hannken branches: 1.55.6; 1.55.8;
New file system suspension API to replace vn_start_write and vn_finished_write.
The suspension helpers are now put into file system specific operations.
This means every file system not supporting these helpers cannot be suspended,
and therefore snapshots of it are no longer possible.

Implemented for file systems of type ffs.

The new API is enabled by the kernel option NEWVNGATE. This option is
not enabled by default in any kernel config.

Presented and discussed on tech-kern with much input from
Bill Studenmund <wrstuden@netbsd.org> and YAMAMOTO Takashi <yamt@netbsd.org>.

Welcome to 4.99.9 (new vfs op vfs_suspendctl).
 1.54 13-Jul-2006  martin branches: 1.54.4;
Fix alignment problems for fhandle_t, exposed by gcc4.1.

While touching all vptofh/fhtovp functions, get rid of VFS_MAXFIDSIZ,
version the getfh(2) syscall and explicitly pass the size available in
the filehandle from userland.

Discussed on tech-kern, with lots of help from yamt (thanks!).
 1.53 14-May-2006  elad branches: 1.53.4;
integrate kauth.
 1.52 23-Apr-2006  yamt remove unused FFS_NAMES and LFS_NAMES.
 1.51 14-Jan-2006  yamt branches: 1.51.2; 1.51.4; 1.51.6; 1.51.8; 1.51.10;
- unify ffs_blkatoff and lfs_blkatoff.
- remove ufs_ops::uo_blkatoff.
- add directory read-ahead code. (disabled for now.)
 1.50 27-Dec-2005  chs branches: 1.50.2;
changes for making DIAGNOSTIC not change the kernel ABI:
- for structure fields that are conditionally present,
make those fields always present.
- for functions which are conditionally inline, make them never inline.
- remove some other functions which are conditionally defined but
don't actually do anything anymore.
- make a lock-debugging function conditional on only LOCKDEBUG.

as discussed on tech-kern some time back.
 1.49 11-Dec-2005  christos merge ktrace-lwp.
 1.48 02-Nov-2005  yamt merge yamt-vop branch. remove following VOPs.

VOP_BLKATOFF
VOP_VALLOC
VOP_BALLOC
VOP_REALLOCBLKS
VOP_VFREE
VOP_TRUNCATE
VOP_UPDATE
 1.47 12-Sep-2005  christos branches: 1.47.2;
- access the ffs and ext2fs itimes functions through a pointer, so that
the kernel still links if the filesystem is not compiled in. Probably
a better solution is to use weak symbols.
- move the filesystem-specific itime macros to the filesystem header files.
 1.46 12-Sep-2005  christos Use nanotime() to update the time fields in filesystems. Convert the code
from macros to real functions. Original patch and review from chuq.
Note: ext2fs only keeps seconds in the on-disk inode, and msdosfs does not
have enough precision for all fields, so this is not very useful for those
two.
 1.45 09-Sep-2005  yamt revert the code to expand putpage requests to block boundary.
because:
- it was incomplete in some cases.
- it can confuse pagedaemon.
see PR/15364 for details.
 1.44 28-Aug-2005  thorpej Experimental support for extended attributes on UFS1 file systems, using a
backing file per attribute type indexed by inode number to hold the extended
attributes.

This is working pretty well on my test systems, except for the "autostart"
feature. I need someone with a better handle on the VFS locking protocol
to go over that.

This is a work-in-progress. There are parts of this that could be re-factored
allowing this approach to be used on other types of file systems.

Adapted from FreeBSD.
 1.43 15-Jul-2005  thorpej Use ANSI function decls.
 1.42 26-Feb-2005  perry branches: 1.42.2; 1.42.4;
nuke trailing whitespace
 1.41 29-Aug-2004  hannken branches: 1.41.4; 1.41.6;
While creating a snapshot inodes must be freed from the
snapshot, not from the file system.
ffs_freefile() needs explicit "fs" and "devvp" arguments.
 1.40 04-Jun-2004  he Need to forward-declare "struct timespec" because the new ffs_snapshot()
function declaration refers to it. Fixes build problem of sbin/badsect
for the vax target, which still uses gcc 2.95.3.
 1.39 25-May-2004  hannken Add ffs internal snapshots. Written by Marshall Kirk McKusick for FreeBSD.

- Not enabled by default. Needs kernel option FFS_SNAPSHOT.
- Change parameters of ffs_blkfree.
- Let the copy-on-write functions return an error so spec_strategy
may fail if the copy-on-write fails.
- Change genfs_*lock*() to use vp->v_vnlock instead of &vp->v_lock.
- Add flag B_METAONLY to VOP_BALLOC to return indirect block buffer.
- Add a function ffs_checkfreefile needed for snapshot creation.
- Add special handling of snapshot files:
Snapshots may not be opened for writing and the attributes are read-only.
Use the mtime as the time this snapshot was taken.
Deny mtime updates for snapshot files.
- Add function transferlockers to transfer any waiting processes from
one lock to another.
- Add vfsop VFS_SNAPSHOT to take a snapshot and make it accessible through
a vnode.
- Add snapshot support to ls, fsck_ffs and dump.

Welcome to 2.0F.

Approved by: Jason R. Thorpe <thorpej@netbsd.org>
 1.38 20-May-2004  atatat Tweak sysctl setup functions (the macros, actually) for use in lkms,
and tweak lkminit_*.c (where applicable) to call them, and to call
sysctl_teardown() when being unloaded.

This consists of (1) making setup functions not be static when being
compiled as lkms (change to sys/sysctl.h), (2) making prototypes
visible for the various setup functions in header files (changes to
various header files), and (3) making simple "load" and "unload"
functions in the actual lkminit stuff.

linux_sysctl.c also needs its root exposed (ie, made not static) for
this (when built as an lkm).
 1.37 21-Apr-2004  christos Replace the statfs() family of system calls with statvfs().
Retain binary compatibility.
 1.36 10-Jan-2004  hannken branches: 1.36.2;
Split out softdep_flushworklist() from softdep_flushfiles() so that
it can be used to clear the work queue.

Cleanup ffs_sync() which did not synchronously wait when MNT_WAIT
was specified. Clear the work queue when MNT_WAIT is specified.

Result is a clean on-disk file system after ffs_sync(.., MNT_WAIT, ..)

From FreeBSD.
 1.35 02-Jan-2004  dbj add uuid field to apple ufs volume label
 1.34 04-Dec-2003  atatat Dynamic sysctl.

Gone are the old kern_sysctl(), cpu_sysctl(), hw_sysctl(),
vfs_sysctl(), etc, routines, along with sysctl_int() et al. Now all
nodes are registered with the tree, and nodes can be added (or
removed) easily, and I/O to and from the tree is handled generically.

Since the nodes are registered with the tree, the mapping from name to
number (and back again) can now be discovered, instead of having to be
hard coded. Adding new nodes to the tree is likewise much simpler --
the new infrastructure handles almost all the work for simple types,
and just about anything else can be done with a small helper function.

All existing nodes are where they were before (numerically speaking),
so all existing consumers of sysctl information should notice no
difference.

PS - I'm sorry, but there's a distinct lack of documentation at the
moment. I'm working on sysctl(3/8/9) right now, and I promise to
watch out for buses.
 1.33 07-Aug-2003  agc Move UCB-licensed code from 4-clause to 3-clause licence.

Patches provided by Joel Baker in PR 22364, verified by myself.
 1.32 29-Jun-2003  fvdl branches: 1.32.2;
Back out the lwp/ktrace changes. They contained a lot of collateral damage,
and need to be examined and discussed more.
 1.31 29-Jun-2003  thorpej Undo part of the ktrace/lwp changes. In particular:
* Remove the "lwp *" argument that was added to vget(). Turns out
that nothing actually used it!
* Remove the "lwp *" arguments that were added to VFS_ROOT(), VFS_VGET(),
and VFS_FHTOVP(); all they did was pass it to vget() (which, as noted
above, didn't use it).
* Remove all of the "lwp *" arguments to internal functions that were added
just to appease the above.
 1.30 29-Jun-2003  enami Add forward declaration of struct lwp instead of struct proc. Sort those
while I'm here.
 1.29 28-Jun-2003  darrenr Pass lwp pointers throughout the kernel, as required, so that the lwpid can
be inserted into ktrace records. The general change has been to replace
"struct proc *" with "struct lwp *" in various function prototypes, pass
the lwp through and use l_proc to get the process pointer when needed.

Bump the kernel rev up to 1.6V
 1.28 02-Apr-2003  fvdl Add support for UFS2. UFS2 is an enhanced FFS, adding support for
64 bit block pointers, extended attribute storage, and a few
other things.

This commit does not yet include the code to manipulate the extended
storage (for e.g. ACLs), this will be done later.

Originally written by Kirk McKusick and Network Associates Laboratories for
FreeBSD.
 1.27 15-Mar-2003  kristerw ffs_gop_alloc() is not used any more. Remove it.

OK:ed by Konrad Schroder.
 1.26 17-Feb-2003  perseant Add code to UBCify LFS. This is still behind "#ifdef LFS_UBC" for now
(there are still some details to work out) but expect that to go
away soon. To support these basic changes (creation of lfs_putpages,
lfs_gop_write, mods to lfs_balloc) several other changes were made, to
wit:

* Create a writer daemon kernel thread whose purpose is to handle page
writes for the pagedaemon, but which also takes over some of the
functions of lfs_check(). This thread is started the first time an
LFS is mounted.

* Add a "flags" parameter to GOP_SIZE. Current values are
GOP_SIZE_READ, meaning that the call should return the size of the
in-core version of the file, and GOP_SIZE_WRITE, meaning that it
should return the on-disk size. One of GOP_SIZE_READ or
GOP_SIZE_WRITE must be specified.

* Instead of using malloc(...M_WAITOK) for everything, reserve enough
resources to get by and use malloc(...M_NOWAIT), using the reserves if
necessary. Use the pool subsystem for structures small enough that
this is feasible. This also obsoletes LFS_THROTTLE.

And a few that are not strictly necessary:

* Moves the LFS inode extensions off onto a separately allocated
structure; getting closer to LFS as an LKM. "Welcome to 1.6O."

* Unified GOP_ALLOC between FFS and LFS.

* Update LFS copyright headers to correct values.

* Actually cast to unsigned in lfs_shellsort, like the comment says.

* Keep track of which segments were empty before the previous
checkpoint; any segments that pass two checkpoints both dirty and
empty can be summarily cleaned. Do this. Right now lfs_segclean
still works, but this should be turned into an effectless
compatibility syscall.
 1.25 24-Jan-2003  fvdl Bump daddr_t to 64 bits. Replace it with int32_t in all places where
it was used on-disk, so that on-disk formats remain the same.
Remove ufs_daddr_t and ufs_lbn_t for the time being.
 1.24 01-Dec-2002  matt Add multiple inclusion protection for headers. Fix mismatched
variable declarations (missing const's) as needed.
 1.23 28-Sep-2002  dbj Add support for the Apple UFS variation on ffs
This is the bulk of PR #17345

The general approach is to use a run-time determinable value
for DIRBLKSIZ. Additional allowances are included for using
MAXSYMLINKLEN with FS_42INODEFMT and a shift in the cylinder group
cluster summary count array. Support is added for managing
the Apple UFS volume label.
 1.22 05-May-2002  chs for softdep vnodes, always write together the pages for any block that
might have a dependency, since the accounting doesn't work otherwise.
fixes PRs 15364 16336 16448.
 1.21 15-Sep-2001  chs a whole bunch of changes to improve performance and robustness under load:

- remove special treatment of pager_map mappings in pmaps. this is
required now, since I've removed the globals that expose the address range.
pager_map now uses pmap_kenter_pa() instead of pmap_enter(), so there's
no longer any need to special-case it.
- eliminate struct uvm_vnode by moving its fields into struct vnode.
- rewrite the pageout path. the pager is now responsible for handling the
high-level requests instead of only getting control after a bunch of work
has already been done on its behalf. this will allow us to UBCify LFS,
which needs tighter control over its pages than other filesystems do.
writing a page to disk no longer requires making it read-only, which
allows us to write wired pages without causing all kinds of havoc.
- use a new PG_PAGEOUT flag to indicate that a page should be freed
on behalf of the pagedaemon when it's unlocked. this flag is very similar
to PG_RELEASED, but unlike PG_RELEASED, PG_PAGEOUT can be cleared if the
pageout fails due to e.g. an indirect-block buffer being locked.
this allows us to remove the "version" field from struct vm_page,
and together with shrinking "loan_count" from 32 bits to 16,
struct vm_page is now 4 bytes smaller.
- no longer use PG_RELEASED for swap-backed pages. if the page is busy
because it's being paged out, we can't release the swap slot to be
reallocated until that write is complete, but unlike with vnodes we
don't keep a count of in-progress writes so there's no good way to
know when the write is done. instead, when we need to free a busy
swap-backed page, just sleep until we can get it busy ourselves.
- implement a fast-path for extending writes which allows us to avoid
zeroing new pages. this substantially reduces cpu usage.
- encapsulate the data used by the genfs code in a struct genfs_node,
which must be the first element of the filesystem-specific vnode data
for filesystems which use genfs_{get,put}pages().
- eliminate many of the UVM pagerops, since they aren't needed anymore
now that the pager "put" operation is a higher-level operation.
- enhance the genfs code to allow NFS to use the genfs_{get,put}pages
instead of a modified copy.
- clean up struct vnode by removing all the fields that used to be used by
the vfs_cluster.c code (which we don't use anymore with UBC).
- remove kmem_object and mb_object since they were useless.
instead of allocating pages to these objects, we now just allocate
pages with no object. such pages are mapped in the kernel until they
are freed, so we can use the mapping to find the page to free it.
this allows us to remove splvm() protection in several places.

The sum of all these changes improves write throughput on my
decstation 5000/200 to within 1% of the rate of NetBSD 1.5
and reduces the elapsed time for "make release" of a NetBSD 1.5
source tree on my 128MB pc to 10% less than a 1.5 kernel took.
 1.20 15-Sep-2001  chs add a new VFS op, vfs_reinit, which is called when desiredvnodes is
adjusted via sysctl. file systems that have hash tables which are
sized based on the value of this variable now resize those hash tables
using the new value. the max number of FFS softdeps is also recalculated.

convert various file systems to use the <sys/queue.h> macros for
their hash tables.
 1.19 17-Aug-2001  lukem branches: 1.19.2;
remove third argument (`int ns') from ffs_sb_swap(), and let ffs_sb_swap()
determine the endianness of the `struct fs *o' superblock from o->fs_magic
and set needswap as necessary, rather than trusting the caller to get
it right. invariably, almost every caller of ffs_sb_swap() was calling it
with ns set to the wrong value for ns anyway!
ansi KNF ffs_bswap.c declarations whilst here.

this fixes all sorts of problems when trying to use other-endian file systems,
notably the kernel trying to access memory *way* off, possibly corrupting or
panicking, and userland programs SEGVing and/or corrupting things (e.g.,
"fsck_ffs -B" to swap a file system endianness).

whilst the previous rev of ffs_bswap.c (1.10, 2000/12/23) made this problem
worse, i suspect that the problem was always there and previous versions
just happened not to trash things at the wrong time.

FFS_EI should now be a lot more stable.
 1.18 09-Aug-2001  lukem be consistent and use "u_char" instead of "unsigned char"
 1.17 27-Nov-2000  chs branches: 1.17.2; 1.17.6;
Initial integration of the Unified Buffer Cache project.
 1.16 04-Apr-2000  jdolecek branches: 1.16.4;
Add a new sysctl variable vfs.ffs.log_changeopt - if this is true,
an optimization strategy change is logged to syslog. Default
is 0 (to not log). This replaces the recent not quite "right"
change to only log the change if the kernel is compiled with DEBUG.
 1.15 16-Mar-2000  jdolecek Add new VFS op routine - vfs_done and call it on filesystem detach
in vfs_detach(). vfs_done may free global filesystem's resources,
typically those allocated in respective filesystem's init function.
Needed so those filesystems which went in via LKM have a chance to
clean after themselves before unloading.

For each leaf filesystem, add appropriate vfs_done routine.

Also remember how many times ffs_init() was called and do
the appropriate initialization on first call only. In ffs_done(),
destroy the resources when called by the last user of ffs code.
Change mfs to call ffs_init()/ffs_done() appropriately.
 1.14 14-Feb-2000  fvdl Fixes to the softdep code from Ethan Solomita <ethan@geocast.com>.
* Fix buffer ordering when it has dependencies.
* Alleviate memory problems.
* Deal with some recursive vnode locks (sigh).
* Fix other bugs.
 1.13 15-Nov-1999  fvdl Add Kirk McKusick's soft updates code to the trunk. Not enabled by
default, as the copyright on the main file (ffs_softdep.c) is such
that it has been put into gnusrc. options SOFTDEP will pull this
in. This code also contains the trickle syncer.

Bump version number to 1.4O
 1.12 26-Feb-1999  wrstuden branches: 1.12.4; 1.12.8; 1.12.10; 1.12.14;
Modify vfsops to separate vfs_fhtovp() into two routines. vfs_fhtovp() now
only handles the file handle to vnode conversion, and a new call,
vfs_checkexp(), performs the export verification.
 1.11 01-Sep-1998  thorpej branches: 1.11.2;
Use the pool allocator and the "nointr" pool page allocator for FFS inodes.

XXX MFS also comes in here for inodes, and used a different malloc type,
but the structure is the same, so we just use the FFS inode pool.
 1.10 24-Jun-1998  sommerfe Always include fifos; "not an option any more".
 1.9 22-Jun-1998  sommerfe defopt for options FIFO
 1.8 18-Mar-1998  bouyer Add support for reading/writing FFS in non-native byte order, conditioned
to "options FFS_EI". The superblock and inodes (without blk addr) are
byteswapped at disk read/write time, other metadatas are byteswapped
when used (as they are accessed directly in the buffer cache).
This required the addition of a "um_flags" field to struct ufsmount.
ffs_bswap.c contains superblock and inode byteswap routines also used
by userland utilities.
 1.7 01-Mar-1998  fvdl Merge with Lite2 + local changes
 1.6 22-Dec-1996  cgd Change the second and third args to struct vfsops' (*vfs_mount)() to
'const char *', and 'void *', respectively. The second arg is taken directly
from user arguments, and is const there, so must be const in the prototypes
and functions. The third arg is also taken directly from user arguments.
It doesn't have to be changed, but since it's cleaner to keep the type
the same as the user arg's type, and I'm already making the 'const char *'
change...
 1.5 01-Sep-1996  mycroft Add a set of generic file system operations that most file systems use.
Also, fix some time stamp bogosities.
 1.4 09-Feb-1996  christos ffs prototypes
 1.3 20-Oct-1994  cgd update for new syscall args description mechanism, and deal safely
with wider types.
 1.2 29-Jun-1994  cgd New RCS ID's, take two. they're more aesthetically pleasant, and use 'NetBSD'
 1.1 08-Jun-1994  mycroft branches: 1.1.1;
Update to 4.4-Lite fs code, with local changes.
 1.1.1.2 01-Mar-1998  fvdl Import 4.4BSD-Lite2
 1.1.1.1 01-Mar-1998  fvdl Import 4.4BSD-Lite for reference
 1.11.2.3 30-May-1999  chs remove "allocedp" arg to ffs_balloc().
 1.11.2.2 25-Feb-1999  chs update ffs_balloc(), add ffs_balloc_range().
 1.11.2.1 09-Nov-1998  chs initial snapshot. lots left to do.
 1.12.14.1 27-Dec-1999  wrstuden Pull up to last week's -current.
 1.12.10.2 26-Oct-1999  fvdl Merge changes in the trickle-sync and softdep code as done by Kirk McKusick
in FreeBSD since the version that we based the branch on. Merging mostly
done by Ethan Solomita <ethan@geocast.com>.

Also, make sure the syncer thread/process isn't active when we're
unmounting a filesystem. This could wreak havoc. XXX should be done
on a per-mountpoint basis, but especially the softdep code would
end up being a big pile of vfs_busy() calls.
 1.12.10.1 19-Oct-1999  fvdl Bring in Kirk McKusick's FFS softdep code on a branch.
 1.12.8.2 08-Dec-2000  bouyer Sync with HEAD.
 1.12.8.1 20-Nov-2000  bouyer Update thorpej_scsipi to -current as of a month ago
 1.12.4.4 31-Jul-1999  chs add proto for ffs_balloc1().
 1.12.4.3 04-Jul-1999  chs ffs_balloc() is now a VOP. ffs_balloc_range() is gone.
use the genfs getpages and putpages for ffs.
 1.12.4.2 21-Jun-1999  thorpej Forward decl struct csum.
 1.12.4.1 07-Jun-1999  chs merge everything from chs-ubc branch.
 1.16.4.1 25-Nov-2001  he Pull up revision 1.19 (requested by lukem):
Call ffs_sb_swap() with the correct arguments. Fixes problems
with using other-endian file systems.
 1.17.6.4 10-Oct-2002  jdolecek sync kqueue with -current; this includes merge of gehenna-devsw branch,
merge of i386 MP branch, and part of autoconf rototil work
 1.17.6.3 23-Jun-2002  jdolecek catch up with -current on kqueue branch
 1.17.6.2 10-Jan-2002  thorpej Sync kqueue branch with -current.
 1.17.6.1 25-Aug-2001  thorpej Merge Aug 24 -current into the kqueue branch.
 1.17.2.5 11-Dec-2002  thorpej Sync with HEAD.
 1.17.2.4 18-Oct-2002  nathanw Catch up to -current.
 1.17.2.3 20-Jun-2002  nathanw Catch up to -current.
 1.17.2.2 21-Sep-2001  nathanw Catch up to -current.
 1.17.2.1 24-Aug-2001  nathanw Catch up with -current.
 1.19.2.1 01-Oct-2001  fvdl Catch up with -current.
 1.32.2.9 10-Nov-2005  skrll Sync with HEAD. Here we go again...
 1.32.2.8 04-Mar-2005  skrll Sync with HEAD.

Hi Perry!
 1.32.2.7 21-Sep-2004  skrll Fix the sync with head I botched.
 1.32.2.6 18-Sep-2004  skrll Sync with HEAD.
 1.32.2.5 03-Sep-2004  skrll Sync with HEAD
 1.32.2.4 24-Aug-2004  skrll Undo part of the ktrace/lwp changes. In particular:
* Remove the "lwp *" argument that was added to vget(). Turns out
that nothing actually used it!
* Remove the "lwp *" arguments that were added to VFS_ROOT(), VFS_VGET(),
and VFS_FHTOVP(); all they did was pass it to vget() (which, as noted
above, didn't use it).
* Remove all of the "lwp *" arguments to internal functions that were added
just to appease the above.
 1.32.2.3 18-Aug-2004  skrll s/proc/lwp/
 1.32.2.2 03-Aug-2004  skrll Sync with HEAD
 1.32.2.1 02-Jul-2003  darrenr Apply the aborted ktrace-lwp changes to a specific branch. This is just for
others to review. I'm concerned that patch fuzziness may have resulted in some
errant code being generated but I'll look at that later by comparing the diff
from the base to the branch with the file I attempt to apply to it. This will,
at the very least, put the changes in a better context for others to review
them and attempt to tinker with removing passing of 'struct lwp' through
the kernel.
 1.36.2.1 23-May-2004  tron Pull up revision 1.38 (requested by atatat in ticket #374):
Tweak sysctl setup functions (the macros, actually) for use in lkms,
and tweak lkminit_*.c (where applicable) to call them, and to call
sysctl_teardown() when being unloaded.
This consists of (1) making setup functions not be static when being
compiled as lkms (change to sys/sysctl.h), (2) making prototypes
visible for the various setup functions in header files (changes to
various header files), and (3) making simple "load" and "unload"
functions in the actual lkminit stuff.
linux_sysctl.c also needs its root exposed (ie, made not static) for
this (when built as an lkm).
 1.41.6.1 19-Mar-2005  yamt sync with head. xen and whitespace. xen part is not finished.
 1.41.4.1 29-Apr-2005  kent sync with -current
 1.42.4.5 21-Jan-2008  yamt sync with head
 1.42.4.4 03-Sep-2007  yamt sync with head.
 1.42.4.3 26-Feb-2007  yamt sync with head.
 1.42.4.2 30-Dec-2006  yamt sync with head.
 1.42.4.1 21-Jun-2006  yamt sync with head.
 1.42.2.1 21-Oct-2005  tron Pull up following revision(s) (requested by yamt in ticket #845):
sys/ufs/ffs/ffs_extern.h: revision 1.45 via patch
sys/ufs/ffs/ffs_vnops.c: revision 1.75 via patch
revert the code to expand putpage requests to block boundaries.
because:
- it was incomplete in some cases.
- it can confuse pagedaemon.
see PR/15364 for details.
 1.47.2.1 20-Oct-2005  yamt adapt ufs.
 1.50.2.1 15-Jan-2006  yamt sync with head.
 1.51.10.1 24-May-2006  tron Merge 2006-05-24 NetBSD-current into the "peter-altq" branch.
 1.51.8.6 11-May-2006  elad sync with head
 1.51.8.5 06-May-2006  christos - Move kauth_cred_t declaration to <sys/types.h>
- Clean up unused struct ucred forward declarations.
- Don't include <sys/kauth.h> in any header, but include it in the c files
that need it.

Approved by core.
 1.51.8.4 03-May-2006  yamt - wrap some kernel-only things by #ifdef _KERNEL.
- place __END_DECLS correctly.

ok'd by elad@
 1.51.8.3 18-Apr-2006  elad make build.sh tools work. from matt.
 1.51.8.2 08-Mar-2006  elad Include sys/kauth.h here.
 1.51.8.1 08-Mar-2006  elad Adapt to kernel authorization KPI.
 1.51.6.2 11-Aug-2006  yamt sync with head
 1.51.6.1 24-May-2006  yamt sync with head.
 1.51.4.1 01-Jun-2006  kardel Sync with head.
 1.51.2.1 09-Sep-2006  rpaulo sync with head
 1.53.4.1 13-Jul-2006  gdamore Merge from HEAD.
 1.54.4.1 01-Feb-2007  ad Sync with head.
 1.55.8.2 11-Jul-2007  mjf Sync with head.
 1.55.8.1 30-Mar-2007  mjf Provide a test journal. It's just a wrapper to bwrite and doesn't
actually do any journaling, but we need something to give the
transactions to.
 1.55.6.5 16-Sep-2007  ad - Checkpoint work in progress on the vnode lifecycle and reference counting
stuff. This makes it work properly without kernel_lock and fixes a few
quite old bugs. See vfs_subr.c 1.283.2.17 for details.

- Fix some problems with softdep. Unfortunately our softdep code appears
to have some longstanding bugs that cause it to fail under stress testing.
 1.55.6.4 20-Aug-2007  ad Sync with HEAD.
 1.55.6.3 20-Aug-2007  ad softdep locking improvements. It hangs looping in flush_inodedep_deps(),
more work required.
 1.55.6.2 15-Jul-2007  ad Sync with head.
 1.55.6.1 09-Jun-2007  ad Sync with head.
 1.57.2.1 15-Aug-2007  skrll Sync with HEAD.
 1.58.4.2 31-Jul-2007  pooka * nuke the nameidata parameter from VFS_MOUNT(). Nobody on tech-kern
knew what it was supposed to be used for and wrstuden gave a go-ahead
* while rototilling, convert file systems which went easily to
use VFS_PROTOS() instead of manually prototyping the methods
 1.58.4.1 31-Jul-2007  pooka file ffs_extern.h was added on branch matt-mips64 on 2007-07-31 21:14:21 +0000
 1.58.2.2 09-Dec-2007  jmcneill Sync with HEAD.
 1.58.2.1 16-Aug-2007  jmcneill Sync with HEAD.
 1.60.10.2 26-Dec-2007  ad Sync with head.
 1.60.10.1 04-Dec-2007  ad Pull the vmlocking changes into a new branch.
 1.60.8.1 18-Feb-2008  mjf Sync with HEAD.
 1.60.2.1 09-Jan-2008  matt sync with HEAD
 1.61.4.2 08-Jan-2008  bouyer Sync with HEAD
 1.61.4.1 02-Jan-2008  bouyer Sync with HEAD
 1.63.6.5 17-Jan-2009  mjf Sync with HEAD.
 1.63.6.4 28-Sep-2008  mjf Sync with HEAD.
 1.63.6.3 29-Jun-2008  mjf Sync with HEAD.
 1.63.6.2 05-Jun-2008  mjf Sync with HEAD.

Also fix build.
 1.63.6.1 02-Jun-2008  mjf Sync with HEAD.
 1.64.6.3 10-Oct-2008  skrll Sync with HEAD.
 1.64.6.2 18-Sep-2008  wrstuden Sync with wrstuden-revivesa-base-2.
 1.64.6.1 23-Jun-2008  wrstuden Sync w/ -current. 34 merge conflicts to follow.
 1.64.4.1 04-May-2009  yamt sync with head.
 1.64.2.1 04-Jun-2008  yamt sync with head
 1.65.2.3 28-Jul-2008  simonb Add support for creating a WAPBL log in the filesystem. Will
create an in-filesystem log on first "mount -o log" if one doesn't
exist, and will then continue to use same log in the future. See
(soon to be added) wapbl(4) for more info.

Adds a new B_CONTIG low-level allocation flag that uses hints in
"struct ffs_inode_ext" to lay out an ffs file's data contiguously.

Thanks to Greg Oster for helping with the design of this and to
Antti Kantee for code review and suggestions.
 1.65.2.2 03-Jul-2008  simonb Sync with head.
 1.65.2.1 10-Jun-2008  simonb Initial commit of Wasabi Systems' WAPBL (Write Ahead Physical Block
Logging) journaling code. Originally written by Darrin B. Jewell
while at Wasabi and updated to -current by Antti Kantee, Andy Doran,
Greg Oster and Simon Burge.

Still a number of issues - look in doc/BRANCHES for "simonb-wapbl"
for more info.
 1.66.2.2 13-Dec-2008  haad Update haad-dm branch to haad-dm-base2.
 1.66.2.1 19-Oct-2008  haad Sync with HEAD.
 1.70.2.2 03-Mar-2009  skrll Sync with HEAD.
 1.70.2.1 19-Jan-2009  skrll Sync with HEAD.
 1.74.4.1 13-May-2009  jym Sync with HEAD.

Commit is split, to avoid a "too many arguments" protocol error.
 1.75.8.1 20-Jan-2011  bouyer Snapshot of work in progress on a modernised disk quota system:
- new quotactl syscall (versioned for backward compat), which takes
as parameter a path to a mount point, and a prop_dictionary
(in plistref format) describing commands and arguments.
For each command, status and data are returned as a prop_dictionary.
quota command features will be added to take advantage of this,
exporting quota data or getting quota commands as plists.

- new on disk-format storage (all 64bit wide), integrated to metadata for
ffs (and playing nicely with wapbl).
Quotas are enabled on a ffs filesystem via superblock flags.
tunefs(8) can enable or disable quotas.
On a quota-enabled filesystem, fsck_ffs(8) will track per-uid/gid
block and inode usages, and will check and update quotas in Pass 6.
quota usage and limits are stored in unlinked files (one for users,
one for groups); fsck_ffs(8) will create or free the files as
needed. This means that after enabling or disabling
quotas on a filesystem, a fsck_ffs(8) run is required.
quotacheck(8) is no longer needed; after an unclean shutdown,
fsck or journal replay will take care of fixing quotas.
newfs(8) can create a ready-to-mount quota-enabled filesystem
(superblock flags are set and quota inodes are created).
Other new features or semantic changes:
- default quota data, applied to users or groups which don't already
have a quota entry
- per-user/group grace time (instead of a filesystem global one)
- 0 really means "nothing allowed at all", not "no limit".
If you want "no limit", set the limit to UQUAD_MAX (tools will
understand "unlimited" and "-")

A quota file is structured as follows:
it starts with a header, containing a few per-filesystem values,
and the default quota limits.
Quota entries are linked together as a simple list; each entry has a
pointer (as an offset within the file) to the next.
The header has a pointer to a list of free quota entries, and
a hash table of in-use entries. The size of the hash table depends
on the filesystem block size (header+hash table should fit in the
first block). The file is not sparse and is a multiple of
filesystem block size (when the free quota entry list is empty a new
filesystem block is allocated). Quota entries do not cross
filesystem block boundaries.

In memory, the kernel keeps a cache of recently used quota entries
as a reference to the block number and offset within the block.
The quota entry itself is kept in the buf cache.

fsck_ffs(8), tunefs(8) and newfs(8) support is complete (with
related atf tests :)
The kernel can update disk usage and report it via quotactl(2).

Todo: enforce quota limits (limits are not yet checked by the kernel)
update repquota, edquota and rpc.rquotad to the new world
implement compat_50_quotactl ioctl.
update quotactl(2) man page

fsck_ffs required fixes so that allocating new blocks or inodes will
properly update the superblock and cg summaries. This was not an issue up
to now because the superblock and cg summary checks happened last, but now
allocations or frees can happen in pass 6.
 1.75.6.1 06-Jun-2011  jruoho Sync with HEAD.
 1.75.4.2 31-May-2011  rmind sync with head
 1.75.4.1 21-Apr-2011  rmind sync with head
 1.77.2.1 23-Jun-2011  cherry Catchup with rmind-uvmplock merge.
 1.78.12.3 03-Dec-2017  jdolecek update from HEAD
 1.78.12.2 23-Jun-2013  tls resync from head
 1.78.12.1 20-Nov-2012  tls Resync to 2012-11-19 00:00:00 UTC
 1.78.2.2 22-May-2014  yamt sync with head.

for a reference, the tree before this commit was tagged
as yamt-pagecache-tag8.

this commit was split into small chunks to avoid
a limitation of cvs. ("Protocol error: too many arguments")
 1.78.2.1 30-Oct-2012  yamt sync with head
 1.80.10.3 28-Aug-2017  skrll Sync with HEAD
 1.80.10.2 05-Oct-2016  skrll Sync with HEAD
 1.80.10.1 06-Apr-2015  skrll Sync with HEAD
 1.82.2.2 20-Mar-2017  pgoyette Sync with HEAD
 1.82.2.1 04-Nov-2016  pgoyette Sync with HEAD
 1.83.2.1 21-Apr-2017  bouyer Sync with HEAD
 1.84.14.2 21-Apr-2020  martin Sync with HEAD
 1.84.14.1 10-Jun-2019  christos Sync with HEAD
 1.84.12.1 06-Sep-2018  pgoyette Sync with HEAD

Resolve a couple of conflicts (result of the uimin/uimax changes)
 1.85.10.1 20-Apr-2020  bouyer Sync with HEAD
 1.87.2.1 13-May-2023  martin Pull up following revision(s) (requested by chs in ticket #160):

usr.sbin/makefs/ffs/ffs_alloc.c: revision 1.31
sbin/tunefs/tunefs.c: revision 1.58
sbin/fsck_ffs/setup.c: revision 1.105
sbin/fsck_ffs/pass5.c: revision 1.56
usr.sbin/makefs/ffs.c: revision 1.74
usr.sbin/makefs/ffs/mkfs.c: revision 1.42
usr.sbin/makefs/Makefile: revision 1.40
sys/ufs/ffs/fs.h: revision 1.71
sbin/fsdb/fsdb.c: revision 1.54
sbin/resize_ffs/resize_ffs.c: revision 1.58
sbin/fsck_ffs/pass4.c: revision 1.29
usr.sbin/makefs/ffs/ffs_extern.h: revision 1.9
sbin/newfs/mkfs.c: revision 1.133
sys/ufs/ffs/ffs_alloc.c: revision 1.172
sbin/fsck_ffs/pass1b.c: revision 1.24
usr.sbin/dumpfs/dumpfs.c: revision 1.68
sys/ufs/ffs/ffs_extern.h: revision 1.88
usr.sbin/quotacheck/quotacheck.c: revision 1.51
sys/ufs/ffs/ffs_subr.c: revision 1.54
sbin/fsck_ffs/main.c: revision 1.91
sbin/fsck_ffs/pass1.c: revision 1.63

ufs: fixed signed/unsigned bugs affecting large file systems

Apply these commits from FreeBSD:
commit e870d1e6f97cc73308c11c40684b775bcfa906a2
Author: Kirk McKusick <mckusick@FreeBSD.org>
Date: Wed Feb 10 20:10:35 2010 +0000
This fix corrects a problem in the file system that treats large
inode numbers as negative rather than unsigned. For a default
(16K block) file system, this bug began to show up at a file system
size above about 16Tb.
To fully handle this problem, newfs must be updated to ensure that
it will never create a filesystem with more than 2^32 inodes. That
patch will be forthcoming soon.
Reported by: Scott Burns, John Kilburg, Bruce Evans
Followup by: Jeff Roberson
PR: 133980
MFC after: 2 weeks

commit 81479e688b0f643ffacd3f335b4b4bba460b769d
Author: Kirk McKusick <mckusick@FreeBSD.org>
Date: Thu Feb 11 18:14:53 2010 +0000
One last pass to get all the unsigned comparisons correct.

In addition to the changes from FreeBSD, this commit includes quite a few
related changes to appease -Wsign-compare.
 1.131 31-Jul-2020  chs fix the UFS2 extattr truncate code to play nice with wapbl.
also, rather than pull in the FreeBSD V_NORMAL/V_ALT flags to
vinvalbuf() and the buf b_xflags field and BX_ALTDATA flag,
add a binvalbuf() function to invalidate a specific buffer
and use that to invalidate the two possible extattr bufs
during IO_EXT truncations.
 1.130 26-Jul-2020  chs pull in a bit more FreeBSD code to allow specifying truncation of
the regular bmap (IO_NORMAL) independently of the extattr bmap (IO_EXT).
fixes fs corruption when removing extattrs in UFS2.
 1.129 02-May-2020  christos Remove the unlock/relock hack by using IO_EXT to indicate that we are already
holding the lock.
 1.128 23-Apr-2020  ad PR kern/54759 (vm.ubc_direct deadlock when read()/write() into mapping of itself)

- Add new flag UBC_ISMAPPED which tells ubc_uiomove() the object is mmap()ed
somewhere. Use it to decide whether to do direct-mapped copy, rather than
poking around directly in the vnode in ubc_uiomove(), which is ugly and
doesn't work for tmpfs. It would be nicer to contain all this in UVM but
the filesystem provides the needed locking here (VV_MAPPED) and to
reinvent that would suck more.

- Rename UBC_UNMAP_FLAG() to UBC_VNODE_FLAGS(). Pass in UBC_ISMAPPED where
appropriate.
 1.127 18-Apr-2020  christos Extended attribute support for ffsv2, from FreeBSD.
 1.126 23-Feb-2020  ad branches: 1.126.4;
UVM locking changes, proposed on tech-kern:

- Change the lock on uvm_object, vm_amap and vm_anon to be a RW lock.
- Break v_interlock and vmobjlock apart. v_interlock remains a mutex.
- Do partial PV list locking in the x86 pmap. Others to follow later.
 1.125 10-Dec-2018  jdolecek branches: 1.125.6;
make UFS_WAPBL_JLOCK_ASSERT() #ifdef DIAGNOSTIC, same as the underlying
function KASSERT(), so that it actually does something; fix code using
it to actually pass correct params, so that it compiles

remove UFS_WAPBL_JUNLOCK_ASSERT(), as that is inherently racy (it's
okay on those places if the rwlock is held by other lwp); depend
on the RW_ASSERT()/LOCKDEBUG inside rw_enter() to catch the case
with wapbl rwlock held by current lwp
 1.124 18-Mar-2017  riastradh branches: 1.124.12; 1.124.14;
#if DIAGNOSTIC panic ---> KASSERT
 1.123 11-Nov-2016  hannken branches: 1.123.2;
Fix a "slight tweak" from Rev. 1.121: bap1/bap2 must be valid
before using BAP_ASSIGN().

Prevents NULL pointer dereference when "lastbn >= 0".
 1.122 10-Nov-2016  jdolecek during truncate with wapbl, register deallocation for upper indirect block
before recursing into lower blocks, to make sure that it will be removed after
all its referenced blocks are removed

fixes 'ffs_blkfree_common: freeing free block' panic triggered by
ufs_truncate_retry() when just the upper indirect block registration failed,
code tried to free the lower blocks again after wapbl flush

problem found by hannken@, thank you
 1.121 10-Nov-2016  jdolecek ffs_indirtrunc(): for !wapbl, restore rev 1.117 behavior of writing the zeroed
(indirect) block before freeing the referenced blocks; it's necessary for
fsck to recover the filesystem, if system goes down during truncate

patch courtesy of hannken@ with only slight tweaks
 1.120 07-Nov-2016  jdolecek fix broken test for partial truncate, introduced in rev 1.118

PR kern/51601 kern/51602
 1.119 07-Nov-2016  jdolecek reduce diff vs 1.117, no functional change
 1.118 28-Oct-2016  jdolecek reorganize ffs_truncate()/ffs_indirtrunc() to be able to partially
succeed; change wapbl_register_deallocation() to return EAGAIN
rather than panic when code hits the limit

callers changed to either loop calling ffs_truncate() using new
utility ufs_truncate_retry() if their semantics requires it, or
just ignore the failure; remove ufs_wapbl_truncate()

this fixes possible user-triggerable panic during truncate, and
resolves WAPBL performance issue with truncates of large files

PR kern/47146 and kern/49175
 1.117 28-Mar-2015  maxv branches: 1.117.2;
Remove the 'cred' argument from bread(). Remove a now unused var in
ffs_snapshot.c. Update the man page accordingly.

ok hannken@
 1.116 20-Oct-2013  htodd branches: 1.116.6;
Define needswap where needed.
 1.115 23-Jun-2013  dholland branches: 1.115.2;
Stick ffs_, ext2_, chfs_, filecore_, cd9660_, or mfs_ in front of
the following symbols so as to disambiguate fully. (Christos already
did the lfs ones.)

lblkno
lblktosize
lfragtosize
numfrags
blkroundup
fragroundup
 1.114 23-Jun-2013  dholland fsbtodb() -> FFS_FSBTODB(), EXT2_FSBTODB(), or MFS_FSBTODB()
dbtofsb() -> FFS_DBTOFSB() or EXT2_DBTOFSB()

(Christos already did the lfs ones a few days back)
 1.113 19-Jun-2013  dholland Rename ambiguous macros:
MAXDIRSIZE -> UFS_MAXDIRSIZE or LFS_MAXDIRSIZE
NINDIR -> FFS_NINDIR, EXT2_NINDIR, LFS_NINDIR, or MFS_NINDIR
INOPB -> FFS_INOPB, LFS_INOPB
INOPF -> FFS_INOPF, LFS_INOPF
blksize -> ffs_blksize, ext2_blksize, or lfs_blksize
sblksize -> ffs_blksize

These are not the only ambiguously defined filesystem macros, of
course, there's a pile more. I may not have found all the ambiguous
definitions of blksize(), too, as there are a lot of other things
called 'blksize' in the system.
 1.112 22-Jan-2013  dholland Stuff UFS_ in front of a few of ufs's symbols to reduce namespace
pollution. Specifically:
ROOTINO -> UFS_ROOTINO
WINO -> UFS_WINO
NXADDR -> UFS_NXADDR
NDADDR -> UFS_NDADDR
NIADDR -> UFS_NIADDR
MAXSYMLINKLEN -> UFS_MAXSYMLINKLEN
MAXSYMLINKLEN_UFS[12] -> UFS[12]_MAXSYMLINKLEN (for consistency)

Sort out ext2fs's misuse of NDADDR and NIADDR; fortunately, these have
the same values in ext2fs and ffs.

No functional change intended.
 1.111 20-Dec-2012  hannken Change bread() and breadn() to never return a buffer on
error and modify all callers to not brelse() on error.

Welcome to 6.99.16

PR kern/46282 (6.0_BETA crash: msdosfs_bmap -> pcbmap -> bread -> bio_doread)
 1.110 09-Jul-2012  matt branches: 1.110.2;
Convert a KDASSERT to a KDASSERTMSG
 1.109 27-Jan-2012  para converting readdir in ffs and ext2fs from malloc(9) to kmem(9)
while there allocate ufs mount structs from kmem(9) too
preceding kmem-vmem-pool-patch

releng@ acknowledged
 1.108 23-Nov-2011  bouyer branches: 1.108.2;
If ufs_balloc_range() fails, make sure to call ?fs_truncate() to
reset v_writesize to the right value.
If v_writesize is left larger than the allocated blocks, we may have
the same issue as the one described in
http://mail-index.netbsd.org/tech-kern/2010/02/02/msg007156.html
 1.107 16-Jun-2011  hannken branches: 1.107.2;
Rename uvm_vnp_zerorange(struct vnode *, off_t, size_t) to
ubc_zerorange(struct uvm_object *, off_t, size_t, int) changing
the first argument to an uvm_object and adding a flags argument.

Modify tmpfs_reg_resize() to zero the backing store (aobj) instead
of the vnode. Ubc_purge() no longer panics when unmounting tmpfs.

Keep uvm_vnp_zerorange() until the next kernel version bump.
 1.106 12-Jun-2011  rmind Welcome to 5.99.53! Merge rmind-uvmplock branch:

- Reorganize locking in UVM and provide extra serialisation for pmap(9).
New lock order: [vmpage-owner-lock] -> pmap-lock.

- Simplify locking in some pmap(9) modules by removing P->V locking.

- Use lock object on vmobjlock (and thus vnode_t::v_interlock) to share
the locks amongst UVM objects where necessary (tmpfs, layerfs, unionfs).

- Rewrite and optimise x86 TLB shootdown code, make it simpler and cleaner.
Add TLBSTATS option for x86 to collect statistics about TLB shootdowns.

- Unify /dev/mem et al in MI code and provide required locking (removes
kernel-lock on some ports). Also, avoid cache-aliasing issues.

Thanks to Andrew Doran and Joerg Sonnenberger, as their initial patches
formed the core changes of this branch.
 1.105 06-Mar-2011  bouyer branches: 1.105.2;
merge the bouyer-quota2 branch. This adds a new on-disk format
to store disk quota usage and limits, integrated with ffs
metadata. Usage is checked by fsck_ffs (no more quotacheck)
and is covered by the WAPBL journal. Enabled with kernel
option QUOTA2 (added where QUOTA was enabled in kernel config files),
turned on with tunefs(8) on a per-filesystem
basis. mount_mfs(8) can also turn quotas on.

See http://mail-index.netbsd.org/tech-kern/2011/02/19/msg010025.html
for details.
 1.104 07-Feb-2010  bouyer branches: 1.104.4; 1.104.6; 1.104.8;
- ufs_balloc_range(): on error, only PG_RELEASED the pages that were
allocated to extend the file to the new size. Releasing all pages
may release pages that contain previously-written data not yet flushed
to disk. Should fix PR kern/35704
- {ffs,lfs,ext2fs}_truncate(): Even if the inode's size is the same as
the new length, call uvm_vnp_setsize(). *_truncate() may have been
called by *_write() in the error path (e.g. block allocation failure
because the quota was exceeded or the file system was full), and at this
point v_writesize has been set to the desired size of the file and not
reverted to the old size. Not adjusting v_writesize to the real size causes
genfs_do_io() to write to disk past the real end of the file.
 1.103 22-Feb-2009  ad PR kern/26878 FFSv2 + softdep = livelock (no free ram)
PR kern/16942 panic with softdep and quotas
PR kern/19565 panic: softdep_write_inodeblock: indirect pointer #1 mismatch
PR kern/26274 softdep panic: allocdirect_merge: ...
PR kern/26374 Long delay before non-root users can write to softdep partitions
PR kern/28621 1.6.x "vp != NULL" panic in ffs_softdep.c:4653 while unmounting a softdep (+quota) filesystem
PR kern/29513 FFS+Softdep panic with unfsck-able file-corruption
PR kern/31544 The ffs softdep code appears to fail to write dirty bits to disk
PR kern/31981 stopping scsi disk can cause panic (softdep)
PR kern/32116 kernel panic in softdep (assertion failure)
PR kern/32532 softdep_trackbufs deadlock
PR kern/37191 softdep: locking against myself
PR kern/40474 Kernel panic after remounting raid root with softdep

Retire softdep, pass 2. As discussed and later formally announced on the
mailing lists.
 1.102 15-Jan-2009  pooka branches: 1.102.2;
Revert 1.101, author did not provide a justification.
 1.101 23-Dec-2008  cegger ffs_update: sprinkle KASSERTs
 1.100 17-Dec-2008  cegger kill MALLOC and FREE macros.
 1.99 30-Aug-2008  hannken branches: 1.99.2; 1.99.4; 1.99.10;
ffs_truncate() always runs with journal locked. Propagate this information
to VOP_PUTPAGES().

Report from Lars Nordlund on current-users@
 1.98 31-Jul-2008  simonb Merge the simonb-wapbl branch. From the original branch commit:

Add Wasabi Systems' WAPBL (Write Ahead Physical Block Logging)
journaling code. Originally written by Darrin B. Jewell while
at Wasabi and updated to -current by Antti Kantee, Andy Doran,
Greg Oster and Simon Burge.

OK'd by core@, releng@.
 1.97 03-Jun-2008  hannken branches: 1.97.2; 1.97.4;
ufs/ffs: replace calls to getblk() with ffs_getblk(). Now all buffers
have been run through copy-on-write and async mounts work again.

Fixes PR kern/38820

Reviewed by: YAMAMOTO Takashi <yamt@netbsd.org>
 1.96 16-May-2008  hannken Make sure all cached buffers with valid, not yet written data have been
run through copy-on-write. Call fscow_run() with valid data where possible.

The LP_UFSCOW hack is no longer needed to protect ffs_copyonwrite() against
endless recursion.

- Add a flag B_MODIFY to bread(), breada() and breadn(). If set the caller
intends to modify the buffer returned.

- Always run copy-on-write on buffers returned from ffs_balloc().

- Add new function ffs_getblk() that gets a buffer, assigns a new blkno,
may clear the buffer and runs copy-on-write. Process possible errors
from getblk() or fscow_run(). Part of PR kern/38664.

Welcome to 4.99.63

Reviewed by: YAMAMOTO Takashi <yamt@netbsd.org>
 1.95 27-Mar-2008  ad branches: 1.95.2; 1.95.4; 1.95.6;
Make rusage collection per-LWP and collate in the appropriate places.
cloned threads need a little bit more work but the locking needs to
be fixed first.
 1.94 09-Jan-2008  ad branches: 1.94.6;
Go back to freeing on disk inodes in the inactive routine. It would be
better not to do this, but it rules out potential side effects with softdep.
 1.93 02-Jan-2008  ad Merge vmlocking2 to head.
 1.92 08-Dec-2007  pooka branches: 1.92.4;
Remove cn_lwp from struct componentname. curlwp should be used
from now on. The NDINIT() macro no longer takes the lwp parameter and
associates the credentials of the calling thread with the namei
structure.
 1.91 08-Dec-2007  ad Grab ump->um_lock in another spot.
 1.90 26-Nov-2007  pooka branches: 1.90.2;
Remove the "struct lwp *" argument from all VFS and VOP interfaces.
The general trend is to remove it from all kernel interfaces and
this is a start. In case the calling lwp is desired, curlwp should
be used.

quick consensus on tech-kern
 1.89 08-Oct-2007  ad branches: 1.89.4;
Merge ffs locking & brelse changes from the vmlocking branch.
 1.88 10-Jul-2007  hannken branches: 1.88.6; 1.88.8; 1.88.10;
Move `struct dquot' and its supporting functions from quota.h to ufs_quota.c.

- Make quota-internal functions static.
- Clean up declarations in quota.h and ufs_extern.h. quota.h now has the
description of quota criteria, on-disk structure, user-kernel interface and
declaration of init/done functions. All ufs quota related function
prototypes go to ufs_extern.h.
- New functions ufsquota_init() and ufsquota_free() create or destroy the
quota fields of `struct inode'.
- chkdq() and chkiq() always update the quota fields of `struct inode' first.
- Only ufs_access() explicitly calls getinoquota().

No objections on tech-kern@
 1.87 05-Jun-2007  yamt improve post-ubc file overwrite performance in common cases.
ie. when it's safe, actually overwrite blocks rather than doing
read-modify-write.

also fixes PR/33152 and PR/36303.
 1.86 04-Mar-2007  christos branches: 1.86.2; 1.86.4; 1.86.6;
Kill caddr_t; there will be some MI fallout, but it will be fixed shortly.
 1.85 17-Oct-2006  yamt branches: 1.85.4;
ffs_truncate: don't forget to zero the area past EOF in the case of
blocksize < pagesize. PR/33777 from Simon Burge.
XXX check other filesystems, esp. lfs.
 1.84 14-Oct-2006  yamt don't use g_glock directly.
 1.83 23-Jun-2006  yamt branches: 1.83.4; 1.83.6;
fix a simonb-timecounters regression.
the precision of getnanotime() is not suitable for file timestamps.
esp. when it's nfs-exported.

- introduce vfs_timestamp().
(the name is from freebsd. currently merely a wrapper of nanotime())
- for ufs-like filesystems, use it rather than getnanotime().

XXX check other filesystems.
 1.82 07-Jun-2006  kardel branches: 1.82.2;
merge FreeBSD timecounters from branch simonb-timecounters
- struct timeval time is gone
time.tv_sec -> time_second
- struct timeval mono_time is gone
mono_time.tv_sec -> time_uptime
- access to time via
{get,}{micro,nano,bin}time()
get* versions are fast but less precise
- support NTP nanokernel implementation (NTP API 4)
- further reading:
Timecounter Paper: http://phk.freebsd.dk/pubs/timecounter.pdf
NTP Nanokernel: http://www.eecis.udel.edu/~mills/ntp/html/kern.html
 1.81 14-May-2006  elad branches: 1.81.2;
integrate kauth.
 1.80 18-Mar-2006  bouyer Fix dead error condition, coverity ID 747.
 1.79 11-Dec-2005  christos branches: 1.79.4; 1.79.6; 1.79.8; 1.79.10; 1.79.12;
merge ktrace-lwp.
 1.78 11-Nov-2005  yamt - ignore truncation for VCHR/VBLK/VFIFO as it used to be
before yamt-vop merge. PR/32049 from Atsushi Onoe.
- reject setattr which attempts to change size of VLNK/VSOCK.
 1.77 02-Nov-2005  yamt merge yamt-vop branch. remove following VOPs.

VOP_BLKATOFF
VOP_VALLOC
VOP_BALLOC
VOP_REALLOCBLKS
VOP_VFREE
VOP_TRUNCATE
VOP_UPDATE
 1.76 27-Sep-2005  yamt branches: 1.76.2;
introduce "ufs_ops" and use it for ITIMES.
 1.75 12-Sep-2005  christos Add another KASSERT.
 1.74 12-Sep-2005  drochner move the new ffs_itimes() to a better place -- ffs_subr.c is shared with
userland
 1.73 12-Sep-2005  christos Use nanotime() to update the time fields in filesystems. Convert the code
from macros to real functions. Original patch and review from chuq.
Note: ext2fs only keeps seconds in the on-disk inode, and msdosfs does not
have enough precision for all fields, so this is not very useful for those
two.
 1.72 15-Jul-2005  thorpej Use ANSI function decls.
 1.71 15-Aug-2004  mycroft branches: 1.71.12;
Don't write out the extra zero pages with PGO_SYNCIO. We start an asynchronous
write anyway, and they will not be freed until that write is finished.
 1.70 15-Aug-2004  mycroft Correct the fix for the partial-truncate inefficiency. We still need to zero,
but we only need to sync those pages that are being lopped off, if any.
 1.69 15-Aug-2004  mycroft Minor simplification to some arithmetic.
 1.68 15-Aug-2004  mycroft Fixing age old cruft:
* Rather than using mnt_maxsymlinklen to indicate that a file system returns
d_type fields(!), add a new internal flag, IMNT_DTYPE.

Add 3 new elements to ufsmount:
* um_maxsymlinklen, replaces mnt_maxsymlinklen (which never should have existed
in the first place).
* um_dirblksiz, which tracks the current directory block size, eliminating the
FS-specific checks littered throughout the code. This may be used later to
make the block size variable.
* um_maxfilesize, which is the maximum file size, possibly adjusted lower due
to implementation issues.

Sync some bug fixes from FFS into ext2fs, particularly:
* ffs_lookup.c 1.21, 1.28, 1.33, 1.48
* ffs_inode.c 1.43, 1.44, 1.45, 1.66, 1.67
* ffs_vnops.c 1.84, 1.85, 1.86

Clean up some crappy pointer frobnication.
 1.67 14-Aug-2004  mycroft Partially fix a performance problem in the partial-truncate case. We were
doing synchronous writes unnecessarily in a couple of places. Now it's 1
write per truncate in my test case rather than 3. :-P
 1.66 14-Aug-2004  mycroft There is no need to do a synchronous write when truncating a short symlink.
 1.65 14-Aug-2004  mycroft Add a new flag, IN_MODIFY. This is like IN_UPDATE|IN_CHANGE, but unlike
setting those flags, it does not cause the inode to be written in the periodic
sync. This is used for writes to special files (devices and named pipes) and
FIFOs.

Do not preemptively sync updates to access times and modification times. They
are now updated in the inode only opportunistically, or when the file or device
is closed. (Really, it should be delayed beyond close, but this is enough to
help substantially with device nodes.)

And the most amusing part:
Trickle sync was broken on both FFS and ext2fs, in different ways. In FFS, the
periodic call to VFS_SYNC(MNT_LAZY) was still causing all file data to be
synced. In ext2fs, it was causing the metadata to *not* be synced. We now
only call VOP_UPDATE() on the node if we're doing MNT_LAZY. I've confirmed
that we do in fact trickle correctly now.
 1.64 20-Jun-2004  hannken Use one daddr_t XXXblks[NDADDR + NIADDR] instead of two.
No functional changes. Reduces kernel stack usage by 120 bytes.

Approved by: Jason R. Thorpe <thorpej@netbsd.org>
 1.63 25-May-2004  hannken Add ffs internal snapshots. Written by Marshall Kirk McKusick for FreeBSD.

- Not enabled by default. Needs kernel option FFS_SNAPSHOT.
- Change parameters of ffs_blkfree.
- Let the copy-on-write functions return an error so spec_strategy
may fail if the copy-on-write fails.
- Change genfs_*lock*() to use vp->v_vnlock instead of &vp->v_lock.
- Add flag B_METAONLY to VOP_BALLOC to return indirect block buffer.
- Add a function ffs_checkfreefile needed for snapshot creation.
- Add special handling of snapshot files:
Snapshots may not be opened for writing and the attributes are read-only.
Use the mtime as the time this snapshot was taken.
Deny mtime updates for snapshot files.
- Add function transferlockers to transfer any waiting processes from
one lock to another.
- Add vfsop VFS_SNAPSHOT to take a snapshot and make it accessible through
a vnode.
- Add snapshot support to ls, fsck_ffs and dump.

Welcome to 2.0F.

Approved by: Jason R. Thorpe <thorpej@netbsd.org>
 1.62 25-Jan-2004  hannken Make VOP_STRATEGY(bp) a real VOP as discussed on tech-kern.

VOP_STRATEGY(bp) is replaced by one of two new functions:

- VOP_STRATEGY(vp, bp) Call the strategy routine of vp for bp.
- DEV_STRATEGY(bp) Call the d_strategy routine of bp->b_dev for bp.

DEV_STRATEGY(bp) is used only for block-to-block device situations.
 1.61 10-Jan-2004  yamt store an I/O priority hint in struct buf for buffer queue discipline.
 1.60 07-Aug-2003  agc Move UCB-licensed code from 4-clause to 3-clause licence.

Patches provided by Joel Baker in PR 22364, verified by myself.
 1.59 29-Jun-2003  fvdl branches: 1.59.2;
Back out the lwp/ktrace changes. They contained a lot of collateral damage,
and need to be examined and discussed more.
 1.58 28-Jun-2003  darrenr Pass lwp pointers throughout the kernel, as required, so that the lwpid can
be inserted into ktrace records. The general change has been to replace
"struct proc *" with "struct lwp *" in various function prototypes, pass
the lwp through and use l_proc to get the process pointer when needed.

Bump the kernel rev up to 1.6V
 1.57 15-May-2003  kristerw The C language does not permit statements of the form
(X ? Y : Z) = 0;
even though gcc handles this by a stupid extension.

Transform these to correct C.

Approved by fvdl.
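The entry does not show the actual rewrites, so the example below is a minimal, hypothetical illustration of the two usual portable forms. A conditional expression is not an lvalue in ISO C. You can either select an address and store through it, or spell the choice out with if/else. The function names are illustrative and do not come from the changed code.

```c
#include <assert.h>

/*
 * Rejected by ISO C (accepted only as an old GCC extension):
 *	(cond ? x : y) = 0;
 */

/* Portable rewrite 1: choose an address, then store through it. */
static void
zero_selected_ptr(int cond, int *x, int *y)
{
	*(cond ? x : y) = 0;
}

/* Portable rewrite 2: make the choice explicit with if/else. */
static void
zero_selected_if(int cond, int *x, int *y)
{
	if (cond)
		*x = 0;
	else
		*y = 0;
}
```

Both forms store to exactly one of the two objects and leave the other untouched.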
 1.56 10-Apr-2003  fvdl Remove some leftover diagnostic checks.
 1.55 02-Apr-2003  fvdl Add support for UFS2. UFS2 is an enhanced FFS, adding support for
64 bit block pointers, extended attribute storage, and a few
other things.

This commit does not yet include the code to manipulate the extended
storage (e.g. for ACLs); this will be done later.

Originally written by Kirk McKusick and Network Associates Laboratories for
FreeBSD.
 1.54 25-Jan-2003  fvdl The oldblks and newblks arrays are used to store direct copies of
on-disk block pointers, so they should be int32_t. Error found
by Izumi Tsutsui.
 1.53 24-Jan-2003  fvdl Bump daddr_t to 64 bits. Replace it with int32_t in all places where
it was used on-disk, so that on-disk formats remain the same.
Remove ufs_daddr_t and ufs_lbn_t for the time being.
 1.52 26-Sep-2002  simonb Move a brace that is in the wrong position when changes from FreeBSD
were added in rev 1.51. This may fix the "N lost blocks" problem some
people have noticed.
Reviewed by fvdl.
 1.51 18-Dec-2001  fvdl branches: 1.51.10;
Bring over fixes from FreeBSD that weren't incorporated yet, mainly
from Kirk McKusick. They implement taking pending block/inode frees
into account for the sake of correct statfs() numbers, and adding
a new softdep type (newdirblk) to correctly handle newly allocated
directory blocks.

Minor additional changes: 1) swap the newly introduced fs_pendinginodes
and fs_pendingblock fields in ffs_sb_swap, and 2) declare lkt_held
in the debug version of the softdep lock structure volatile, as it
can be modified from interrupt context #ifdef DEBUG.
 1.50 18-Dec-2001  chs when truncating a file, make sure the last block of the file is actually
allocated, since other parts of the code assume this.
 1.49 30-Nov-2001  chs VOP_PUTPAGES() requires page-aligned offsets, so be sure to provide such.
fixes PR 14759.

(while I'm here, call VOP_PUTPAGES() directly instead of indirecting through
the UVM pager op vector.)
 1.48 08-Nov-2001  chs in both paths that can cause fragments to be expanded (write and truncate-up),
deal with the fragment expansion separately before the rest of the operation.
this allows us to simplify ufs_balloc_range() by not worrying about implicit
fragment expansion.

call VOP_PUTPAGES() directly for vnodes instead of
going through the UVM pager "put" vector.
 1.47 06-Nov-2001  simonb Remove some bogus checks for unsigned variables < 0.
 1.46 30-Oct-2001  lukem add __KERNEL_RCSID()
 1.45 28-Sep-2001  chs branches: 1.45.2;
handle allocation errors in truncate-up case.
 1.44 20-Sep-2001  chs we can't assert that the inode and vnode sizes are consistent at the start
of ffs_truncate() since there are cases (eg. when ffs_write() gets ENOSPC)
where they should be different. move the assert to the end instead.
 1.43 15-Sep-2001  chs a whole bunch of changes to improve performance and robustness under load:

- remove special treatment of pager_map mappings in pmaps. this is
required now, since I've removed the globals that expose the address range.
pager_map now uses pmap_kenter_pa() instead of pmap_enter(), so there's
no longer any need to special-case it.
- eliminate struct uvm_vnode by moving its fields into struct vnode.
- rewrite the pageout path. the pager is now responsible for handling the
high-level requests instead of only getting control after a bunch of work
has already been done on its behalf. this will allow us to UBCify LFS,
which needs tighter control over its pages than other filesystems do.
writing a page to disk no longer requires making it read-only, which
allows us to write wired pages without causing all kinds of havoc.
- use a new PG_PAGEOUT flag to indicate that a page should be freed
on behalf of the pagedaemon when it's unlocked. this flag is very similar
to PG_RELEASED, but unlike PG_RELEASED, PG_PAGEOUT can be cleared if the
pageout fails due to eg. an indirect-block buffer being locked.
this allows us to remove the "version" field from struct vm_page,
and together with shrinking "loan_count" from 32 bits to 16,
struct vm_page is now 4 bytes smaller.
- no longer use PG_RELEASED for swap-backed pages. if the page is busy
because it's being paged out, we can't release the swap slot to be
reallocated until that write is complete, but unlike with vnodes we
don't keep a count of in-progress writes so there's no good way to
know when the write is done. instead, when we need to free a busy
swap-backed page, just sleep until we can get it busy ourselves.
- implement a fast-path for extending writes which allows us to avoid
zeroing new pages. this substantially reduces cpu usage.
- encapsulate the data used by the genfs code in a struct genfs_node,
which must be the first element of the filesystem-specific vnode data
for filesystems which use genfs_{get,put}pages().
- eliminate many of the UVM pagerops, since they aren't needed anymore
now that the pager "put" operation is a higher-level operation.
- enhance the genfs code to allow NFS to use the genfs_{get,put}pages
instead of a modified copy.
- clean up struct vnode by removing all the fields that used to be used by
the vfs_cluster.c code (which we don't use anymore with UBC).
- remove kmem_object and mb_object since they were useless.
instead of allocating pages to these objects, we now just allocate
pages with no object. such pages are mapped in the kernel until they
are freed, so we can use the mapping to find the page to free it.
this allows us to remove splvm() protection in several places.

The sum of all these changes improves write throughput on my
decstation 5000/200 to within 1% of the rate of NetBSD 1.5
and reduces the elapsed time for "make release" of a NetBSD 1.5
source tree on my 128MB pc to 10% less than a 1.5 kernel took.
 1.42 30-Aug-2001  chs branches: 1.42.2;
min() -> MIN()
 1.41 30-May-2001  mrg branches: 1.41.4;
use _KERNEL_OPT
 1.40 27-Jan-2001  augustss branches: 1.40.2;
Fix from chuq:
don't update UVM's notion of the file size before the VOP_FSYNC() when
we're partially truncating a file with softdeps enabled. doing so could
free pages without updating the dependency info, which would result in
"panic: softdep_write_inodeblock: direct pointer #1 mismatch 0 != N".
 1.39 01-Jan-2001  matt Convert a MALLOC with a variable size to malloc(). Saves 220 bytes of text
on VAX.
 1.38 27-Nov-2000  chs Initial integration of the Unified Buffer Cache project.
 1.37 19-Sep-2000  fvdl Adapt for VOP_FSYNC parameter change.

Implement range fsync for FFS. Note: not yet implemented for the
SOFTDEP case.
 1.36 28-Jun-2000  mrg remove include of <vm/vm.h> and <uvm/uvm_extern.h>
 1.35 30-May-2000  mycroft branches: 1.35.2;
In ffs_update():
* Move the clearing of IN_MODIFIED and IN_ACCESSED later, so they are not
cleared if the bread() failed.
* Explicitly set waitfor to 0 in the softdep case, if IN_MODIFIED is not
set (mirroring the bwrite()/bdwrite() decision).
 1.34 29-May-2000  mycroft Add a new inode flag called IN_ACCESSED. This is used in place of IN_MODIFIED
to record that the atime was updated. In ffs_update(), we only do synchronous
writes if something *other* than the atime was changed.
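The rule in this entry can be sketched as a small predicate. The flag values and the helper `update_is_sync()` are hypothetical and are not the kernel's definitions. The real ffs_update() also makes the bwrite()/bdwrite() choice and handles softdep (compare rev 1.35).

```c
#include <assert.h>

/* Hypothetical flag values, not the kernel's definitions. */
#define SK_IN_MODIFIED	0x1	/* something other than atime changed */
#define SK_IN_ACCESSED	0x2	/* only the atime changed */

/*
 * A caller's request to wait is honoured only when IN_MODIFIED is set.
 * An atime-only change (IN_ACCESSED) never forces a synchronous write.
 */
static int
update_is_sync(unsigned int iflags, int waitfor)
{
	return waitfor && (iflags & SK_IN_MODIFIED) != 0;
}
```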
 1.33 28-May-2000  mycroft When unwinding a failed allocation, make sure to nuke the unwound block from
the vnode's block list. This fixes `itrunc3' panics (at least in some cases;
further testing is needed) and prevents further lossage later on.
 1.32 28-May-2000  mycroft Add a new function to remove extra buffers when truncating a file. This is
more generic than the vinvalbuf(V_SAVEMETA) case, avoiding synchronous
operations when truncating to a non-zero length.
 1.31 13-May-2000  perseant branches: 1.31.2;
Change the semantics of the last parameter from a boolean ("waitfor") to
a set of flags ("flags"). Two flags are defined, UPDATE_WAIT and
UPDATE_DIROP.

Under the old semantics, VOP_UPDATE would block if waitfor were set,
under the assumption that directory operations should be done
synchronously. At least LFS and FFS+softdep do not make this
assumption; FFS+softdep got around the problem by enclosing all relevant
calls to VOP_UPDATE in a "if(!DOINGSOFTDEP(vp))", while LFS simply
ignored waitfor, one of the reasons why NFS-serving an LFS filesystem
did not work properly.

Under the new semantics, the UPDATE_DIROP flag is a hint to the
fs-specific update routine that the call comes from a dirop routine, and
should be waited for, or not, accordingly.

Closes PR#8996.
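One possible way an fs-specific update routine could interpret the new flags is sketched below. This is an assumption, not the FFS or LFS code. The routine always honours UPDATE_WAIT, and treats UPDATE_DIROP as "wait" only when the filesystem relies on synchronous directory updates (that is, it is not running with soft dependencies). The flag names come from the entry; their values and `update_should_wait()` are hypothetical.

```c
#include <assert.h>

/* Flag names from the commit message; the values are hypothetical. */
#define UPDATE_WAIT	0x1	/* caller needs the inode on disk on return */
#define UPDATE_DIROP	0x2	/* hint: called from a directory operation */

/* Hypothetical policy: should this update block until it is on disk? */
static int
update_should_wait(int flags, int doing_softdep)
{
	if (flags & UPDATE_WAIT)
		return 1;
	/* Only plain FFS needs dirops on disk; softdep orders them itself. */
	if ((flags & UPDATE_DIROP) && !doing_softdep)
		return 1;
	return 0;
}
```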
 1.30 30-Mar-2000  augustss Remove register declarations.
 1.29 15-Nov-1999  fvdl Add Kirk McKusick's soft updates code to the trunk. Not enabled by
default, as the copyright on the main file (ffs_softdep.c) is such
that it has been put into gnusrc. options SOFTDEP will pull this
in. This code also contains the trickle syncer.

Bump version number to 1.4O
 1.28 24-Mar-1999  mrg branches: 1.28.4; 1.28.8; 1.28.10; 1.28.14;
completely remove Mach VM support. all that is left is the
header files, as UVM still uses (most of) these.
 1.27 05-Mar-1999  mycroft Pass null pointers to VOP_UPDATE rather than having all the callers fetch the
current time themselves.
 1.26 05-Mar-1999  mycroft Permit the access and modify time pointers passed to VOP_UPDATE to be null,
meaning the current time.
 1.25 12-Nov-1998  thorpej defopt FFS_EI
 1.24 23-Oct-1998  thorpej branches: 1.24.2;
Use DINODE_SIZE rather than pointer arithmetic.
 1.23 04-Oct-1998  christos Missed a conditional for FFS_EI; appears when we compile without -Ox
 1.22 09-Aug-1998  perry bzero->memset, bcopy->memcpy, bcmp->memcmp
 1.21 09-Jun-1998  scottr Protect various config(8)-generated files from inclusion while
building LKMs. Fixes PR 5557.
 1.20 08-Jun-1998  scottr Use the newly-defined opt_quota.h.
 1.19 18-Mar-1998  bouyer Add support for reading/writing FFS in non-native byte order, conditional
on "options FFS_EI". The superblock and inodes (without blk addr) are
byteswapped at disk read/write time; other metadata is byteswapped
when used (as it is accessed directly in the buffer cache).
This required the addition of a "um_flags" field to struct ufsmount.
ffs_bswap.c contains superblock and inode byteswap routines also used
by userland utilities.
 1.18 01-Mar-1998  fvdl Merge with Lite2 + local changes
 1.17 10-Feb-1998  mrg - add defopt's for UVM, UVMHIST and PMAP_NEW.
- remove unnecessary UVMHIST_DECL's.
 1.16 05-Feb-1998  mrg initial import of the new virtual memory system, UVM, into -current.

UVM was written by chuck cranor <chuck@maria.wustl.edu>, with some
minor portions derived from the old Mach code. i provided some help
getting swap and paging working, and other bug fixes/ideas. chuck
silvers <chuq@chuq.com> also provided some other fixes.

this is the rest of the MI portion changes.

this will be KNF'd shortly. :-)
 1.15 04-Jul-1997  drochner Don't cast 64bit (off_t) file sizes to vm_offset_t (32bit on many
architectures), truncate them intelligently instead.
The truncation is done centralized in vnode_pager.c.
This prevents wrap-around effects when parts of large (>2^32 byte) files
are mmapped.
Don't allow mmapping above the numerical range of vm_offset_t.
This is considered a temporary solution until the vm system handles the
object sizes/offsets more cleanly.
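The wrap-around problem can be shown with a naive cast. This is a sketch under stated assumptions. A plain conversion of a 64-bit size to a 32-bit type keeps only the low 32 bits, so 4 GiB + 4 KiB becomes 4 KiB. The saturating `clamp_trunc()` is a hypothetical alternative. The entry does not say exactly how vnode_pager.c truncates.

```c
#include <assert.h>
#include <stdint.h>

/* Naive cast: keeps only the low 32 bits, so large sizes wrap. */
static uint32_t
naive_trunc(int64_t size)
{
	return (uint32_t)size;
}

/* Hypothetical saturating variant: clamp into [0, UINT32_MAX]. */
static uint32_t
clamp_trunc(int64_t size)
{
	if (size < 0)
		return 0;
	if ((uint64_t)size > UINT32_MAX)
		return UINT32_MAX;
	return (uint32_t)size;
}
```

With the naive cast, a mapping near the end of a file just over 4 GiB would appear to start near offset 0. The clamp keeps the value at the top of the representable range instead.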
 1.14 11-Jun-1997  bouyer Add support for ext2fs, this needed a few modifications to ufs/ufs/inode.h:
- added a "union inode_ext" to struct inode, for the per-fs extensions.
For now only ext2fs uses it.
- i_din is now an union:
union {
struct dinode ffs_din; /* 128 bytes of the on-disk dinode. */
struct ext2fs_dinode e2fs_din; /* 128 bytes of the on-disk dinode. */
} i_din
Added a lot of #define i_ffs_* and i_e2fs_* to access the fields.
- Added two macros: FFS_ITIMES and EXT2FS_ITIMES. ITIMES calls the right
macro, depending on the type of the inode. ITIMES is used where necessary,
FFS_ITIMES and EXT2FS_ITIMES in other places.
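The union-plus-accessor-macro pattern described above can be sketched with trimmed, hypothetical structures. The real on-disk inodes are 128 bytes and have many more fields. The `sk_` names are illustrative, and only the `i_ffs_*` / `i_e2fs_*` macro style comes from the entry.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical, heavily trimmed on-disk inodes. */
struct sk_ffs_dinode {
	uint16_t	di_mode;
	uint64_t	di_size;
};

struct sk_ext2fs_dinode {
	uint16_t	e2di_mode;
	uint32_t	e2di_size;
};

/* In-core inode holding either on-disk format in one union. */
struct sk_inode {
	union {
		struct sk_ffs_dinode	ffs_din;
		struct sk_ext2fs_dinode	e2fs_din;
	} i_din;
};

/* Accessor macros in the i_ffs_* / i_e2fs_* style. */
#define i_ffs_mode	i_din.ffs_din.di_mode
#define i_ffs_size	i_din.ffs_din.di_size
#define i_e2fs_mode	i_din.e2fs_din.e2di_mode
#define i_e2fs_size	i_din.e2fs_din.e2di_size
```

Code that writes `ip->i_ffs_size` stays readable, and it does not need to know that the field lives inside a union member.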
 1.13 27-Jan-1997  tls Correct old inode flag names in comment, and reformat for 80 character screen
 1.12 06-Nov-1996  thorpej Performance enhancement from Kirk McKusick <mckusick@McKusick.COM>:
When freeing an indirect block, there is no need to write it (synchronously,
no less!) before tossing it.
 1.11 01-Sep-1996  mycroft Add a set of generic file system operations that most file systems use.
Also, fix some time stamp bogosities.
 1.10 11-May-1996  mycroft Change VOP_UPDATE() semantics:
* Make 2nd and 3rd args timespecs, not timevals.
* Consistently pass a Boolean as the 4th arg (except in LFS).
Also, fix ffs_update() and lfs_update() to actually change the nsec fields.
 1.9 09-Feb-1996  christos ffs prototypes
 1.8 15-Jun-1995  cgd compensate for timeval/timespec/stat structure changes.
 1.7 14-Dec-1994  mycroft Sync with CSRG.
 1.6 28-Oct-1994  mycroft Don't allow truncating past maxfilesize.
 1.5 29-Jun-1994  cgd branches: 1.5.2;
New RCS ID's, take two. they're more aesthetically pleasant, and use 'NetBSD'
 1.4 15-Jun-1994  mycroft Fastlink compat.
 1.3 13-Jun-1994  mycroft Format police.
 1.2 13-Jun-1994  pk Check requested file size; negative values cause havoc.
 1.1 08-Jun-1994  mycroft branches: 1.1.1;
Update to 4.4-Lite fs code, with local changes.
 1.1.1.2 01-Mar-1998  fvdl Import 4.4BSD-Lite2
 1.1.1.1 01-Mar-1998  fvdl Import 4.4BSD-Lite for reference
 1.5.2.2 23-Nov-1994  cgd from mycroft, for patch_05
 1.5.2.1 19-Oct-1994  cgd temporary sanity checks, as suggested by charles.
 1.24.2.3 30-May-1999  chs update call to ffs_balloc() for new args.
fix an uninitialized variable in ffs_truncate().
 1.24.2.2 25-Feb-1999  chs add UBC stuff to ffs_truncate().
 1.24.2.1 09-Nov-1998  chs initial snapshot. lots left to do.
 1.28.14.2 27-Dec-1999  wrstuden Pull up to last week's -current.
 1.28.14.1 21-Dec-1999  wrstuden Initial commit of recent changes to make DEV_BSIZE go away.

Runs on i386, needs work on other arch's. Main kernel routines should be
fine, but a number of the stand programs need help.

cd, fd, ccd, wd, and sd have been updated. sd has been tested with non-512
byte block devices. vnd, raidframe, and lfs need work.

Non 2**n block support is automatic for LKM's and conditional for kernels
on "options NON_PO2_BLOCKS".
 1.28.10.2 26-Oct-1999  fvdl Merge changes in the trickle-sync and softdep code as done by Kirk McKusick
in FreeBSD since the version that we based the branch on. Merging mostly
done by Ethan Solomita <ethan@geocast.com>.

Also, make sure the syncer thread/process isn't active when we're
unmounting a filesystem. This could wreak havoc. XXX should be done
on a per-mountpoint basis, but especially the softdep code would
end up being a big pile of vfs_busy() calls.
 1.28.10.1 19-Oct-1999  fvdl Bring in Kirk McKusick's FFS softdep code on a branch.
 1.28.8.4 11-Feb-2001  bouyer Sync with HEAD.
 1.28.8.3 05-Jan-2001  bouyer Sync with HEAD
 1.28.8.2 08-Dec-2000  bouyer Sync with HEAD.
 1.28.8.1 20-Nov-2000  bouyer Update thorpej_scsipi to -current as of a month ago
 1.28.4.3 31-Jul-1999  chs simplify ffs_truncate().
 1.28.4.2 11-Jul-1999  chs remove uvm_vnp_uncache(), it's no longer needed.
 1.28.4.1 07-Jun-1999  chs merge everything from chs-ubc branch.
 1.31.2.1 22-Jun-2000  minoura Sync w/ netbsd-1-5-base.
 1.35.2.3 26-Feb-2002  he Apply patch (requested by fvdl):
Fix a panic in the FFS softdep code on an NFS server triggered by
an exerciser program run on an NFS client.
 1.35.2.2 30-Sep-2001  he Apply patch (requested by chuck):
Make one call to uvm_vnp_uncache() conditional. Fixes a panic
when removing an mmap'ing to an unlinked, closed file.
 1.35.2.1 14-Dec-2000  he Pull up revision 1.37 (requested by fvdl):
Improve NFS performance, possibly by as much as 100% in
throughput. Please note: this implies a kernel interface change,
VOP_FSYNC gains two arguments.
 1.40.2.9 18-Oct-2002  nathanw Catch up to -current.
 1.40.2.8 12-Jul-2002  nathanw No longer need to pull in lwp.h; proc.h pulls it in for us.
 1.40.2.7 24-Jun-2002  nathanw Curproc->curlwp renaming.

Change uses of "curproc->l_proc" back to "curproc", which is more like the
original use. Bare uses of "curproc" are now "curlwp".

"curproc" is now #defined in proc.h as ((curlwp) ? (curlwp)->l_proc : NULL)
so that it is always safe to reference curproc (*de*referencing curproc
is another story, but that's always been true).
 1.40.2.6 08-Jan-2002  nathanw Catch up to -current.
 1.40.2.5 14-Nov-2001  nathanw Catch up to -current.
 1.40.2.4 08-Oct-2001  nathanw Catch up to -current.
 1.40.2.3 21-Sep-2001  nathanw Catch up to -current.
 1.40.2.2 21-Jun-2001  nathanw Catch up to -current.
 1.40.2.1 05-Mar-2001  nathanw Initial commit of scheduler activations and lightweight process support.
 1.41.4.3 10-Oct-2002  jdolecek sync kqueue with -current; this includes merge of gehenna-devsw branch,
merge of i386 MP branch, and part of autoconf rototil work
 1.41.4.2 10-Jan-2002  thorpej Sync kqueue branch with -current.
 1.41.4.1 13-Sep-2001  thorpej Update the kqueue branch to HEAD.
 1.42.2.1 01-Oct-2001  fvdl Catch up with -current.
 1.45.2.1 12-Nov-2001  thorpej Sync the thorpej-mips-cache branch with -current.
 1.51.10.1 24-Jun-2003  grant Pull up revision 1.52 (requested by nakayama in ticket #1333):

Move a brace that is in the wrong position when changes from FreeBSD
were added in rev 1.51. This may fix the "N lost blocks" problem some
people have noticed.
Reviewed by fvdl.
 1.59.2.7 11-Dec-2005  christos Sync with head.
 1.59.2.6 10-Nov-2005  skrll Sync with HEAD. Here we go again...
 1.59.2.5 21-Sep-2004  skrll Fix the sync with head I botched.
 1.59.2.4 18-Sep-2004  skrll Sync with HEAD.
 1.59.2.3 25-Aug-2004  skrll Sync with HEAD.
 1.59.2.2 03-Aug-2004  skrll Sync with HEAD
 1.59.2.1 02-Jul-2003  darrenr Apply the aborted ktrace-lwp changes to a specific branch. This is just for
others to review, I'm concerned that patch fuzziness may have resulted in some
errant code being generated but I'll look at that later by comparing the diff
from the base to the branch with the file I attempt to apply to it. This will,
at the very least, put the changes in a better context for others to review
them and attempt to tinker with removing passing of 'struct lwp' through
the kernel.
 1.71.12.6 21-Jan-2008  yamt sync with head
 1.71.12.5 07-Dec-2007  yamt sync with head
 1.71.12.4 27-Oct-2007  yamt sync with head.
 1.71.12.3 03-Sep-2007  yamt sync with head.
 1.71.12.2 30-Dec-2006  yamt sync with head.
 1.71.12.1 21-Jun-2006  yamt sync with head.
 1.76.2.2 29-Oct-2005  yamt use ffs_* directly rather than via ufs_ops.
suggested by Chuck Silvers.
 1.76.2.1 20-Oct-2005  yamt adapt ufs.
 1.79.12.2 24-May-2006  tron Merge 2006-05-24 NetBSD-current into the "peter-altq" branch.
 1.79.12.1 28-Mar-2006  tron Merge 2006-03-28 NetBSD-current into the "peter-altq" branch.
 1.79.10.3 06-May-2006  christos - Move kauth_cred_t declaration to <sys/types.h>
- Cleanup struct ucred; forward declarations that are unused.
- Don't include <sys/kauth.h> in any header, but include it in the c files
that need it.

Approved by core.
 1.79.10.2 19-Apr-2006  elad sync with head.
 1.79.10.1 08-Mar-2006  elad Adapt to kernel authorization KPI.
 1.79.8.3 26-Jun-2006  yamt sync with head.
 1.79.8.2 24-May-2006  yamt sync with head.
 1.79.8.1 01-Apr-2006  yamt sync with head.
 1.79.6.4 01-Jun-2006  kardel Sync with head.
 1.79.6.3 22-Apr-2006  simonb Sync with head.
 1.79.6.2 05-Feb-2006  simonb In the *itimes functions, just call getnanotime() at the start of
the function and use the result if needed, rather than the previous
conditional calls/assignments method. The code is clearer this way,
and benchmarks at about the same speed.
 1.79.6.1 04-Feb-2006  simonb Adapt for timecounters: mostly use get*time() and use "time_second"
instead of "time.tv_sec".
 1.79.4.1 09-Sep-2006  rpaulo sync with head
 1.81.2.1 19-Jun-2006  chap Sync with head.
 1.82.2.1 13-Jul-2006  gdamore Merge from HEAD.
 1.83.6.1 22-Oct-2006  yamt sync with head
 1.83.4.1 18-Nov-2006  ad Sync with head.
 1.85.4.1 12-Mar-2007  rmind Sync with HEAD.
 1.86.6.1 09-Dec-2007  reinoud Pullup to HEAD
 1.86.4.1 11-Jul-2007  mjf Sync with head.
 1.86.2.6 24-Aug-2007  ad Sync with buffer cache locking changes. See buf.h/vfs_bio.c for details.
Some minor portions are incomplete and need to be verified as a whole.
 1.86.2.5 15-Jul-2007  ad Sync with head.
 1.86.2.4 09-Jun-2007  ad Sync with head.
 1.86.2.3 13-May-2007  ad - Pass the error number and residual count to biodone(), and let it handle
setting error indicators. Prepare to eliminate B_ERROR.
- Add a flag argument to brelse() to be set into the buf's flags, instead
of doing it directly. Typically used to set B_INVAL.
- Add a "struct cpu_info *" argument to kthread_create(), to be used to
create bound threads. Change "bool mpsafe" to "int flags".
- Allow exit of LWPs in the IDL state when (l != curlwp).
- More locking fixes & conversion to the new API.
 1.86.2.2 13-Apr-2007  ad Put a per-mount lock around ffs shared data structures, excluding softdep
and quotas. Strategy lifted from FreeBSD.
 1.86.2.1 13-Mar-2007  ad Pull in the initial set of changes for the vmlocking branch.
 1.88.10.1 14-Oct-2007  yamt sync with head.
 1.88.8.3 23-Mar-2008  matt sync with HEAD
 1.88.8.2 09-Jan-2008  matt sync with HEAD
 1.88.8.1 06-Nov-2007  matt sync with HEAD
 1.88.6.3 09-Dec-2007  jmcneill Sync with HEAD.
 1.88.6.2 27-Nov-2007  joerg Sync with HEAD. amd64 Xen support needs testing.
 1.88.6.1 26-Oct-2007  joerg Sync with HEAD.

Follow the merge of pmap.c on i386 and amd64 and move
pmap_init_tmp_pgtbl into arch/x86/x86/pmap.c. Modify the ACPI wakeup
code to restore CR4 before jumping back into kernel space as the large
page option might cover that.
 1.89.4.3 18-Feb-2008  mjf Sync with HEAD.
 1.89.4.2 27-Dec-2007  mjf Sync with HEAD.
 1.89.4.1 08-Dec-2007  mjf Sync with HEAD.
 1.90.2.4 30-Dec-2007  ad ffs_update: if softdep and the inode has been unlinked, wait for the update
(and so dependencies) to flush. Ensures that the slate is clean when the
inode is reused. Should work around "panic: handle_written_inodeblock:
filefree".
 1.90.2.3 26-Dec-2007  ad Sync with head.
 1.90.2.2 08-Dec-2007  ad Sync with head.
 1.90.2.1 04-Dec-2007  ad Pull the vmlocking changes into a new branch.
 1.92.4.2 10-Jan-2008  bouyer Sync with HEAD
 1.92.4.1 02-Jan-2008  bouyer Sync with HEAD
 1.94.6.5 17-Jan-2009  mjf Sync with HEAD.
 1.94.6.4 28-Sep-2008  mjf Sync with HEAD.
 1.94.6.3 05-Jun-2008  mjf Sync with HEAD.

Also fix build.
 1.94.6.2 02-Jun-2008  mjf Sync with HEAD.
 1.94.6.1 03-Apr-2008  mjf Sync with HEAD.
 1.95.6.2 18-Sep-2008  wrstuden Sync with wrstuden-revivesa-base-2.
 1.95.6.1 23-Jun-2008  wrstuden Sync w/ -current. 34 merge conflicts to follow.
 1.95.4.2 11-Mar-2010  yamt sync with head
 1.95.4.1 04-May-2009  yamt sync with head.
 1.95.2.2 04-Jun-2008  yamt sync with head
 1.95.2.1 18-May-2008  yamt sync with head.
 1.97.4.1 19-Oct-2008  haad Sync with HEAD.
 1.97.2.2 12-Jun-2008  martin License police
 1.97.2.1 10-Jun-2008  simonb Initial commit of Wasabi Systems' WAPBL (Write Ahead Physical Block
Logging) journaling code. Originally written by Darrin B. Jewell
while at Wasabi and updated to -current by Antti Kantee, Andy Doran,
Greg Oster and Simon Burge.

Still a number of issues - look in doc/BRANCHES for "simonb-wapbl"
for more info.
 1.99.10.1 21-Apr-2010  matt sync to netbsd-5
 1.99.4.2 25-Jan-2012  riz Pull up following revision(s) (requested by bouyer in ticket #1702):
sys/ufs/lfs/lfs_inode.c: revision 1.126
sys/ufs/ffs/ffs_inode.c: revision 1.108
If ufs_balloc_range() fails, make sure to call ?fs_truncate() to
reset v_writesize to the right value.
If v_writesize is left larger than the allocated blocks, we may have
the same issue as the one described in
http://mail-index.netbsd.org/tech-kern/2010/02/02/msg007156.html
 1.99.4.1 22-Feb-2010  snj Pull up following revision(s) (requested by bouyer in ticket #1302):
sys/ufs/ext2fs/ext2fs_inode.c: revision 1.71
sys/ufs/ffs/ffs_inode.c: revision 1.104
sys/ufs/lfs/lfs_inode.c: revision 1.121
sys/ufs/ufs/ufs_inode.c: revision 1.79
- ufs_balloc_range(): on error, only PG_RELEASED the pages that were
allocated to extend the file to the new size. Releasing all pages
may release pages that contain previously-written data not yet flushed
to disk. Should fix PR kern/35704
- {ffs,lfs,ext2fs}_truncate(): Even if the inode's size is the same as
the new length, call uvm_vnp_setsize(). *_truncate() may have been
called by *_write() in the error path (e.g. block allocation failure
because of quota or file system full), and at this point v_writesize
has been set to the desired size of the file and not reverted to the
old size. Not adjusting v_writesize to the real size causes
genfs_do_io() to write to disk past the real end of the file.
 1.99.2.2 03-Mar-2009  skrll Sync with HEAD.
 1.99.2.1 19-Jan-2009  skrll Sync with HEAD.
 1.102.2.1 13-May-2009  jym Sync with HEAD.

Commit is split, to avoid a "too many arguments" protocol error.
 1.104.8.1 20-Jan-2011  bouyer Snapshot of work in progress on a modernised disk quota system:
- new quotactl syscall (versioned for backward compat), which takes
as parameter a path to a mount point, and a prop_dictionary
(in plistref format) describing commands and arguments.
For each command, status and data are returned as a prop_dictionary.
quota commands features will be added to take advantage of this,
exporting quota data or getting quota commands as plists.

- new on disk-format storage (all 64bit wide), integrated to metadata for
ffs (and playing nicely with wapbl).
Quotas are enabled on a ffs filesystem via superblock flags.
tunefs(8) can enable or disable quotas.
On a quota-enabled filesystem, fsck_ffs(8) will track per-uid/gid
block and inode usages, and will check and update quotas in Pass 6.
quota usage and limits are stored in unlinked files (one for users,
one for groups); fsck_ffs(8) will create the files if needed, or
free them if needed. This means that after enabling or disabling
quotas on a filesystem, a fsck_ffs(8) run is required.
quotacheck(8) is not needed any more; after an unclean shutdown,
fsck or journal replay will take care of fixing quotas.
newfs(8) can create a ready-to-mount quota-enabled filesystem
(superblock flags are set and quota inodes are created).
Other new features or semantic changes:
- default quota data, applied to users or groups which don't already
have a quota entry
- per-user/group grace time (instead of a filesystem global one)
- 0 really means "nothing allowed at all", not "no limit".
If you want "no limit", set the limit to UQUAD_MAX (tools will
understand "unlimited" and "-")

A quota file is structured as follows:
it starts with a header, containing a few per-filesystem values,
and the default quota limits.
Quota entries are linked together as a simple list, each entry has a
pointer (as an offset within the file) to the next.
The header has a pointer to a list of free quota entries, and
a hash table of in-use entries. The size of the hash table depends
on the filesystem block size (header+hash table should fit in the
first block). The file is not sparse and is a multiple of
filesystem block size (when the free quota entry list is empty a new
filesystem block is allocated). Quota entries do not cross
filesystem block boundaries.

In memory, the kernel keeps a cache of recently used quota entries
as a reference to the block number, and offset within the block.
The quota entry itself is kept in the buf cache.

fsck_ffs(8), tunefs(8) and newfs(8) support is complete (with
related atf tests :)
The kernel can update disk usage and report it via quotactl(2).

Todo: enforce quotas limits (limits are not checked by kernel yet)
update repquota, edquota and rpc.rquotad to the new world
implement compat_50_quotactl ioctl.
update quotactl(2) man page

fsck_ffs required fixes so that allocating new blocks or inodes will
properly update the superblock and cg summaries. This was not an issue up
to now because superblock and cg summaries check happened last, but now
allocations or frees can happen in pass 6.
 1.104.6.1 06-Jun-2011  jruoho Sync with HEAD.
 1.104.4.2 21-Apr-2011  rmind sync with head
 1.104.4.1 16-Mar-2010  rmind Change struct uvm_object::vmobjlock to be dynamically allocated with
mutex_obj_alloc(). It allows us to share the locks among UVM objects.
 1.105.2.1 23-Jun-2011  cherry Catchup with rmind-uvmplock merge.
 1.107.2.4 22-May-2014  yamt sync with head.

for a reference, the tree before this commit was tagged
as yamt-pagecache-tag8.

this commit was split into small chunks to avoid
a limitation of cvs. ("Protocol error: too many arguments")
 1.107.2.3 23-Jan-2013  yamt sync with head
 1.107.2.2 30-Oct-2012  yamt sync with head
 1.107.2.1 17-Apr-2012  yamt sync with head
 1.108.2.1 18-Feb-2012  mrg merge to -current.
 1.110.2.4 03-Dec-2017  jdolecek update from HEAD
 1.110.2.3 20-Aug-2014  tls Rebase to HEAD as of a few days ago.
 1.110.2.2 23-Jun-2013  tls resync from head
 1.110.2.1 25-Feb-2013  tls resync with head
 1.115.2.1 18-May-2014  rmind sync with head
 1.116.6.3 28-Aug-2017  skrll Sync with HEAD
 1.116.6.2 05-Dec-2016  skrll Sync with HEAD
 1.116.6.1 06-Apr-2015  skrll Sync with HEAD
 1.117.2.3 20-Mar-2017  pgoyette Sync with HEAD
 1.117.2.2 07-Jan-2017  pgoyette Sync with HEAD. (Note that most of these changes are simply $NetBSD$
tag issues.)
 1.117.2.1 04-Nov-2016  pgoyette Sync with HEAD
 1.123.2.1 21-Apr-2017  bouyer Sync with HEAD
 1.124.14.3 21-Apr-2020  martin Sync with HEAD
 1.124.14.2 08-Apr-2020  martin Merge changes from current as of 20200406
 1.124.14.1 10-Jun-2019  christos Sync with HEAD
 1.124.12.1 26-Dec-2018  pgoyette Sync with HEAD, resolve a few conflicts
 1.125.6.1 29-Feb-2020  ad Sync with head.
 1.126.4.2 25-Apr-2020  bouyer Sync with bouyer-xenpvh-base2 (HEAD)
 1.126.4.1 20-Apr-2020  bouyer Sync with HEAD
 1.1 30-Mar-2007  mjf branches: 1.1.2;
file ffs_journal.c was initially added on branch mjf-ufs-trans.
 1.1.2.1 30-Mar-2007  mjf Provide a test journal. It's just a wrapper to bwrite and doesn't
actually do any journaling, but we need something to give the
transactions to.
 1.7 17-Jan-2020  ad VFS_VGET(), VFS_ROOT(), VFS_FHTOVP(): give them a "int lktype" argument, to
allow us to get shared locks (or no lock) on the returned vnode. Matches
FreeBSD.
 1.6 07-Jul-2016  msaitoh branches: 1.6.18; 1.6.24;
KNF. Remove extra spaces. No functional change.
 1.5 22-Feb-2015  maxv KNF, and simplify a bit.

No functional change
 1.4 12-Jun-2011  rmind branches: 1.4.12; 1.4.30;
Welcome to 5.99.53! Merge rmind-uvmplock branch:

- Reorganize locking in UVM and provide extra serialisation for pmap(9).
New lock order: [vmpage-owner-lock] -> pmap-lock.

- Simplify locking in some pmap(9) modules by removing P->V locking.

- Use lock object on vmobjlock (and thus vnode_t::v_interlock) to share
the locks amongst UVM objects where necessary (tmpfs, layerfs, unionfs).

- Rewrite and optimise x86 TLB shootdown code, make it simpler and cleaner.
Add TLBSTATS option for x86 to collect statistics about TLB shootdowns.

- Unify /dev/mem et al in MI code and provide required locking (removes
kernel-lock on some ports). Also, avoid cache-aliasing issues.

Thanks to Andrew Doran and Joerg Sonnenberger, as their initial patches
formed the core changes of this branch.
 1.3 07-Jun-2011  bouyer Fix bad cut'n'paste in copyright. Pointed out by dyoung@
 1.2 06-Mar-2011  bouyer branches: 1.2.2; 1.2.4; 1.2.6;
merge the bouyer-quota2 branch. This adds a new on-disk format
to store disk quota usage and limits, integrated with ffs
metadata. Usage is checked by fsck_ffs (no more quotacheck)
and is covered by the WAPBL journal. Enabled with kernel
option QUOTA2 (added where QUOTA was enabled in kernel config files),
turned on with tunefs(8) on a per-filesystem
basis. mount_mfs(8) can also turn quotas on.

See http://mail-index.netbsd.org/tech-kern/2011/02/19/msg010025.html
for details.
 1.1 20-Jan-2011  bouyer branches: 1.1.2;
file ffs_quota2.c was initially added on branch bouyer-quota2.
 1.1.2.2 09-Feb-2011  bouyer Support MNT_UPDATE for quota2 (especially r/o -> r/w transitions)
 1.1.2.1 20-Jan-2011  bouyer Snapshot of work in progress on a modernised disk quota system:
- new quotactl syscall (versioned for backward compat), which takes
as parameter a path to a mount point, and a prop_dictionary
(in plistref format) describing commands and arguments.
For each command, status and data are returned as a prop_dictionary.
quota command features will be added to take advantage of this,
exporting quota data or getting quota commands as plists.

- new on-disk format storage (all 64 bits wide), integrated into metadata for
ffs (and playing nicely with wapbl).
Quotas are enabled on a ffs filesystem via superblock flags.
tunefs(8) can enable or disable quotas.
On a quota-enabled filesystem, fsck_ffs(8) will track per-uid/gid
block and inode usages, and will check and update quotas in Pass 6.
quota usage and limits are stored in unlinked files (one for users,
one for groups); fsck_ffs(8) will create the files if needed, or
free them if needed. This means that after enabling or disabling
quotas on a filesystem, a fsck_ffs(8) run is required.
quotacheck(8) is not needed any more; after an unclean shutdown,
fsck or journal replay will take care of fixing quotas.
newfs(8) can create a ready-to-mount quota-enabled filesystem
(superblock flags are set and quota inodes are created).
Other new features or semantic changes:
- default quota data, applied to users or groups which don't already
have a quota entry
- per-user/group grace time (instead of a filesystem global one)
- 0 really means "nothing allowed at all", not "no limit".
If you want "no limit", set the limit to UQUAD_MAX (tools will
understand "unlimited" and "-")
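A minimal sketch of the limit semantics above, assuming a single hard limit per entry (soft limits and grace times are left out). quota_allows() and QUOTA_UNLIMITED are hypothetical names, with UINT64_MAX standing in for UQUAD_MAX. Per the Todo list further down, the kernel did not yet enforce limits at this point, so this only illustrates the intended meaning of 0 versus "unlimited".

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical: UINT64_MAX plays the role of UQUAD_MAX ("unlimited"). */
#define QUOTA_UNLIMITED UINT64_MAX

/*
 * Would allocating `request` more units keep `usage` within `limit`?
 * A limit of 0 allows nothing at all; QUOTA_UNLIMITED disables the check.
 */
static bool quota_allows(uint64_t usage, uint64_t request, uint64_t limit)
{
	if (limit == QUOTA_UNLIMITED)
		return true;
	/* Compare without computing usage + request, which could overflow. */
	return usage <= limit && request <= limit - usage;
}
```

Treating 0 as a real limit and reserving the maximum value for "no limit" removes the old ambiguity where 0 meant "unlimited".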

A quota file is structured as follows:
it starts with a header, containing a few per-filesystem values,
and the default quota limits.
Quota entries are linked together as a simple list, each entry has a
pointer (as an offset within the file) to the next.
The header has a pointer to a list of free quota entries, and
a hash table of in-use entries. The size of the hash table depends
on the filesystem block size (header+hash table should fit in the
first block). The file is not sparse and is a multiple of
filesystem block size (when the free quota entry list is empty a new
filesystem block is allocated). Quota entries do not cross
filesystem block boundaries.
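The layout described above can be modelled in memory roughly as follows. This is a sketch under stated assumptions, not the real quota2 format: the sizes QF_BSIZE, QF_ENTRY_SIZE and QF_HDR_FIXED, the entry layout (next offset, then id) and all function names are hypothetical. Header fields are kept in a struct for brevity while block 0 is reserved for them, which is also why offset 0 can serve as "none".

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical sizes for illustration; not the real quota2 values. */
#define QF_BSIZE      512u  /* assumed filesystem block size */
#define QF_ENTRY_SIZE 64u   /* assumed size of one quota entry */
#define QF_HDR_FIXED  64u   /* assumed fixed part of the header */
/* Header plus hash table must fit in block 0, so the table size
   follows from the block size. */
#define QF_HASH_SLOTS ((QF_BSIZE - QF_HDR_FIXED) / sizeof(uint64_t))

/* Offsets are byte offsets into `data`; 0 means "none". */
struct qfile {
	uint8_t  *data;                 /* file contents, multiple of QF_BSIZE */
	uint64_t  size;                 /* current file size in bytes */
	uint64_t  free_head;            /* first free entry */
	uint64_t  hash[QF_HASH_SLOTS];  /* heads of in-use chains */
};

static uint64_t get64(const struct qfile *qf, uint64_t off)
{
	uint64_t v;
	memcpy(&v, qf->data + off, sizeof v);
	return v;
}

static void put64(struct qfile *qf, uint64_t off, uint64_t v)
{
	memcpy(qf->data + off, &v, sizeof v);
}

static int qf_init(struct qfile *qf)
{
	memset(qf, 0, sizeof *qf);
	qf->data = calloc(1, QF_BSIZE);  /* block 0: header + hash table */
	if (qf->data == NULL)
		return -1;
	qf->size = QF_BSIZE;
	return 0;
}

/* Free list empty: append one block and carve it into whole entries. */
static int qf_grow(struct qfile *qf)
{
	uint8_t *p = realloc(qf->data, qf->size + QF_BSIZE);
	if (p == NULL)
		return -1;
	memset(p + qf->size, 0, QF_BSIZE);
	qf->data = p;
	uint64_t blk = qf->size;
	qf->size += QF_BSIZE;
	/* Push in reverse so the free list ends up in ascending order. */
	for (uint64_t i = QF_BSIZE / QF_ENTRY_SIZE; i-- > 0;) {
		uint64_t off = blk + i * QF_ENTRY_SIZE;
		/* No entry crosses a block boundary. */
		assert(off % QF_BSIZE + QF_ENTRY_SIZE <= QF_BSIZE);
		put64(qf, off, qf->free_head);
		qf->free_head = off;
	}
	return 0;
}

/* Take an entry off the free list and link it into id's hash chain. */
static uint64_t qf_alloc(struct qfile *qf, uint64_t id)
{
	if (qf->free_head == 0 && qf_grow(qf) != 0)
		return 0;
	uint64_t off = qf->free_head;
	qf->free_head = get64(qf, off);
	uint64_t slot = id % QF_HASH_SLOTS;
	put64(qf, off, qf->hash[slot]);  /* next pointer */
	put64(qf, off + 8, id);          /* uid or gid */
	qf->hash[slot] = off;
	return off;
}

/* Walk the chain for id's hash bucket; 0 if not found. */
static uint64_t qf_lookup(const struct qfile *qf, uint64_t id)
{
	for (uint64_t off = qf->hash[id % QF_HASH_SLOTS]; off != 0;
	    off = get64(qf, off))
		if (get64(qf, off + 8) == id)
			return off;
	return 0;
}
```

Allocating 20 ids with these sizes grows the file to four blocks (one header block plus three blocks of eight entries each), so the file stays a multiple of the block size as described.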

In memory, the kernel keeps a cache of recently used quota entries
as a reference to the block number, and offset within the block.
The quota entry itself is kept in the buf cache.

fsck_ffs(8), tunefs(8) and newfs(8) support is complete (with
related atf tests :)
The kernel can update disk usage and report it via quotactl(2).

Todo: enforce quota limits (limits are not checked by kernel yet)
update repquota, edquota and rpc.rquotad to the new world
implement compat_50_quotactl ioctl.
update quotactl(2) man page

fsck_ffs required fixes so that allocating new blocks or inodes will
properly update the superblock and cg summaries. This was not an issue up
to now because superblock and cg summary checks happened last, but now
allocations or frees can happen in pass 6.
 1.2.6.2 06-Jun-2011  jruoho Sync with HEAD.
 1.2.6.1 06-Mar-2011  jruoho file ffs_quota2.c was added on branch jruoho-x86intr on 2011-06-06 09:10:16 +0000
 1.2.4.1 23-Jun-2011  cherry Catchup with rmind-uvmplock merge.
 1.2.2.4 12-Jun-2011  rmind sync with head
 1.2.2.3 23-Apr-2011  rmind Few fixes, missed in last sync with head.
 1.2.2.2 21-Apr-2011  rmind sync with head
 1.2.2.1 06-Mar-2011  rmind file ffs_quota2.c was added on branch rmind-uvmplock on 2011-04-21 01:42:20 +0000
 1.4.30.2 09-Jul-2016  skrll Sync with HEAD
 1.4.30.1 06-Apr-2015  skrll Sync with HEAD
 1.4.12.1 03-Dec-2017  jdolecek update from HEAD
 1.6.24.1 17-Jan-2020  ad Sync with head.
 1.6.18.1 08-Apr-2020  martin Merge changes from current as of 20200406
 1.155 11-May-2023  chs ffs: apply the remaining ffs_snapshot.c part of this FreeBSD commit:

commit 364ed814e7285c8216d8a201d3ab3674eb34ce29
Author: Kirk McKusick <mckusick@FreeBSD.org>
Date: Thu Dec 9 21:24:00 2004 +0000

Fixes a bug that caused UFS2 filesystems bigger than 2TB to
prematurely report that they were full and/or to panic the kernel
with the message ``ffs_clusteralloc: allocated out of group''.

Submitted by: Henry Whincup <henry@jot.to>
MFC after: 1 week

all the other changes in that commit were applied previously by others:
- sborrill committed ffs_alloc.c rev 1.123 in 2009
- simonb committed ffs_alloc.c rev 1.110 in 2008
- the ffs_clusteralloc() part is not needed because we no longer have
that function.

fixes PR 57307
 1.154 16-Apr-2022  hannken branches: 1.154.4;
Take the link count from the inode.
 1.153 05-Dec-2021  msaitoh s/shapshot/snapshot/
 1.152 18-Apr-2020  christos Extended attribute support for ffsv2, from FreeBSD.
 1.151 23-Feb-2020  ad branches: 1.151.4;
UVM locking changes, proposed on tech-kern:

- Change the lock on uvm_object, vm_amap and vm_anon to be a RW lock.
- Break v_interlock and vmobjlock apart. v_interlock remains a mutex.
- Do partial PV list locking in the x86 pmap. Others to follow later.
 1.150 17-Jan-2020  ad VFS_VGET(), VFS_ROOT(), VFS_FHTOVP(): give them an "int lktype" argument, to
allow us to get shared locks (or no lock) on the returned vnode. Matches
FreeBSD.
 1.149 01-Jun-2017  chs branches: 1.149.10; 1.149.14; 1.149.16;
remove checks for failure after memory allocation calls that cannot fail:

kmem_alloc() with KM_SLEEP
kmem_zalloc() with KM_SLEEP
percpu_alloc()
pserialize_create()
psref_class_create()

all of these paths include an assertion that the allocation has not failed,
so callers should not assert that again.
 1.148 01-Apr-2017  riastradh KASSERT(mutex_owned(vp->v_interlock)) in vnode iterator selector.
 1.147 18-Mar-2017  riastradh #if DIAGNOSTIC panic ---> KASSERT
 1.146 01-Mar-2017  hannken Remove now redundant calls to fstrans_start()/fstrans_done().
 1.145 17-Feb-2017  hannken Bring back vrele_flush() to flush deferred vrele() on a suspended file system.
 1.144 17-Feb-2017  hannken Untangle VFS_SYNC() from VFS_SUSPENDCTL().
 1.143 28-Oct-2016  jdolecek branches: 1.143.2;
reorganize ffs_truncate()/ffs_indirtrunc() to be able to partially
succeed; change wapbl_register_deallocation() to return EAGAIN
rather than panic when code hits the limit

callers changed to either loop calling ffs_truncate() using new
utility ufs_truncate_retry() if their semantics requires it, or
just ignore the failure; remove ufs_wapbl_truncate()

this fixes possible user-triggerable panic during truncate, and
resolves WAPBL performance issue with truncates of large files

PR kern/47146 and kern/49175
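The retry pattern can be sketched as a toy model. This is not the kernel code: fake_truncate(), truncate_retry() and DEALLOC_LIMIT are hypothetical, with DEALLOC_LIMIT standing in for the per-transaction limit at which wapbl_register_deallocation() now returns EAGAIN. Because each call keeps its partial progress, the loop terminates once every block has been freed.

```c
#include <assert.h>
#include <errno.h>
#include <stdint.h>

/* Hypothetical cap on deallocations registered in one transaction. */
#define DEALLOC_LIMIT 4u

struct fakefile {
	uint64_t nblocks;       /* blocks currently allocated */
	unsigned transactions;  /* truncate calls, one transaction each */
};

/* Frees at most DEALLOC_LIMIT blocks per call. Progress is kept, and
   EAGAIN tells the caller that more blocks remain (partial success). */
static int fake_truncate(struct fakefile *f, uint64_t newblocks)
{
	f->transactions++;
	if (newblocks >= f->nblocks)
		return 0;               /* nothing to free */
	if (f->nblocks - newblocks > DEALLOC_LIMIT) {
		f->nblocks -= DEALLOC_LIMIT;
		return EAGAIN;
	}
	f->nblocks = newblocks;
	return 0;
}

/* For callers whose semantics require a complete truncate. */
static int truncate_retry(struct fakefile *f, uint64_t newblocks)
{
	int error;

	do {
		error = fake_truncate(f, newblocks);
	} while (error == EAGAIN);
	return error;
}
```

Truncating a 10-block file to zero takes three calls here (10 to 6, 6 to 2, 2 to 0), each small enough to fit in one transaction; callers that can tolerate a partial truncate would call fake_truncate() once and ignore EAGAIN.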
 1.142 21-Oct-2016  jdolecek revert 1.141 - the second ffs_truncate() can't really fail

requested by hannken@
 1.141 20-Oct-2016  jdolecek also allow snapshot_setup()'s call to ffs_truncate() to fail; the code
should simply reuse the file blocks in that case; also make sure the
ffs_truncate() call is run within transaction if log is on
 1.140 28-Jun-2015  maxv branches: 1.140.2;
Small fixes.

ok hannken@
 1.139 28-Mar-2015  maxv Remove the 'cred' argument from bread(). Remove a now unused var in
ffs_snapshot.c. Update the man page accordingly.

ok hannken@
 1.138 28-Mar-2015  maxv Remove the 'cred' argument from breadn(), and update the man page
accordingly.

ok hannken@
 1.137 05-Sep-2014  matt branches: 1.137.2;
Don't nest structure definitions.
 1.136 10-Jul-2014  dholland Use an explicit compare to 0 for an immediate error result, not !.
Using ! is perfectly clear on variables like "error" or "result",
but directly on a function call it tends to look like a mistake.
 1.135 30-May-2014  hannken Testing "v_usecount == 1" for exclusive reference will not always
work -- remove and test only readonly.
 1.134 24-May-2014  christos Introduce a selector function to the vfs vnode iterator so that we don't
need to vget() vnodes that we are not interested in, and optimize locking
a bit. Iterator changes reviewed by Hannken (thanks), the rest of the bugs
are mine.
 1.133 17-Mar-2014  hannken branches: 1.133.2;
Change snapshot_expunge() to use vfs_vnode_iterator.
 1.132 17-Dec-2013  joerg ib_get is not used in the evbarm/OPENRD kernel, so mark it as such.
 1.131 19-Oct-2013  martin Mark unused (in the !FFS_EI case) variables as such.
 1.130 19-Oct-2013  martin Mark a potentially unused (ifndef FFS_EI) variable
 1.129 30-Sep-2013  hannken Replace macro v_specmountpoint with two functions spec_node_getmountedfs()
and spec_node_setmountedfs() to manage the file system mounted on a device.
Assert the device is a block device.

Welcome to 6.99.24

Discussed on tech-kern@ some time ago.

Reviewed by: David Holland <dholland@netbsd.org>
 1.128 13-Sep-2013  joerg Kill unused function ib_assign.
 1.127 23-Jun-2013  dholland branches: 1.127.2;
Stick ffs_ in front of the following macros:
fragstoblks()
blkstofrags()
fragnum()
blknum()

to finish the job of distinguishing them from the lfs versions, which
Christos renamed the other day.

I believe this is the last of the overtly ambiguous exported symbols
from ffs... or at least, the last of the ones that conflicted with lfs.
ffs still pollutes the C namespace very broadly (as does ufs) and this
needs quite a bit more cleanup.

XXX: boo on macros with lowercase names. But I'm not tackling that just yet.
 1.126 23-Jun-2013  dholland Stick ffs_, ext2_, chfs_, filecore_, cd9660_, or mfs_ in front of
the following symbols so as to disambiguate fully. (Christos already
did the lfs ones.)

lblkno
lblktosize
lfragtosize
numfrags
blkroundup
fragroundup
 1.125 23-Jun-2013  dholland fsbtodb() -> FFS_FSBTODB(), EXT2_FSBTODB(), or MFS_FSBTODB()
dbtofsb() -> FFS_DBTOFSB() or EXT2_DBTOFSB()

(Christos already did the lfs ones a few days back)
 1.124 19-Jun-2013  dholland Rename ambiguous macros:
MAXDIRSIZE -> UFS_MAXDIRSIZE or LFS_MAXDIRSIZE
NINDIR -> FFS_NINDIR, EXT2_NINDIR, LFS_NINDIR, or MFS_NINDIR
INOPB -> FFS_INOPB, LFS_INOPB
INOPF -> FFS_INOPF, LFS_INOPF
blksize -> ffs_blksize, ext2_blksize, or lfs_blksize
sblksize -> ffs_blksize

These are not the only ambiguously defined filesystem macros, of
course, there's a pile more. I may not have found all the ambiguous
definitions of blksize(), too, as there are a lot of other things
called 'blksize' in the system.
 1.123 16-Jun-2013  hannken Add an UFS_SNAPGONE() ufs op replacing the calls
to ffs_snapgone() in ufs_lookup.c.

Ok: David Holland <dholland@netbsd.org>

Welcome to 6.99.22
 1.122 07-May-2013  hannken When invalidating short buffers on the snapshots clean list use bbusy()
to mark the buffer busy. There exists a small window where a buffer is
done but not released and therefore still busy.
 1.121 22-Jan-2013  dholland Stuff UFS_ in front of a few of ufs's symbols to reduce namespace
pollution. Specifically:
ROOTINO -> UFS_ROOTINO
WINO -> UFS_WINO
NXADDR -> UFS_NXADDR
NDADDR -> UFS_NDADDR
NIADDR -> UFS_NIADDR
MAXSYMLINKLEN -> UFS_MAXSYMLINKLEN
MAXSYMLINKLEN_UFS[12] -> UFS[12]_MAXSYMLINKLEN (for consistency)

Sort out ext2fs's misuse of NDADDR and NIADDR; fortunately, these have
the same values in ext2fs and ffs.

No functional change intended.
 1.120 20-Dec-2012  hannken Change bread() and breadn() to never return a buffer on
error and modify all callers to not brelse() on error.

Welcome to 6.99.16

PR kern/46282 (6.0_BETA crash: msdosfs_bmap -> pcbmap -> bread -> bio_doread)
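The caller pattern implied by this contract can be sketched as follows. mock_bread(), mock_brelse() and the cut-down struct buf are hypothetical stand-ins, not the kernel's bread(9)/brelse(9) signatures. Since no buffer is handed back on error, the error path returns immediately and releases nothing.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical, cut-down buffer; the real struct buf has much more. */
struct buf {
	long b_blkno;
};

static int releases;   /* counts mock_brelse() calls */

static void mock_brelse(struct buf *bp)
{
	releases++;
	free(bp);
}

/* Follows the contract described above: on error, *bpp is NULL. */
static int mock_bread(long blkno, struct buf **bpp)
{
	struct buf *bp;

	*bpp = NULL;
	if (blkno < 0)
		return EIO;        /* simulated I/O error */
	bp = malloc(sizeof *bp);
	if (bp == NULL)
		return ENOMEM;
	bp->b_blkno = blkno;
	*bpp = bp;
	return 0;
}

/* Caller: return the error directly; release only on success. */
static int read_block(long blkno)
{
	struct buf *bp;
	int error;

	error = mock_bread(blkno, &bp);
	if (error != 0)
		return error;      /* no buffer to release */
	/* ... use the buffer contents here ... */
	mock_brelse(bp);
	return 0;
}
```

Under the old contract a caller that called brelse() on the error path could release a buffer it did not own, which is the kind of crash referenced in the PR above.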
 1.119 13-Mar-2012  elad branches: 1.119.2;
Replace the remaining KAUTH_GENERIC_ISSUSER authorization calls with
something meaningful. All relevant documentation has been updated or
written.

Most of these changes were brought up in the following messages:

http://mail-index.netbsd.org/tech-kern/2012/01/18/msg012490.html
http://mail-index.netbsd.org/tech-kern/2012/01/19/msg012502.html
http://mail-index.netbsd.org/tech-kern/2012/02/17/msg012728.html

Thanks to christos, manu, njoly, and jmmv for input.

Huge thanks to pgoyette for spinning these changes through some build
cycles and ATF.
 1.118 07-Oct-2011  hannken branches: 1.118.2; 1.118.6;
As vnalloc() always allocates with PR_WAITOK there is no longer the need
to test its result for NULL.
 1.117 01-Jul-2011  hannken ffs_copyonwrite(): If the write is to the in-file-system journal
there is no need to lock and check the snapshots.
 1.116 12-Jun-2011  rmind Welcome to 5.99.53! Merge rmind-uvmplock branch:

- Reorganize locking in UVM and provide extra serialisation for pmap(9).
New lock order: [vmpage-owner-lock] -> pmap-lock.

- Simplify locking in some pmap(9) modules by removing P->V locking.

- Use lock object on vmobjlock (and thus vnode_t::v_interlock) to share
the locks amongst UVM objects where necessary (tmpfs, layerfs, unionfs).

- Rewrite and optimise x86 TLB shootdown code, make it simpler and cleaner.
Add TLBSTATS option for x86 to collect statistics about TLB shootdowns.

- Unify /dev/mem et al in MI code and provide required locking (removes
kernel-lock on some ports). Also, avoid cache-aliasing issues.

Thanks to Andrew Doran and Joerg Sonnenberger, as their initial patches
formed the core changes of this branch.
 1.115 08-May-2011  hannken branches: 1.115.2;
Revert previous commit. Locking the snapshot vnode while the file system
is suspended extends the suspension until the vnode gets unlocked by
the caller of ffs_snapshot().

Resuming the file system before expunging all snapshots and syncing the
snapshot creates races and deadlocks with journaling file systems at least.
 1.114 29-Apr-2011  hannken Before expunging all snapshots take the snapshot lock and resume the file
system as this is sufficient for the remaining operations.

Reduces the time the file system is suspended and should make this time
independent of the number of snapshots already present.
 1.113 23-Apr-2011  hannken ffs_snapshot(): return an error if the node is an invalid snapshot.
 1.112 18-Apr-2011  hannken Preallocate all cylinder group blocks so we no longer redo ~50% of
the cylinder groups while the file system is suspended.
This was removed in error with Rev 1.16.

From Manuel Bouyer <bouyer@netbsd.org> via tech-kern.
 1.111 06-Mar-2011  bouyer merge the bouyer-quota2 branch. This adds a new on-disk format
to store disk quota usage and limits, integrated with ffs
metadata. Usage is checked by fsck_ffs (no more quotacheck)
and is covered by the WAPBL journal. Enabled with kernel
option QUOTA2 (added where QUOTA was enabled in kernel config files),
turned on with tunefs(8) on a per-filesystem
basis. mount_mfs(8) can also turn quotas on.

See http://mail-index.netbsd.org/tech-kern/2011/02/19/msg010025.html
for details.
 1.110 24-Feb-2011  hannken fss(4): Allow FSSIOCSET to set the initial flags. Add a new flag
"FSS_UNLINK_ON_CREATE" to unlink the backing store before
the snapshot gets created.

With this change dump(8) no longer dumps the zero-sized, but named
snapshot it is working on. Same applies to fsck_ffs(8).
 1.109 23-Feb-2011  dyoung Initialize blkno to 0 right before the snapblkaddr() call that GCC does
not understand so that if ffs_copyonwrite() sprouts a new code path that
does not initialize blkno, the compiler has the chance to reveal it.
 1.108 23-Feb-2011  hannken Quiesce CC ('blkno' may be used uninitialized in this function).
 1.107 22-Feb-2011  he Move blocks_in_journal() in under #ifndef FFS_NO_SNAPSHOT, all uses
are under that ifdef anyway; this allows build with FFS_NO_SNAPSHOT defined.
 1.106 21-Feb-2011  hannken Change the snapshot lock:
- No need to take the snapshot lock while the file system is suspended.
- Allow ffs_copyonwrite() one level of recursion with snapshots locked.
- Do the block address lookup with snapshots locked.
- Take the snapshot lock while removing a snapshot from the list.

While hunting deadlocks change the transaction scope for ffs_snapremove().
We could deadlock from UFS_WAPBL_BEGIN() with a buffer held.
 1.105 18-Feb-2011  bouyer Initialize error in snapshot_expunge(); if the list is empty error would
be returned uninitialized. t_snapshot_v2 was failing for me when
librumpffs was compiled DBG=-g.
No idea why gcc didn't catch this ...
 1.104 18-Feb-2011  hannken Revert rev. 1.101. Dead snapshots would hang around until unmount.

Addresses PR #44568 (WAPBL doesn't play nice with snapshots).
 1.103 16-Feb-2011  hannken Refine the scope of WAPBL transactions so we should no longer get
a "wapbl_flush: current transaction too big to flush" panic when
creating or removing snapshots on larger logging disks.

Addresses PR #44568 (WAPBL doesn't play nice with snapshots).
 1.102 20-Dec-2010  matt branches: 1.102.2; 1.102.4;
Move counting of faults, traps, intrs, soft[intr]s, syscalls, and nswtch
from uvmexp to per-cpu cpu_data and move them to 64bits. Remove unneeded
includes of <uvm/uvm_extern.h> and/or <uvm/uvm.h>.
 1.101 12-Dec-2010  hannken Keep a reference to the snapshot vnode until it gets removed from the
snapshot list.
 1.100 12-Dec-2010  hannken syncsnap: Use bbusy() to take a buffer from v_dirtyblkhd.
 1.99 24-Jun-2010  hannken Clean up vnode lock operations pass 2:

VOP_UNLOCK(vp, flags) -> VOP_UNLOCK(vp): Remove the unneeded flags argument.

Welcome to 5.99.32.

Discussed on tech-kern.
 1.98 02-Jun-2010  hannken Initialize the initial snap block list's count.

From Antti Kantee <pooka@netbsd.org>.
 1.97 15-Oct-2009  hannken branches: 1.97.2; 1.97.4;
No longer abuse TAILQ internal data.
 1.96 13-Oct-2009  hannken Fix a deadlock where fscow_disestablish() blocks because outstanding
copy-on-write operations wait for si_snaplock.
 1.95 18-Apr-2009  tsutsui Remove extra whitespace added by a stupid tool.
XXX: more in src/sys/arch
 1.94 18-Mar-2009  cegger bcopy -> memcpy
 1.93 18-Mar-2009  cegger bzero -> memset
 1.92 22-Feb-2009  ad PR kern/26878 FFSv2 + softdep = livelock (no free ram)
PR kern/16942 panic with softdep and quotas
PR kern/19565 panic: softdep_write_inodeblock: indirect pointer #1 mismatch
PR kern/26274 softdep panic: allocdirect_merge: ...
PR kern/26374 Long delay before non-root users can write to softdep partitions
PR kern/28621 1.6.x "vp != NULL" panic in ffs_softdep.c:4653 while unmounting a softdep (+quota) filesystem
PR kern/29513 FFS+Softdep panic with unfsck-able file-corruption
PR kern/31544 The ffs softdep code appears to fail to write dirty bits to disk
PR kern/31981 stopping scsi disk can cause panic (softdep)
PR kern/32116 kernel panic in softdep (assertion failure)
PR kern/32532 softdep_trackbufs deadlock
PR kern/37191 softdep: locking against myself
PR kern/40474 Kernel panic after remounting raid root with softdep

Retire softdep, pass 2. As discussed and later formally announced on the
mailing lists.
 1.91 11-Jan-2009  christos branches: 1.91.2;
merge christos-time_t
 1.90 03-Jan-2009  hannken Remove superfluous "vp->v_vnlock = &vp->v_lock".

Observed by: YAMAMOTO Takashi <yamt@netbsd.org>
 1.89 19-Dec-2008  hannken Restore a line removed by mistake with the last commit.

Should fix PR 40225 panic: indiracct: missing indir.
 1.88 17-Dec-2008  cegger kill MALLOC and FREE macros.
 1.87 07-Dec-2008  hannken ffs_copyonwrite(): Only use si_snapblklist if it is already allocated.

ffs_snapshot_read(): Use IO_ALTSEMANTICS to allow reading a snapshot vnode
beyond file system size. Needed to read the snapblklist
on mount.

Persistent snapshots work again.

Should fix PR kern/37425: fss_snapshot_mount panic during fsck.
 1.86 07-Dec-2008  hannken Revert previous -- ALL reads are from kernel space.

Still open: PR kern/37425: fss_snapshot_mount panic during fsck.
 1.85 07-Dec-2008  hannken ffs_copyonwrite(): Only use si_snapblklist if it is already allocated.
ffs_snapshot_read(): Allow the kernel to read beyond file system size.

Persistent snapshots work again.

Should fix PR kern/37425: fss_snapshot_mount panic during fsck.
 1.84 06-Dec-2008  joerg Split ffs_freefile into a frontend for normal cylinder group and for
snapshot use. Adjust ffs_blkfree_common to get the fs instance passed
in, the original commit didn't account blocks in the snapshots
correctly. Assert that ffs_blkfree is used with the primary fs instance
and that ffs_checkfreefile is only used for snapshots. Move the bdwrite
from ffs_blkfree_common into the caller for symmetry. This creates a
redundant write of unmodified data for ffs_blkfree_snap if a double free
of a block happens.

Reviewed and tested by hannken@.
 1.83 01-Dec-2008  joerg ffs_blkfree is used in two different ways. The normal usage is to free a
block in the cylinder groups of the filesystem. The other user is the
snapshot code, which wants to modify the copied cylinder groups. Use
different frontends to distinguish the cases in preparation for fine
grained locking for cylinder groups.
 1.82 23-Oct-2008  hannken branches: 1.82.2; 1.82.4;
Correct previous.
- Count frags, not blocks to get the file system size.
- Cannot use blksize() here, it depends on vnode size.
- Correctly update xfersize on short reads.
 1.81 23-Oct-2008  hannken When computing the request's hard limit in ffs_snapshot_read()
use the file system size, not the size of the snapshot vnode.
 1.80 08-Sep-2008  hannken Adjust some WAPBL transactions:
- Put transaction inside cgaccount() to simplify caller.
- No vget() / vrele() inside a transaction.
 1.79 02-Sep-2008  hannken Ffs_snapshot() has become a huge monster over time. Break it into
helper functions to enhance readability. Adjust comments to reality
and test the main error paths.

While here, expand and remove the last FreeBSD->NetBSD conversion macros.

No functional change intended.
 1.78 25-Aug-2008  hannken Sync the just created snapshot to disk.

Invalidate short ( < fs_bsize ) buffers. We will always read full
size buffers later.

Should fix PR #39402
 1.77 24-Aug-2008  hannken Add missing vput() for logvp.

Fixes PR #39400
 1.76 24-Aug-2008  hannken Merge the _ufs1 and _ufs2 variants of the expunge and accounting functions.
Remove some unneeded UFS_FSNEEDSWAP().

Saves ~250 lines of redundant code.
 1.75 22-Aug-2008  hannken Add snapshot support for logging ffs file systems.

- Add UFS_WAPBL_BEGIN() / UFS_WAPBL_END() where needed.

- Expunge WAPBL log inodes from snapshots.

- Ffs_copyonwrite() and ffs_snapblkfree() must run inside a WAPBL transaction.

- Add ffs_gop_write() as a wrapper around genfs_gop_write() that makes sure
genfs_gop_write() always gets called inside a WAPBL transaction.

- Add VOP_PUTPAGES() flag PGO_JOURNALLOCKED to tag calls to VOP_PUTPAGES()
inside a WAPBL transaction.

Reviewed by: Simon Burge <simonb@netbsd.org>, Greg Oster <oster@netbsd.org>

PGO_JOURNALLOCKED / ffs_gop_write() part presented on tech-kern@.
 1.74 12-Aug-2008  hannken Deny read/write access to snapshot vnodes. We use fss(4) to read from
snapshots. With this policy in place:

- Separate the snapshot vnode lock from the snapshot common lock.
Snapshots no longer need recursive vnode locks.

- Use a mutex (si_snaplock) to serialize creation, deletion, reading and
writing of snapshots.

- Move ffs_read() for snapshots into ffs_snapshot.c.

Reviewed by: Jason Thorpe <thorpej@netbsd.org>

While here change ffs_copyonwrite() to fail requests from pagedaemon that need
to copy-on-write.
 1.73 31-Jul-2008  hannken Ffs snapshots don't work (yet) with WAPBL:
- no snapshot creation on logging file systems.
- refuse to mount logging file systems with persistent snapshots.

Ok: Simon Burge <simonb@netbsd.org>
 1.72 30-Jul-2008  hannken ffs_snapshot():
Release allocated indir blocks on non-softdep file systems instead
of writing them twice.
It is sufficient to clean dirty data pages to avoid UBC inconsistencies.

ffs_snapblkfree() and wrsnapblk():
If a snapshot's effective link count is zero, there is no need
to use synchronous writes.

ffs_copyonwrite():
Defer locking the snapshots until there is a need to copy the block.

wrsnapblk():
Use vn_rdwr() instead of bwrite() to write to the snapshots.
 1.71 15-Jul-2008  hannken expunge_ufs*(): Use the buffer cache to update the inodes on the snapshot like
the rest of snapshot creation does.
 1.70 17-Jun-2008  reinoud branches: 1.70.2;
Mark a buffer `busy` in getnewbuf() when it came from the pool_cache since
it's not on a free list.

Also change buf_init() to not automatically mark buffers `busy' since this
only makes sense for bufcache buffers.

Mark all buf_init'd buffers 'busy' on the places where they ought to be
flagged as such to not confuse the buffer cache.

Fixes PR 38923.
 1.69 03-Jun-2008  hannken branches: 1.69.2;
ufs/ffs: replace calls to getblk() with ffs_getblk(). Now all buffers
have been run through copy-on-write and async mounts work again.

Fixes PR kern/38820

Reviewed by: YAMAMOTO Takashi <yamt@netbsd.org>
 1.68 29-May-2008  hannken ffs_copyonwrite(): stop abusing ffs_balloc() to get a block address.
Use ufs_getlbns()/bread() instead.
Saves some reads and removes deep recursion with possible deadlock
when ffs_balloc() runs copy-on-write on the buffer returned.
 1.67 16-May-2008  hannken Make sure all cached buffers with valid, not yet written data have been
run through copy-on-write. Call fscow_run() with valid data where possible.

The LP_UFSCOW hack is no longer needed to protect ffs_copyonwrite() against
endless recursion.

- Add a flag B_MODIFY to bread(), breada() and breadn(). If set the caller
intends to modify the buffer returned.

- Always run copy-on-write on buffers returned from ffs_balloc().

- Add new function ffs_getblk() that gets a buffer, assigns a new blkno,
may clear the buffer and runs copy-on-write. Process possible errors
from getblk() or fscow_run(). Part of PR kern/38664.

Welcome to 4.99.63

Reviewed by: YAMAMOTO Takashi <yamt@netbsd.org>
 1.66 17-Apr-2008  hannken branches: 1.66.2; 1.66.4; 1.66.6;
Replace get/setspecific with a void pointer in struct ufsmount. Use explicit
initialization/finalization of snapshot private data on creation/deletion
of struct ufsmount.
Snapshot mounts no longer may fail silently because kmem_alloc() fails.

Welcome to 4.99.60

Ok: Andrew Doran <ad@netbsd.org>
 1.65 30-Jan-2008  hannken branches: 1.65.6; 1.65.8;
Make it work after lockmgr -> vlockmgr conversion:

- Initialize si_vnlock in si_mount_init().
- Also initialize vl_recursecnt to zero.
- Destroy it only in si_mount_dtor().
- Simplify the v_lock <-> si_vnlock exchange.
- Don't abuse the overall error variable for LK_NOWAIT errors.
- ffs_snapremove: release the vnode once instead of three times.
 1.64 30-Jan-2008  ad Replace use of LK_SLEEPFAIL.
 1.63 30-Jan-2008  ad PR kern/37706 (forced unmount of file systems is unsafe):

- Do reference counting for 'struct mount'. Each vnode associated with a
mount takes a reference, and in turn the mount takes a reference to the
vfsops.
- Now that mounts are reference counted, replace the overcomplicated mount
locking inherited from 4.4BSD with a recursable rwlock.
 1.62 30-Jan-2008  ad Replace struct lock on vnodes with a simpler lock object built on
krwlock_t. This is a step towards removing lockmgr and simplifying
vnode locking. Discussed on tech-kern.
 1.61 28-Jan-2008  hannken - Always destroy si_vnlock after use.
- Take care of vnodes without file system data.
 1.60 24-Jan-2008  hannken si_mount_dtor(): destroy si_vnlock before free.
 1.59 24-Jan-2008  hannken Fix a typo from the vmlocking2 merge: vmark() the right vnode.
 1.58 03-Jan-2008  pooka valloc -> vnalloc, vfree -> vnfree
Avoids collision with userland valloc(3).

no functional change
ad ok
 1.57 02-Jan-2008  ad Merge vmlocking2 to head.
 1.56 08-Dec-2007  pooka branches: 1.56.4;
Remove cn_lwp from struct componentname. curlwp should be used
from now on. The NDINIT() macro no longer takes the lwp parameter and
associates the credentials of the calling thread with the namei
structure.
 1.55 02-Dec-2007  hannken branches: 1.55.2;
Fscow_run(): add a flag "bool data_valid" to note still valid data.
Buffers run through copy-on-write are marked B_COWDONE. This condition
is valid until the buffer has run through bwrite() and gets cleared from
biodone().

Welcome to 4.99.39.

Reviewed by: YAMAMOTO Takashi <yamt@netbsd.org>
 1.54 26-Nov-2007  pooka Remove the "struct lwp *" argument from all VFS and VOP interfaces.
The general trend is to remove it from all kernel interfaces and
this is a start. In case the calling lwp is desired, curlwp should
be used.

quick consensus on tech-kern
 1.53 10-Oct-2007  ad branches: 1.53.4;
Merge from vmlocking:

- Split vnode::v_flag into three fields, depending on field locking.
- simple_lock -> kmutex in a few places.
- Fix some simple locking problems.
 1.52 08-Oct-2007  ad Merge ffs locking & brelse changes from the vmlocking branch.
 1.51 07-Oct-2007  hannken Update the file system copy-on-write handler.

- Instead of hooking the handler on the specdev of a mounted file system
hook directly on the `struct mount'.

- Rename from `vn_cow_*' to `fscow_*' and move to `kern/vfs_trans.c'. Use
`mount_*specific' instead of clobbering `struct mount' or `struct specinfo'.

- Replace the hand-made reader/writer lock with a krwlock.

- Keep `vn_cow_*' functions and mark as obsolete.

- Welcome to NetBSD 4.99.32 - `struct specinfo' changed size.

Reviewed by: Jason Thorpe <thorpej@netbsd.org>
 1.50 21-Aug-2007  hannken branches: 1.50.2; 1.50.4;
Modify ffs_lock() to take care of changed v_vnlock. Snapshots do not need
transferlockers() anymore.

From FreeBSD ffs_vnops.c Rev. 1.159

Reviewed by: YAMAMOTO Takashi <yamt@netbsd.org>
 1.49 18-Aug-2007  hannken - Use a mutex to protect snapinfo.
- Move the snapshot lock to snapinfo.
- ffs_snapblkfree(),ffs_copyonwrite(): replace lockmgr() with VOP_LOCK().
 1.48 18-Aug-2007  hannken Expunge traces of unlinked snapshot files when making a new snapshot.

From FreeBSD Rev. 1.123
 1.47 09-Aug-2007  hannken Move snapshot per-mount data from struct ufsmount to mount specific data.
No functional changes.

Welcome to 4.99.28 (struct ufsmount changed size)
 1.46 12-Jul-2007  hannken branches: 1.46.2; 1.46.6;
ffs_snapshot_mount: No persistent snapshots on an Apple UFS file system.

From Thor Lancelot Simon <tls@netbsd.org>
 1.45 10-Jul-2007  hannken Move `struct dquot' and its supporting functions from quota.h to ufs_quota.c.

- Make quota-internal functions static.
- Clean up declarations in quota.h and ufs_extern.h. quota.h now has the
description of quota criteria, on-disk structure, user-kernel interface and
declaration of init/done functions. All ufs quota related function
prototypes go to ufs_extern.h.
- New functions ufsquota_init() and ufsquota_free() create or destroy the
quota fields of `struct inode'.
- chkdq() and chkiq() always update the quota fields of `struct inode' first.
- Only ufs_access() explicitly calls getinoquota().

No objections on tech-kern@
 1.44 09-Jul-2007  ad Merge some of the less invasive changes from the vmlocking branch:

- kthread, callout, devsw API changes
- select()/poll() improvements
- miscellaneous MT safety improvements
 1.43 04-Mar-2007  christos branches: 1.43.2; 1.43.4;
Kill caddr_t; there will be some MI fallout, but it will be fixed shortly.
 1.42 16-Feb-2007  hannken branches: 1.42.2;
Make fstrans(9) the default helper for file system suspension.
Replaces the now obsolete vn_start_write()/vn_finished_write().
 1.41 09-Feb-2007  ad Merge newlock2 to head.
 1.40 19-Jan-2007  hannken New file system suspension API to replace vn_start_write and vn_finished_write.
The suspension helpers are now put into file system specific operations.
This means every file system not supporting these helpers cannot be suspended
and therefore snapshots are no longer possible.

Implemented for file systems of type ffs.

The new API is enabled on a kernel option NEWVNGATE. This option is
not enabled by default in any kernel config.

Presented and discussed on tech-kern with much input from
Bill Studenmund <wrstuden@netbsd.org> and YAMAMOTO Takashi <yamt@netbsd.org>.

Welcome to 4.99.9 (new vfs op vfs_suspendctl).
 1.39 04-Jan-2007  elad Consistent usage of KAUTH_GENERIC_ISSUSER.
 1.38 02-Dec-2006  hannken On snapshot creation be sure the snapshot vnode has valid quota information.

Fixes PR kern/35121
 1.37 16-Nov-2006  christos branches: 1.37.2;
ifdef out an unused function if !FFS_NO_SNAPSHOT
 1.36 16-Nov-2006  christos __unused removal on arguments; approved by core.
 1.35 25-Oct-2006  reinoud Revisit mnt_vnodelist TAILQ patch. Remove all suspicious TAILQ_FOREACH()
loops where vnodes can get removed or added during the loops. This could
lead to panics on unmount, since nodes could be skipped or
TAILQ_NEXT(0xdeadbeef, ...) could be dereferenced.
 1.34 20-Oct-2006  reinoud Replace the LIST structure mp->mnt_vnodelist with a TAILQ structure since all
vnodes were synced and processed backwards. This meant that the last
accessed node was processed first and the earliest last.

An extra benefit is the removal of the ugly hack from the Berkeley days on
LFS.

In the process, I've also replaced the various hand-written loops
with the TAILQ_FOREACH() macro.
 1.33 12-Oct-2006  christos - sprinkle __unused on function decls.
- fix a couple of unused bugs
- no more -Wno-unused for i386
 1.32 29-Sep-2006  christos Coverity CID 2949: comment out dead code (from Arnaud Lacombe)
 1.31 23-Jul-2006  ad branches: 1.31.4; 1.31.6;
Use the LWP cached credentials where sane.
 1.30 07-Jun-2006  kardel merge FreeBSD timecounters from branch simonb-timecounters
- struct timeval time is gone
time.tv_sec -> time_second
- struct timeval mono_time is gone
mono_time.tv_sec -> time_uptime
- access to time via
{get,}{micro,nano,bin}time()
get* versions are fast but less precise
- support NTP nanokernel implementation (NTP API 4)
- further reading:
Timecounter Paper: http://phk.freebsd.dk/pubs/timecounter.pdf
NTP Nanokernel: http://www.eecis.udel.edu/~mills/ntp/html/kern.html
 1.29 14-May-2006  elad branches: 1.29.2;
integrate kauth.
 1.28 18-Apr-2006  christos Coverity CID 746: Remove dead code. lbn >= NDADDR is mutually exclusive to
snapshot_locked == 0.
 1.27 10-Apr-2006  bouyer Revert previous; I mixed bpp and *bpp when reading ffs_balloc_ufs1().
ffs_balloc() will always allocate a new buffer or leave it as NULL,
so coverity is wrong here, we're not using a freed argument.
 1.26 10-Apr-2006  bouyer If we brelse ibp, set ibp to NULL, to avoid reusing it later in balloc()
or in our code at the next iteration.
Coverity ID 2706
 1.25 17-Mar-2006  christos don't use MALLOC with a non-constant size; use malloc instead.
 1.24 04-Jan-2006  yamt branches: 1.24.2; 1.24.4; 1.24.6; 1.24.8; 1.24.10;
- add simple functions to allocate/free a buffer for i/o.
- make bufpool static.
 1.23 11-Dec-2005  christos branches: 1.23.2;
merge ktrace-lwp.
 1.22 02-Nov-2005  yamt merge yamt-vop branch. remove following VOPs.

VOP_BLKATOFF
VOP_VALLOC
VOP_BALLOC
VOP_REALLOCBLKS
VOP_VFREE
VOP_TRUNCATE
VOP_UPDATE
 1.21 26-Sep-2005  yamt branches: 1.21.2;
revert ffs_snapshot.c 1.20 because it's bogus. pointed by Simon Burge.
 1.20 26-Sep-2005  yamt always use nanotime rather than time.
it's bad to mix nanotime and time because it sometimes
makes timestamps go backwards.
 1.19 19-Aug-2005  christos 64 bit inode changes.
 1.18 15-Jul-2005  thorpej Use ANSI function decls.
 1.17 29-May-2005  christos branches: 1.17.2;
- sprinkle const
- avoid shadow variables.
 1.16 25-May-2005  hannken - Use an empty snap block list to set the initial file size. Snapshot is
now valid from the beginning. No need to copy the last fs block two times.
- No need to allocate the cylinder group blocks twice.
- cgbuf -> sbbuf
 1.15 22-May-2005  hannken ffs/ffs_alloc.c:
- Add a missing ACTIVECG_CLR().

ffs/ffs_snapshot.c:
- Use async/delayed writes for snapshot creation and sync/uncache these buffers
on end. Reduces the time the file system must be suspended.
- Remove um_snaplistsize. Was a duplicate of um_snapblklist[0].
- Byte swap the list of preallocated blocks on read/write instead of access.
- Always keep this list on ip->i_snapblklist so it may be rolled back when the
newest snapshot gets removed. Fixes a rare snapshot corruption when using
more than one snapshot on a file system.

ufs/ufsmount.h:
- Make TAILQ_LAST() possible on member um_snapshots.
- Remove um_snaplistsize. Was a duplicate of um_snapblklist[0].
 1.14 03-May-2005  hannken Fix last commit. The last block of the file system may have changed
even if the last cylinder group is not modified.
 1.13 24-Apr-2005  hannken Fix an inconsistency where the last block of the snapshot contains old data.

The last block of the file system is written to the snapshot before the
file system is suspended. If the last cylinder group is modified after
the file system is suspended the last block of the snapshot may contain
old data. So update this block again.
 1.12 21-Apr-2005  yamt don't assign to non-lvalue. found by gcc4.
 1.11 26-Feb-2005  perry branches: 1.11.2;
nuke trailing whitespace
 1.10 21-Feb-2005  hannken Make `options FFS_NO_SNAPSHOT' only disable snapshot creation
while not trashing existing snapshots.

Approved by: core@
 1.9 09-Feb-2005  hannken The fss device only checks read access to the snapshot vnode. On snapshot creation,
check that we are either the super-user or the owner of the snapshot vnode.
 1.8 18-Jan-2005  hannken branches: 1.8.2;
Protect calls to `ffs_*_swap' with `#ifdef FFS_EI'.
 1.7 17-Sep-2004  skrll branches: 1.7.4;
There's no need to pass a proc value when using UIO_SYSSPACE with
vn_rdwr(9) and uiomove(9).

OK'd by Jason Thorpe
 1.6 29-Aug-2004  hannken While creating a snapshot inodes must be freed from the
snapshot, not from the file system.
ffs_freefile() needs explicit "fs" and "devvp" arguments.
 1.5 30-Jun-2004  hannken branches: 1.5.2;
When we expunge an unreferenced file from a snapshot its size may be zero.
 1.4 20-Jun-2004  hannken - Add flag L_COWINPROGRESS to struct lwp to avoid recursion when
doing copy-on-write.

- Change VFS_SNAPSHOT() to return the snapshot vnode locked.

- Make the IO path for copy-on-write and snapshot-read more lightweight.
Avoids deadlocks where vn_rdwr(...READ...) has a shared lock and needs
to copy-on-write.
Avoids deadlocks/panics where the copy-on-write, in order to clean pages,
needs to allocate pages for its VOP_PUTPAGES().

L_COWINPROGRESS part approved by: Jason R. Thorpe <thorpej@netbsd.org>
 1.3 31-May-2004  hannken Once all block address modifications are done invalidate and
free all pages from the snapshot vnode.
 1.2 26-May-2004  hannken Make it compile without option FFS_EI.
 1.1 25-May-2004  hannken Add ffs internal snapshots. Written by Marshall Kirk McKusick for FreeBSD.

- Not enabled by default. Needs kernel option FFS_SNAPSHOT.
- Change parameters of ffs_blkfree.
- Let the copy-on-write functions return an error so spec_strategy
may fail if the copy-on-write fails.
- Change genfs_*lock*() to use vp->v_vnlock instead of &vp->v_lock.
- Add flag B_METAONLY to VOP_BALLOC to return indirect block buffer.
- Add a function ffs_checkfreefile needed for snapshot creation.
- Add special handling of snapshot files:
Snapshots may not be opened for writing and the attributes are read-only.
Use the mtime as the time this snapshot was taken.
Deny mtime updates for snapshot files.
- Add function transferlockers to transfer any waiting processes from
one lock to another.
- Add vfsop VFS_SNAPSHOT to take a snapshot and make it accessible through
a vnode.
- Add snapshot support to ls, fsck_ffs and dump.

Welcome to 2.0F.

Approved by: Jason R. Thorpe <thorpej@netbsd.org>
 1.5.2.11 10-Nov-2005  skrll Sync with HEAD. Here we go again...
 1.5.2.10 04-Mar-2005  skrll Sync with HEAD.

Hi Perry!
 1.5.2.9 15-Feb-2005  skrll Adapt to branch.
 1.5.2.8 15-Feb-2005  skrll Sync with HEAD.
 1.5.2.7 04-Feb-2005  skrll Sync with HEAD.
 1.5.2.6 24-Jan-2005  skrll Sync with HEAD.
 1.5.2.5 21-Sep-2004  skrll Fix the sync with head I botched.
 1.5.2.4 18-Sep-2004  skrll Sync with HEAD.
 1.5.2.3 03-Sep-2004  skrll Sync with HEAD
 1.5.2.2 03-Aug-2004  skrll Sync with HEAD
 1.5.2.1 30-Jun-2004  skrll file ffs_snapshot.c was added on branch ktrace-lwp on 2004-08-03 10:56:49 +0000
 1.7.4.1 29-Apr-2005  kent sync with -current
 1.8.2.2 19-Mar-2005  yamt sync with head. xen and whitespace. xen part is not finished.
 1.8.2.1 12-Feb-2005  yamt sync with head.
 1.11.2.8 06-Dec-2006  tron Pull up following revision(s) (requested by hannken in ticket #1598):
sys/ufs/ffs/ffs_snapshot.c: revision 1.38
On snapshot creation be sure the snapshot vnode has valid quota information.
Fixes PR kern/35121
 1.11.2.7 28-May-2005  tron branches: 1.11.2.7.2; 1.11.2.7.4;
Pull up revision 1.16 (requested by hannken in ticket #334):
- Use an empty snap block list to set the initial file size. Snapshot is
now valid from the beginning. No need to copy the last fs block two times.
- No need to allocate the cylinder group blocks twice.
- cgbuf -> sbbuf
 1.11.2.6 28-May-2005  tron Pull up revision 1.15 (requested by hannken in ticket #334):
ffs/ffs_alloc.c:
- Add a missing ACTIVECG_CLR().
ffs/ffs_snapshot.c:
- Use async/delayed writes for snapshot creation and sync/uncache these buffers
on end. Reduces the time the file system must be suspended.
- Remove um_snaplistsize. Was a duplicate of um_snapblklist[0].
- Byte swap the list of preallocated blocks on read/write instead of access.
- Always keep this list on ip->i_snapblklist so it may be rolled back when the
newest snapshot gets removed. Fixes a rare snapshot corruption when using
more than one snapshot on a file system.
ufs/ufsmount.h:
- Make TAILQ_LAST() possible on member um_snapshots.
- Remove um_snaplistsize. Was a duplicate of um_snapblklist[0].
 1.11.2.5 03-May-2005  tron Pull up revision 1.14 (requested by hannken in ticket #244):
Fix last commit. The last block of the file system may have changed
even if the last cylinder group is not modified.
 1.11.2.4 03-May-2005  tron Restore file which was deleted by accident because of CVS glitch.
 1.11.2.3 03-May-2005  tron Pull up revision 1.14 (requested by hannken in ticket #244):
Fix last commit. The last block of the file system may have changed
even if the last cylinder group is not modified.
 1.11.2.2 25-Apr-2005  tron Pull up revision 1.13 (requested by hannken in ticket #197):
Fix an inconsistency where the last block of the snapshot contains old data.
The last block of the file system is written to the snapshot before the
file system is suspended. If the last cylinder group is modified after
the file system is suspended the last block of the snapshot may contain
old data. So update this block again.
 1.11.2.1 25-Apr-2005  tron Pull up revision 1.12 (requested by hannken in ticket #197):
don't assign to non-lvalue. found by gcc4.
 1.11.2.7.4.1 06-Dec-2006  tron Pull up following revision(s) (requested by hannken in ticket #1598):
sys/ufs/ffs/ffs_snapshot.c: revision 1.38
On snapshot creation be sure the snapshot vnode has valid quota information.
Fixes PR kern/35121
 1.11.2.7.2.1 06-Dec-2006  tron Pull up following revision(s) (requested by hannken in ticket #1598):
sys/ufs/ffs/ffs_snapshot.c: revision 1.38
On snapshot creation be sure the snapshot vnode has valid quota information.
Fixes PR kern/35121
 1.17.2.8 04-Feb-2008  yamt sync with head.
 1.17.2.7 21-Jan-2008  yamt sync with head
 1.17.2.6 07-Dec-2007  yamt sync with head
 1.17.2.5 27-Oct-2007  yamt sync with head.
 1.17.2.4 03-Sep-2007  yamt sync with head.
 1.17.2.3 26-Feb-2007  yamt sync with head.
 1.17.2.2 30-Dec-2006  yamt sync with head.
 1.17.2.1 21-Jun-2006  yamt sync with head.
 1.21.2.2 29-Oct-2005  yamt use ffs_* directly rather than via ufs_ops.
suggested by Chuck Silvers.
 1.21.2.1 20-Oct-2005  yamt adapt ufs.
 1.23.2.1 15-Jan-2006  yamt sync with head.
 1.24.10.2 24-May-2006  tron Merge 2006-05-24 NetBSD-current into the "peter-altq" branch.
 1.24.10.1 28-Mar-2006  tron Merge 2006-03-28 NetBSD-current into the "peter-altq" branch.
 1.24.8.5 11-May-2006  elad sync with head
 1.24.8.4 06-May-2006  christos - Move kauth_cred_t declaration to <sys/types.h>
- Cleanup struct ucred; forward declarations that are unused.
- Don't include <sys/kauth.h> in any header, but include it in the c files
that need it.

Approved by core.
 1.24.8.3 19-Apr-2006  elad sync with head.
 1.24.8.2 10-Mar-2006  elad generic_authorize() -> kauth_authorize_generic().
 1.24.8.1 08-Mar-2006  elad Adapt to kernel authorization KPI.
 1.24.6.4 11-Aug-2006  yamt sync with head
 1.24.6.3 26-Jun-2006  yamt sync with head.
 1.24.6.2 24-May-2006  yamt sync with head.
 1.24.6.1 01-Apr-2006  yamt sync with head.
 1.24.4.3 01-Jun-2006  kardel Sync with head.
 1.24.4.2 22-Apr-2006  simonb Sync with head.
 1.24.4.1 04-Feb-2006  simonb Adapt for timecounters: mostly use get*time() and use "time_second"
instead of "time.tv_sec".
 1.24.2.1 09-Sep-2006  rpaulo sync with head
 1.29.2.1 19-Jun-2006  chap Sync with head.
 1.31.6.2 10-Dec-2006  yamt sync with head.
 1.31.6.1 22-Oct-2006  yamt sync with head
 1.31.4.4 01-Feb-2007  ad Sync with head.
 1.31.4.3 12-Jan-2007  ad Sync with head.
 1.31.4.2 29-Dec-2006  ad Checkpoint work in progress.
 1.31.4.1 18-Nov-2006  ad Sync with head.
 1.37.2.1 06-Dec-2006  tron Pull up following revision(s) (requested by hannken in ticket #252):
sys/ufs/ffs/ffs_snapshot.c: revision 1.38
On snapshot creation be sure the snapshot vnode has valid quota information.
Fixes PR kern/35121
 1.42.2.1 12-Mar-2007  rmind Sync with HEAD.
 1.43.4.1 11-Jul-2007  mjf Sync with head.
 1.43.2.14 29-Oct-2007  ad Remove unused label.
 1.43.2.13 28-Oct-2007  ad Fix up mnt_vnodelist handling.
 1.43.2.12 09-Oct-2007  ad Sync with head.
 1.43.2.11 09-Oct-2007  ad Sync with head.
 1.43.2.10 30-Aug-2007  ad bufcache_lock is sufficient to inspect v_dirtyblkhd, vp->v_interlock is only
needed to modify.
 1.43.2.9 24-Aug-2007  ad Sync with buffer cache locking changes. See buf.h/vfs_bio.c for details.
Some minor portions are incomplete and need to be verified as a whole.
 1.43.2.8 20-Aug-2007  ad Sync with HEAD.
 1.43.2.7 15-Jul-2007  ad Sync with head.
 1.43.2.6 23-Jun-2007  ad - Lock v_cleanblkhd, v_dirtyblkhd, v_numoutput with the vnode's interlock.
Get rid of global_v_numoutput_lock. Partially incomplete as the buffer
cache locking doesn't work very well and needs an overhaul.
- Some changes to try and make softdep MP safe. Untested.
 1.43.2.5 17-Jun-2007  ad - Increase the number of thread priorities from 128 to 256. How the space
is set up is to be revisited.
- Implement soft interrupts as kernel threads. A generic implementation
is provided, with hooks for fast-path MD code that can run the interrupt
threads over the top of other threads executing in the kernel.
- Split vnode::v_flag into three fields, depending on how the flag is
locked (by the interlock, by the vnode lock, by the file system).
- Miscellaneous locking fixes and improvements.
 1.43.2.4 13-May-2007  ad - Pass the error number and residual count to biodone(), and let it handle
setting error indicators. Prepare to eliminate B_ERROR.
- Add a flag argument to brelse() to be set into the buf's flags, instead
of doing it directly. Typically used to set B_INVAL.
- Add a "struct cpu_info *" argument to kthread_create(), to be used to
create bound threads. Change "bool mpsafe" to "int flags".
- Allow exit of LWPs in the IDL state when (l != curlwp).
- More locking fixes & conversion to the new API.
 1.43.2.3 13-Apr-2007  ad - Make the devsw interface MP safe, and add some comments.
- Allow individual block/character drivers to be marked MP safe.
- Provide wrappers around the device methods that look up the
device, returning ENXIO if it's not found, and acquire the
kernel lock if needed.
 1.43.2.2 21-Mar-2007  ad - Replace more simple_locks, and fix up in a few places.
- Use condition variables.
- LOCK_ASSERT -> KASSERT.
 1.43.2.1 13-Mar-2007  ad Pull in the initial set of changes for the vmlocking branch.
 1.46.6.6 09-Dec-2007  jmcneill Sync with HEAD.
 1.46.6.5 03-Dec-2007  joerg Sync with HEAD.
 1.46.6.4 27-Nov-2007  joerg Sync with HEAD. amd64 Xen support needs testing.
 1.46.6.3 26-Oct-2007  joerg Sync with HEAD.

Follow the merge of pmap.c on i386 and amd64 and move
pmap_init_tmp_pgtbl into arch/x86/x86/pmap.c. Modify the ACPI wakeup
code to restore CR4 before jumping back into kernel space as the large
page option might cover that.
 1.46.6.2 03-Sep-2007  jmcneill Sync with HEAD.
 1.46.6.1 16-Aug-2007  jmcneill Sync with HEAD.
 1.46.2.2 03-Sep-2007  skrll Sync with HEAD.
 1.46.2.1 15-Aug-2007  skrll Sync with HEAD.
 1.50.4.1 14-Oct-2007  yamt sync with head.
 1.50.2.3 23-Mar-2008  matt sync with HEAD
 1.50.2.2 09-Jan-2008  matt sync with HEAD
 1.50.2.1 06-Nov-2007  matt sync with HEAD
 1.53.4.3 18-Feb-2008  mjf Sync with HEAD.
 1.53.4.2 27-Dec-2007  mjf Sync with HEAD.
 1.53.4.1 08-Dec-2007  mjf Sync with HEAD.
 1.55.2.2 26-Dec-2007  ad Sync with head.
 1.55.2.1 04-Dec-2007  ad Pull the vmlocking changes into a new branch.
 1.56.4.2 08-Jan-2008  bouyer Sync with HEAD
 1.56.4.1 02-Jan-2008  bouyer Sync with HEAD
 1.65.8.5 04-Jan-2009  christos merge diffs.
 1.65.8.4 27-Dec-2008  christos merge with head.
 1.65.8.3 01-Nov-2008  christos catch up with changes in head.
 1.65.8.2 01-Nov-2008  christos Sync with head.
 1.65.8.1 29-Mar-2008  christos Welcome to the time_t=long long dev_t=uint64_t branch.
 1.65.6.5 17-Jan-2009  mjf Sync with HEAD.
 1.65.6.4 28-Sep-2008  mjf Sync with HEAD.
 1.65.6.3 29-Jun-2008  mjf Sync with HEAD.
 1.65.6.2 05-Jun-2008  mjf Sync with HEAD.

Also fix build.
 1.65.6.1 02-Jun-2008  mjf Sync with HEAD.
 1.66.6.3 24-Sep-2008  wrstuden Merge in changes between wrstuden-revivesa-base-2 and
wrstuden-revivesa-base-3.
 1.66.6.2 18-Sep-2008  wrstuden Sync with wrstuden-revivesa-base-2.
 1.66.6.1 23-Jun-2008  wrstuden Sync w/ -current. 34 merge conflicts to follow.
 1.66.4.3 11-Aug-2010  yamt sync with head.
 1.66.4.2 11-Mar-2010  yamt sync with head
 1.66.4.1 04-May-2009  yamt sync with head.
 1.66.2.2 04-Jun-2008  yamt sync with head
 1.66.2.1 18-May-2008  yamt sync with head.
 1.69.2.3 31-Jul-2008  simonb Sync with head.
 1.69.2.2 18-Jul-2008  simonb Sync with head.
 1.69.2.1 18-Jun-2008  simonb Sync with head.
 1.70.2.2 13-Dec-2008  haad Update haad-dm branch to haad-dm-base2.
 1.70.2.1 19-Oct-2008  haad Sync with HEAD.
 1.82.4.4 18-Jun-2011  bouyer Pull up following revision(s) (requested by hannken in ticket #1627):
sys/kern/vfs_wapbl.c: revisions 1.41-1.42
sbin/dump/snapshot.c: revisions 1.6 (patch)
share/man/man4/fss.4: revisions 1.15 (patch)
sys/dev/fss.c: revisions 1.73 (patch)
sys/dev/fssvar.h: revisions 1.25
usr.sbin/fssconfig/fssconfig.c: revisions 1.7
sys/ufs/ffs/ffs_balloc.c: revisions 1.54
sys/ufs/ffs/ffs_snapshot.c: revisions 1.90, 1.98, 1.100-1.101, 1.103-1.110, 1.111, 1.112-1.115 (patch)

- Try to keep snapshot indirect blocks contiguous. This speeds up snapshot
creation by a factor of ~3 and reduces the file system suspension time by
a factor of ~5.

- Refine the scope of WAPBL transactions and the limit for deallocations in
one transaction so we should no longer get a "wapbl_flush: current
transaction too big to flush" panic when creating or removing snapshots
on larger logging disks.

- fss(4): Allow FSSIOCSET to set the initial flags. Add a new flag
"FSS_UNLINK_ON_CREATE" to unlink the backing store before the snapshot
gets created. With this change dump(8) no longer dumps the zero-sized,
but named snapshot it is working on.
 1.82.4.3 28-Mar-2010  snj branches: 1.82.4.3.4;
Pull up following revision(s) (requested by hannken in ticket #1345):
sys/ufs/ffs/ffs_snapshot.c: revision 1.97
No longer abuse TAILQ internal data.
 1.82.4.2 28-Mar-2010  snj Pull up following revision(s) (requested by hannken in ticket #1345):
sys/ufs/ffs/ffs_snapshot.c: revision 1.96
Fix a deadlock where fscow_disestablish() blocks because outstanding
copy-on-write operations wait for si_snaplock.
 1.82.4.1 10-Dec-2008  snj branches: 1.82.4.1.4;
Pull up following revision(s) (requested by hannken in ticket #169):
sys/ufs/ffs/ffs_snapshot.c: revision 1.87
ffs_copyonwrite(): Only use si_snapblklist if it is already allocated.
ffs_snapshot_read(): Use IO_ALTSEMANTICS to allow reading a snapshot vnode
beyond file system size. Needed to read the snapblklist
on mount.
Persistent snapshots work again.
Should fix PR kern/37425: fss_snapshot_mount panic during fsck.
 1.82.4.3.4.1 07-Jan-2011  matt Quiet gcc.
 1.82.4.1.4.1 21-Apr-2010  matt sync to netbsd-5
 1.82.2.3 28-Apr-2009  skrll Sync with HEAD.
 1.82.2.2 03-Mar-2009  skrll Sync with HEAD.
 1.82.2.1 19-Jan-2009  skrll Sync with HEAD.
 1.91.2.1 13-May-2009  jym Sync with HEAD.

Commit is split, to avoid a "too many arguments" protocol error.
 1.97.4.5 31-May-2011  rmind sync with head
 1.97.4.4 21-Apr-2011  rmind sync with head
 1.97.4.3 05-Mar-2011  rmind sync with head
 1.97.4.2 03-Jul-2010  rmind sync with head
 1.97.4.1 16-Mar-2010  rmind Change struct uvm_object::vmobjlock to be dynamically allocated with
mutex_obj_alloc(). It allows us to share the locks among UVM objects.
 1.97.2.1 17-Aug-2010  uebayasi Sync with HEAD.
 1.102.4.8 05-Mar-2011  bouyer Sync with HEAD
 1.102.4.7 18-Feb-2011  bouyer Add a new inode flag, SF_SNAPINVAL, to be set on SF_SNAPSHOT inodes when
the snapshot is invalid.
Set SF_SNAPSHOT | SF_SNAPINVAL early when initializing a snapshot inode,
so that quotas are bypassed for allocations on this inode.
Set SF_SNAPSHOT | SF_SNAPINVAL (instead of clearing SF_SNAPSHOT) when
expunge()ing a snapshot inode, so that userland tools working on the
snapshot (e.g. fsck or dump) can properly handle this inode.

The main point at this time is to have fsck_ffs -X properly compute quotas;
as a bonus, persistent snapshot files won't show up in a dump(8) from a
snapshot.

This may also help speed up taking snapshots, by bypassing expunge()
for snapshot inodes completely (but this needs more thought).

Briefly discussed with hannken@ in private mail.
 1.102.4.6 18-Feb-2011  bouyer Sync with HEAD
 1.102.4.5 17-Feb-2011  bouyer Remove comment that should not be there
 1.102.4.4 17-Feb-2011  bouyer Sync with HEAD
 1.102.4.3 17-Feb-2011  bouyer Do not adjust quota when a snapshot inode is cleared in a snapshot view.
 1.102.4.2 12-Feb-2011  bouyer Don't count snapshot files in inode quota too.
At umount time, chk?q may be called after quotas have been shut down,
as there is a final vflush pass after quota?_umount(); so skip quota
checks if the quota vnode is not there any more.
 1.102.4.1 12-Feb-2011  bouyer Do not update disk quotas for snapshot inodes, as this may require a
write to the same filesystem, which will trigger a copy on write,
which will trigger another update to the same block.
Set SF_SNAPSHOT just after truncating the snapshot inode, so that this
inode always accounts for 0 blocks in quotas.
 1.102.2.1 06-Jun-2011  jruoho Sync with HEAD.
 1.115.2.1 23-Jun-2011  cherry Catchup with rmind-uvmplock merge.
 1.118.6.1 05-Apr-2012  mrg sync to latest -current.
 1.118.2.3 22-May-2014  yamt sync with head.

for a reference, the tree before this commit was tagged
as yamt-pagecache-tag8.

this commit was split into small chunks to avoid
a limitation of cvs. ("Protocol error: too many arguments")
 1.118.2.2 23-Jan-2013  yamt sync with head
 1.118.2.1 17-Apr-2012  yamt sync with head
 1.119.2.4 03-Dec-2017  jdolecek update from HEAD
 1.119.2.3 20-Aug-2014  tls Rebase to HEAD as of a few days ago.
 1.119.2.2 23-Jun-2013  tls resync from head
 1.119.2.1 25-Feb-2013  tls resync with head
 1.127.2.1 18-May-2014  rmind sync with head
 1.133.2.1 10-Aug-2014  tls Rebase.
 1.137.2.4 28-Aug-2017  skrll Sync with HEAD
 1.137.2.3 05-Dec-2016  skrll Sync with HEAD
 1.137.2.2 22-Sep-2015  skrll Sync with HEAD
 1.137.2.1 06-Apr-2015  skrll Sync with HEAD
 1.140.2.3 26-Apr-2017  pgoyette Sync with HEAD
 1.140.2.2 20-Mar-2017  pgoyette Sync with HEAD
 1.140.2.1 04-Nov-2016  pgoyette Sync with HEAD
 1.143.2.1 21-Apr-2017  bouyer Sync with HEAD
 1.149.16.2 29-Feb-2020  ad Sync with head.
 1.149.16.1 17-Jan-2020  ad Sync with head.
 1.149.14.1 13-May-2023  martin Pull up following revision(s) (requested by chs in ticket #1633):

sys/ufs/ffs/ffs_snapshot.c: revision 1.155

ffs: apply the remaining ffs_snapshot.c part of this FreeBSD commit:
commit 364ed814e7285c8216d8a201d3ab3674eb34ce29
Author: Kirk McKusick <mckusick@FreeBSD.org>
Date: Thu Dec 9 21:24:00 2004 +0000
Fixes a bug that caused UFS2 filesystems bigger than 2TB to
prematurely report that they were full and/or to panic the kernel
with the message ``ffs_clusteralloc: allocated out of group''.
Submitted by: Henry Whincup <henry@jot.to>
MFC after: 1 week

all the other changes in that commit were applied previously by others:
- sborrill committed ffs_alloc.c rev 1.123 in 2009
- simonb committed ffs_alloc.c rev 1.110 in 2008
- the ffs_clusteralloc() part is not needed because we no longer have
that function.

fixes PR 57307
 1.149.10.2 21-Apr-2020  martin Sync with HEAD
 1.149.10.1 08-Apr-2020  martin Merge changes from current as of 20200406
 1.151.4.1 20-Apr-2020  bouyer Sync with HEAD
 1.154.4.1 13-May-2023  martin Pull up following revision(s) (requested by chs in ticket #165):

sys/ufs/ffs/ffs_snapshot.c: revision 1.155

ffs: apply the remaining ffs_snapshot.c part of this FreeBSD commit:
commit 364ed814e7285c8216d8a201d3ab3674eb34ce29
Author: Kirk McKusick <mckusick@FreeBSD.org>
Date: Thu Dec 9 21:24:00 2004 +0000
Fixes a bug that caused UFS2 filesystems bigger than 2TB to
prematurely report that they were full and/or to panic the kernel
with the message ``ffs_clusteralloc: allocated out of group''.
Submitted by: Henry Whincup <henry@jot.to>
MFC after: 1 week

all the other changes in that commit were applied previously by others:
- sborrill committed ffs_alloc.c rev 1.123 in 2009
- simonb committed ffs_alloc.c rev 1.110 in 2008
- the ffs_clusteralloc() part is not needed because we no longer have
that function.

fixes PR 57307
 1.2 31-Jan-2005  hannken No longer needed. Ffs snapshots are enabled by default.
 1.1 25-May-2004  hannken branches: 1.1.2; 1.1.6; 1.1.8;
Add ffs internal snapshots. Written by Marshall Kirk McKusick for FreeBSD.

- Not enabled by default. Needs kernel option FFS_SNAPSHOT.
- Change parameters of ffs_blkfree.
- Let the copy-on-write functions return an error so spec_strategy
may fail if the copy-on-write fails.
- Change genfs_*lock*() to use vp->v_vnlock instead of &vp->v_lock.
- Add flag B_METAONLY to VOP_BALLOC to return indirect block buffer.
- Add a function ffs_checkfreefile needed for snapshot creation.
- Add special handling of snapshot files:
Snapshots may not be opened for writing and the attributes are read-only.
Use the mtime as the time this snapshot was taken.
Deny mtime updates for snapshot files.
- Add function transferlockers to transfer any waiting processes from
one lock to another.
- Add vfsop VFS_SNAPSHOT to take a snapshot and make it accessible through
a vnode.
- Add snapshot support to ls, fsck_ffs and dump.

Welcome to 2.0F.

Approved by: Jason R. Thorpe <thorpej@netbsd.org>
 1.1.8.1 12-Feb-2005  yamt sync with head.
 1.1.6.1 29-Apr-2005  kent sync with -current
 1.1.2.5 04-Feb-2005  skrll Sync with HEAD.
 1.1.2.4 21-Sep-2004  skrll Fix the sync with head I botched.
 1.1.2.3 18-Sep-2004  skrll Sync with HEAD.
 1.1.2.2 03-Aug-2004  skrll Sync with HEAD
 1.1.2.1 25-May-2004  skrll file ffs_snapshot.stub.c was added on branch ktrace-lwp on 2004-08-03 10:56:49 +0000
 1.2 21-Feb-2005  hannken Make `options FFS_NO_SNAPSHOT' only disable snapshot creation
while not trashing existing snapshots.

Approved by: core@
 1.1 10-Feb-2005  dsl branches: 1.1.2; 1.1.4;
Add a stub file so that snapshot support can be compiled out.
Will allow INSTALL_TINY to fit back in its designated space.
Since the calling code doesn't allow a snapshot mount to fail, this code
will output a warning and delete any snapshots it finds.
This only happens on rw mounts - snapshots don't seem to be created
when mounting ro.
The whole way the snapshots get mounted is a PITA anyway; the superblock
'last mounted' time should be used to validate that the fs hasn't been
mounted elsewhere.
 1.1.4.3 04-Mar-2005  skrll Sync with HEAD.

Hi Perry!
 1.1.4.2 15-Feb-2005  skrll Sync with HEAD.
 1.1.4.1 10-Feb-2005  skrll file ffs_snapshot_stub.c was added on branch ktrace-lwp on 2005-02-15 21:34:02 +0000
 1.1.2.3 19-Mar-2005  yamt sync with head. xen and whitespace. xen part is not finished.
 1.1.2.2 12-Feb-2005  yamt sync with head.
 1.1.2.1 10-Feb-2005  yamt file ffs_snapshot_stub.c was added on branch yamt-km on 2005-02-12 18:17:56 +0000
 1.117 22-Feb-2009  ad PR kern/26878 FFSv2 + softdep = livelock (no free ram)
PR kern/16942 panic with softdep and quotas
PR kern/19565 panic: softdep_write_inodeblock: indirect pointer #1 mismatch
PR kern/26274 softdep panic: allocdirect_merge: ...
PR kern/26374 Long delay before non-root users can write to softdep partitions
PR kern/28621 1.6.x "vp != NULL" panic in ffs_softdep.c:4653 while unmounting a softdep (+quota) filesystem
PR kern/29513 FFS+Softdep panic with unfsck-able file-corruption
PR kern/31544 The ffs softdep code appears to fail to write dirty bits to disk
PR kern/31981 stopping scsi disk can cause panic (softdep)
PR kern/32116 kernel panic in softdep (assertion failure)
PR kern/32532 softdep_trackbufs deadlock
PR kern/37191 softdep: locking against myself
PR kern/40474 Kernel panic after remounting raid root with softdep

Retire softdep, pass 2. As discussed and later formally announced on the
mailing lists.
 1.116 06-Dec-2008  joerg branches: 1.116.4;
Split ffs_freefile into a frontend for normal cylinder group and for
snapshot use. Adjust ffs_blkfree_common to get the fs instance passed
in; the original commit didn't account blocks in the snapshots
correctly. Assert that ffs_blkfree is used with the primary fs instance
and that ffs_checkfreefile is only used for snapshots. Move the bdwrite
from ffs_blkfree_common into the caller for symmetry. This creates a
redundant write of unmodified data for ffs_blkfree_snap if a double free
of a block happens.

Reviewed and tested by hannken@.
 1.115 03-Jun-2008  hannken branches: 1.115.4; 1.115.6;
ufs/ffs: replace calls to getblk() with ffs_getblk(). Now all buffers
have been run through copy-on-write and async mounts work again.

Fixes PR kern/38820

Reviewed by: YAMAMOTO Takashi <yamt@netbsd.org>
 1.114 31-May-2008  ad Put a TNF copyright on it.
 1.113 31-May-2008  ad XXX softdep:

If the number of deletes in progress is getting too high, newdirrem()
requests the syncer to flush faster, and in some cases will block to
prevent deletes accumulating faster than the disk can service them.

The syncer will try to lock vnodes that the remover holds locked, leading
to the syncer and remover proceeding in lockstep and making very little
overall forward progress.

Put a hook into ufs_rmdir() and ufs_remove() so that the softdep code
can pace itself without holding vnode locks if the number of deletes is
running out of control.
 1.112 16-May-2008  hannken Make sure all cached buffers with valid, not yet written data have been
run through copy-on-write. Call fscow_run() with valid data where possible.

The LP_UFSCOW hack is no longer needed to protect ffs_copyonwrite() against
endless recursion.

- Add a flag B_MODIFY to bread(), breada() and breadn(). If set the caller
intends to modify the buffer returned.

- Always run copy-on-write on buffers returned from ffs_balloc().

- Add new function ffs_getblk() that gets a buffer, assigns a new blkno,
may clear the buffer and runs copy-on-write. Process possible errors
from getblk() or fscow_run(). Part of PR kern/38664.

Welcome to 4.99.63

Reviewed by: YAMAMOTO Takashi <yamt@netbsd.org>
 1.111 05-May-2008  ad branches: 1.111.2;
- Convert hashinit() to use kmem_alloc(). The hash tables can be large
and it's better to not have them in kmem_map.
- Convert a couple of minor items along the way to kmem_alloc().
- Fix some memory leaks.
 1.110 29-Apr-2008  ad PR kern/38057 ffs makes assumptions about devvp file system
PR kern/33406 softdeps get stuck in endless loop

Introduce VFS_FSYNC() and call it when syncing a block device, if it
has a mounted file system.
 1.109 11-Apr-2008  ad branches: 1.109.2; 1.109.4;
newdirrem: if the number of deletes in progress is getting too high, start
pushing the syncer before considering rate limiting the deletes. We hold
vnodes locked and it's likely that the syncer will try to lock them while
flushing, leading to the syncer and remover proceeding in lockstep and
making very little forward progress. XXX this is not a solution.
 1.108 20-Feb-2008  matt branches: 1.108.6;
Merge all the *different* definitions of bufqueues into one common one.
 1.107 15-Feb-2008  ad Give bbusy() an interlock argument. If we need to wait for the buffer,
the interlock is dropped and reacquired when awoken. This allows for
busying buffers attached to a list that is not locked by bufcache_lock.
 1.106 12-Jan-2008  ad Initialize caches at IPL_SOFTBIO (not IPL_NONE) so that we are allocating
from kmem_map.
 1.105 07-Jan-2008  ad Fix 'panic: softdep_update_inodeblock: update failed'.
 1.104 07-Jan-2008  tnn softdep_freefile: don't acquire ufsmount lock twice.
 1.103 02-Jan-2008  ad Merge vmlocking2 to head.
 1.102 08-Dec-2007  pooka branches: 1.102.4;
Remove cn_lwp from struct componentname. curlwp should be used
from now on. The NDINIT() macro no longer takes the lwp parameter and
associates the credentials of the calling thread with the namei
structure.
 1.101 26-Nov-2007  pooka branches: 1.101.2;
Remove the "struct lwp *" argument from all VFS and VOP interfaces.
The general trend is to remove it from all kernel interfaces and
this is a start. In case the calling lwp is desired, curlwp should
be used.

quick consensus on tech-kern
 1.100 07-Nov-2007  ad Merge from vmlocking:

- pool_cache changes.
- Debugger/procfs locking fixes.
- Other minor changes.
 1.99 10-Oct-2007  ad branches: 1.99.2; 1.99.4;
Merge from vmlocking:

- Split vnode::v_flag into three fields, depending on field locking.
- simple_lock -> kmutex in a few places.
- Fix some simple locking problems.
 1.98 08-Oct-2007  ad Merge ffs locking & brelse changes from the vmlocking branch.
 1.97 01-Sep-2007  pooka branches: 1.97.2;
Make bioops a pointer and point it to the softdeps struct in softdep
init. Decouples "options SOFTDEP" from the main kernel and ffs code.
 1.96 29-Jul-2007  ad branches: 1.96.4; 1.96.6; 1.96.8;
It's not a good idea for device drivers to modify b_flags, as they don't
need to understand the locking around that field. Instead of setting
B_ERROR, set b_error. b_error is 'owned' by whoever completes
the I/O request.
 1.95 10-Jul-2007  hannken branches: 1.95.2;
Restore the special lkt_held handling for softdep_disk_write_complete().
No more panics 'worklist_remove: lock not held' on DEBUG kernels.

Ok Andrew Doran <ad@netbsd.org>
 1.94 09-Jul-2007  ad Fix build with DEBUG.
 1.93 09-Jul-2007  ad We got LWPs years ago..
 1.92 09-Jul-2007  ad Merge some of the less invasive changes from the vmlocking branch:

- kthread, callout, devsw API changes
- select()/poll() improvements
- miscellaneous MT safety improvements
 1.91 30-Jun-2007  pooka Using POOL_INIT here makes no sense, since file systems always have
an init method. So get rid of it and #ifdef _LKM and just always
init in the init method. Give malloc types the same treatment.
Makes file systems nicer to work with in linksetless environments
and fixes a few LKM discrepancies.
 1.90 07-May-2007  yamt flush_inodedep_deps: fix access after free. PR/29724.
 1.89 08-Apr-2007  hannken Remove now obsolete vn_start_write() and vn_finished_write() and
corresponding flags.

Revert softdep_trackbufs() to its state before vn_start_write() was added.

Remove from struct mount now unneeded flags IMNT_SUSPEND* and
members mnt_writeopcountupper, mnt_writeopcountlower and mnt_leaf.

Welcome to 4.99.17
 1.88 07-Apr-2007  hannken Remove calls to now obsolete vn_start_write() and vn_finished_write().
 1.87 12-Mar-2007  ad branches: 1.87.2;
Pass an ipl argument to pool_init/POOL_INIT to be used when initializing
the pool's lock.
 1.86 04-Mar-2007  christos branches: 1.86.2;
Kill caddr_t; there will be some MI fallout, but it will be fixed shortly.
 1.85 22-Feb-2007  thorpej TRUE -> true, FALSE -> false
 1.84 21-Feb-2007  thorpej Replace the Mach-derived boolean_t type with the C99 bool type. A
future commit will replace use of TRUE and FALSE with true and false.
 1.83 17-Feb-2007  pavel Change the process/lwp flags seen by userland via sysctl back to the
P_*/L_* naming convention, and rename the in-kernel flags to avoid
conflict. (P_ -> PK_, L_ -> LW_ ). Add back the (now unused) LSDEAD
constant.

Restores source compatibility with pre-newlock2 tools like ps or top.

Reviewed by Andrew Doran.
 1.82 09-Feb-2007  ad branches: 1.82.2;
Merge newlock2 to head.
 1.81 16-Nov-2006  christos branches: 1.81.2; 1.81.4;
__unused removal on arguments; approved by core.
 1.80 24-Oct-2006  drochner import a fix from FreeBSD (rev.1.185):
After a rmdir()ed directory has been truncated, force an update of
the directory's inode after queuing the dirrem that will decrement
the parent directory's link count. This will force the update of
the parent directory's actual link to actually be scheduled. Without
this change the parent directory's actual link count would not be
updated until ufs_inactive() cleared the inode of the newly removed
directory, which might be deferred indefinitely. ufs_inactive()
will not be called as long as any process holds a reference to the
removed directory, and ufs_inactive() will not clear the inode if
the link count is non-zero, which could be the result of an earlier
system crash.
[plus description about problems with background fsck solved
by this; irrelevant to NetBSD]

For me, the good effect is at least that I'm getting less filesystem
inconsistencies after a crash.

Approved by christos quite a while ago.
 1.79 14-Oct-2006  yamt handle_workitem_freefrag/handle_workitem_freeblocks:
don't fake up inode/vnode pair.
 1.78 12-Oct-2006  christos - sprinkle __unused on function decls.
- fix a couple of unused bugs
- no more -Wno-unused for i386
 1.77 03-Oct-2006  christos Coverity CID 3690: Reverse INULL: Add KASSERT.
 1.76 23-Jul-2006  ad branches: 1.76.4; 1.76.6;
Use the LWP cached credentials where sane.
 1.75 12-Jun-2006  hannken softdep_sync_metadata: If vp is a block device it may have new I/O requests
posted for it even if the vnode is locked. This will deadlock with wmesg
"softgetdbuf" if it gets a BMSAFEMAP dependency as here we have "bp == nbp"
and try to get a buffer we already own.

Approved by: Frank van der Linden <fvdl@netbsd.org>
 1.74 14-May-2006  elad branches: 1.74.2;
integrate kauth.
 1.73 24-Dec-2005  perry branches: 1.73.4; 1.73.6; 1.73.8; 1.73.10; 1.73.12;
Remove leading __ from __(const|inline|signed|volatile) -- it is obsolete.
 1.72 11-Dec-2005  christos merge ktrace-lwp.
 1.71 02-Nov-2005  yamt merge yamt-vop branch. remove following VOPs.

VOP_BLKATOFF
VOP_VALLOC
VOP_BALLOC
VOP_REALLOCBLKS
VOP_VFREE
VOP_TRUNCATE
VOP_UPDATE
 1.70 09-Sep-2005  yamt branches: 1.70.2;
- for pagecache dependency, track which page in the block
has been written or not individually by (ab)using b_resid
in pcbp as a bitmap.
- add a comment to explain why it's needed.

PR/15364. reviewed by Chuck Silvers.
 1.69 30-Aug-2005  xtraeme * Remove __P()
* Use ANSI function declarations on ext2fs and mfs
 1.68 24-Aug-2005  yamt PRId64 -> ld in UVMHIST_LOG format strings.
 1.67 19-Aug-2005  christos 64 bit inode changes.
 1.66 30-May-2005  christos branches: 1.66.2;
rename delay because it is a function on sparc.
 1.65 29-May-2005  christos - sprinkle const
- avoid shadow variables.
 1.64 07-May-2005  hannken flush_inodedep_deps(): If softdep_lookupvp() returns NULL it means the
inode has been reclaimed. Skip the VOP_PUTPAGES() in this case.

Reviewed by: Chuck Silvers <chs@netbsd.org>
 1.63 26-Feb-2005  perry branches: 1.63.2;
nuke trailing whitespace
 1.62 25-Jan-2005  wrstuden Extend fsync_range(2) to support the FDISKSYNC flag, which requests
that the sync be propagated out through the disk drive caches.
 1.61 15-Dec-2004  mycroft branches: 1.61.2; 1.61.4;
Remove some unnecessary (int32_t) casts that would cause us to screw up the
top bit in block addresses.

Also, change some daddr_t->int32_t casts (mostly as arguments to ufs_rw32(),
where they would get promoted anyway) to u_int32_t.
 1.60 29-Aug-2004  hannken While creating a snapshot inodes must be freed from the
snapshot, not from the file system.
ffs_freefile() needs explicit "fs" and "devvp" arguments.
 1.59 25-May-2004  hannken Add ffs internal snapshots. Written by Marshall Kirk McKusick for FreeBSD.

- Not enabled by default. Needs kernel option FFS_SNAPSHOT.
- Change parameters of ffs_blkfree.
- Let the copy-on-write functions return an error so spec_strategy
may fail if the copy-on-write fails.
- Change genfs_*lock*() to use vp->v_vnlock instead of &vp->v_lock.
- Add flag B_METAONLY to VOP_BALLOC to return indirect block buffer.
- Add a function ffs_checkfreefile needed for snapshot creation.
- Add special handling of snapshot files:
Snapshots may not be opened for writing and the attributes are read-only.
Use the mtime as the time this snapshot was taken.
Deny mtime updates for snapshot files.
- Add function transferlockers to transfer any waiting processes from
one lock to another.
- Add vfsop VFS_SNAPSHOT to take a snapshot and make it accessible through
a vnode.
- Add snapshot support to ls, fsck_ffs and dump.

Welcome to 2.0F.

Approved by: Jason R. Thorpe <thorpej@netbsd.org>
 1.58 25-Apr-2004  simonb Initialise (most) pools from a link set instead of explicit calls
to pool_init. Untouched pools are ones that are either in arch-specific
code, or aren't initialised during initial system startup.

Convert struct session, ucred and lockf to pools.
 1.57 11-Mar-2004  yamt reserve a MAXBSIZE-sized buffer for inodedeps for pagedaemon.

PR/24443.
 1.56 11-Mar-2004  yamt as we always replace whole buf in the case of indirdep,
simply changing b_data is enough. eliminate M_INDIRDEP.

PR/24443.
 1.55 10-Jan-2004  hannken Allow vfs_write_suspend() to wait if the file system is already
suspending.

Move vfs_write_suspend() and vfs_write_resume() from kern/vfs_vnops.c
to kern/vfs_subr.c.

Change vnode write gating in ufs/ffs/ffs_softdep.c (from FreeBSD).

When vnodes are throttled in softdep_trackbufs() check for
file system suspension every 10 msecs to avoid a deadlock.
 1.54 10-Jan-2004  hannken Split out softdep_flushworklist() from softdep_flushfiles() so that
it can be used to clear the work queue.

Cleanup ffs_sync() which did not synchronously wait when MNT_WAIT
was specified. Clear the work queue when MNT_WAIT is specified.

Result is a clean on-disk file system after ffs_sync(.., MNT_WAIT, ..)

From FreeBSD.
 1.53 15-Oct-2003  hannken Add the gating of system calls that cause modifications to the underlying
file system.
The function vfs_write_suspend stops all new write operations to a file
system, allows any file system modifying system calls already in progress
to complete, then syncs the file system to disk and returns. The
function vfs_write_resume allows the suspended write operations to
complete.

From FreeBSD with slight modifications.

Approved by: Frank van der Linden <fvdl@netbsd.org>
 1.52 14-Oct-2003  dbj add mnt_iflag field to struct mount for internal flags
mv MNT_GONE, MNT_UNMOUNT and MNT_WANTRDWR to this field
additionally add mnt_writeopcountupper and mnt_writeopcountlower fields
in preparation for pending write suspension support work
bump kernel version to 1.6ZD
 1.51 07-Sep-2003  yamt buffer cache mp locks.
 1.50 29-Jun-2003  fvdl branches: 1.50.2;
Back out the lwp/ktrace changes. They contained a lot of collateral damage,
and need to be examined and discussed more.
 1.49 29-Jun-2003  thorpej Undo part of the ktrace/lwp changes. In particular:
* Remove the "lwp *" argument that was added to vget(). Turns out
that nothing actually used it!
* Remove the "lwp *" arguments that were added to VFS_ROOT(), VFS_VGET(),
and VFS_FHTOVP(); all they did was pass it to vget() (which, as noted
above, didn't use it).
* Remove all of the "lwp *" arguments to internal functions that were added
just to appease the above.
 1.48 28-Jun-2003  darrenr Pass lwp pointers throughout the kernel, as required, so that the lwpid can
be inserted into ktrace records. The general change has been to replace
"struct proc *" with "struct lwp *" in various function prototypes, pass
the lwp through and use l_proc to get the process pointer when needed.

Bump the kernel rev up to 1.6V
 1.47 15-May-2003  kristerw The C language does not permit statements of the form
(X ? Y : Z) = 0;
even though gcc handles this by a stupid extension.

Transform these to correct C.

Approved by fvdl.
 1.46 03-Apr-2003  fvdl FreeBSD revision 1.135:

When removing the last item from a non-empty worklist, the worklist
tail pointer must be updated.
 1.45 02-Apr-2003  fvdl Add support for UFS2. UFS2 is an enhanced FFS, adding support for
64 bit block pointers, extended attribute storage, and a few
other things.

This commit does not yet include the code to manipulate the extended
storage (for e.g. ACLs), this will be done later.

Originally written by Kirk McKusick and Network Associates Laboratories for
FreeBSD.
 1.44 05-Feb-2003  pk Make the buffer cache code MP-safe.
 1.43 01-Feb-2003  thorpej Add extensible malloc types, adapted from FreeBSD. This turns
malloc types into a structure, a pointer to which is passed around,
instead of an int constant. Allow the limit to be adjusted when the
malloc type is defined, or with a function call, as suggested by
Jonathan Stone.
 1.42 26-Jan-2003  tsutsui More printf format cleanup to reduce casts.
 1.41 25-Jan-2003  tron Use PRId64 instead of hard coding "%lld" to fix build problems under
LP64 ports.
 1.40 25-Jan-2003  tron Fix printf() format strings problems caused by "daddr_t" change.
 1.39 24-Jan-2003  fvdl Bump daddr_t to 64 bits. Replace it with int32_t in all places where
it was used on-disk, so that on-disk formats remain the same.
Remove ufs_daddr_t and ufs_lbn_t for the time being.
 1.38 01-Jan-2003  chs several bugs:
- move calls to softdep_setup_pagecache() (which can sleep to allocate
memory) outside the softdep lock.
- replace the softdep_flush_indir() hack (which tries to find another
vnode to fsync when we are holding lots of buffer-cache buffers locked
for long periods of time) with softdep_trackbufs() (which just kicks
the syncer and sleeps under the same circumstances). the former method
had a lock-ordering problem which would occasionally deadlock.
- relax the assertion in softdep_sync_metadata() which says that we should
never see D_ALLOCDIRECT deps for VREG vnodes. it's ok to see those
attached to indirect blocks.

also, there's no need to splbio() while allocating the buffer headers
to which pagecache dependencies are attached, so remove that.

fixes all the problems in PR 19288.
 1.37 30-Nov-2002  kristerw Softdep is mature enough that it shouldn't define DEBUG and DIAGNOSTIC
unconditionally.
 1.36 24-Nov-2002  scw Quell an uninitialised variable warning.
 1.35 27-Sep-2002  provos remove trailing \n in panic(). approved perry.
 1.34 25-Aug-2002  thorpej Make nbuf, nswbuf, and bufpages unsigned. Make all operations on these
variables unsigned, and update places where their values are printed.
 1.33 05-Jul-2002  scw Cast pointers first to uintptr_t before casting to register_t.
On SH-5, sizeof(register_t) is always 8, even if sizeof(void *) is 4
as is the case when compiling for ILP32.
 1.32 18-Jun-2002  jdolecek clear_inodedeps(): use CIRCLEQ_FOREACH() appropriately
 1.31 18-Mar-2002  wiz branches: 1.31.4; 1.31.6;
Fix a typo, a KNF-nit, and simplify a printf format string.
 1.30 08-Mar-2002  thorpej Pool deals fairly well with physical memory shortage, but it doesn't
deal with shortages of the VM maps where the backing pages are mapped
(usually kmem_map). Try to deal with this:

* Group all information about the backend allocator for a pool in a
separate structure. The pool references this structure, rather than
the individual fields.
* Change the pool_init() API accordingly, and adjust all callers.
* Link all pools using the same backend allocator on a list.
* The backend allocator is responsible for waiting for physical memory
to become available, but will still fail if it cannot allocate KVA
space for the pages. If this happens, carefully drain all pools using
the same backend allocator, so that some KVA space can be freed.
* Change pool_reclaim() to indicate if it actually succeeded in freeing
some pages, and use that information to make draining easier and more
efficient.
* Get rid of PR_URGENT. There was only one use of it, and it could be
dealt with by the caller.

From art@openbsd.org.
 1.29 22-Feb-2002  enami Record some page cache related information into ubchist.
 1.28 14-Feb-2002  wiz Fix two problems with softdep_typenames (missing entry, wrong boundary check).
Okayed by fvdl.
 1.27 10-Feb-2002  chs bring in the change from FreeBSD's rev. 1.107 of this file:

date: 2002/02/07 00:54:32; author: mckusick; state: Exp; lines: +10 -7
Occasionally deleted files would hang around for hours or days
without being reclaimed. This bug was introduced in revision 1.95
dealing with filenames placed in newly allocated directory blocks,
thus is not present in 4.X systems. The bug is triggered when a
new entry is made in a directory after the data block containing
the original new entry has been written, but before the inode
that references the data block has been written.

Submitted by: Bill Fenner <fenner@research.att.com>

This should fix NetBSD PR 15531.
 1.26 18-Jan-2002  enami - For CIRCLEQ, comparing the loop variable against NULL doesn't make sense.
- Minor KNF while I'm here.

# This doesn't fix real problems though.
 1.25 16-Jan-2002  enami Fix typo which prevents diagnostic test from working.
 1.24 27-Dec-2001  fvdl Pull over one missed fix from FreeBSD wrt. running out of quota. Also
reshuffle some code a bit to make it look more similar (no functional
change).
 1.23 23-Dec-2001  fvdl Fix from FreeBSD that I missed: speed up handling of short-lived
files a bit.
 1.22 23-Dec-2001  chs process the delayed-free queue more often.
 1.21 18-Dec-2001  fvdl Bring over fixes from FreeBSD that weren't incorporated yet, mainly
from Kirk McKusick. They implement taking pending block/inode frees
into account for the sake of correct statfs() numbers, and adding
a new softdep type (newdirblk) to correctly handle newly allocated
directory blocks.

Minor additional changes: 1) swap the newly introduced fs_pendinginodes
and fs_pendingblock fields in ffs_sb_swap, and 2) declare lkt_held
in the debug version of the softdep lock structure volatile, as it
can be modified from interrupt context #ifdef DEBUG.
 1.20 08-Nov-2001  chs call VOP_PUTPAGES() directly for vnodes instead of
going through the UVM pager "put" vector.
 1.19 30-Oct-2001  lukem add __KERNEL_RCSID()
 1.18 26-Oct-2001  lukem remove #include <ufs/ufs/quota.h> where it was just to appease
<ufs/ufs/inode.h>, since the latter now includes the former. leave the former
in source that obviously uses specific bits of it (for completeness.)
 1.17 15-Sep-2001  chs branches: 1.17.2;
a whole bunch of changes to improve performance and robustness under load:

- remove special treatment of pager_map mappings in pmaps. this is
required now, since I've removed the globals that expose the address range.
pager_map now uses pmap_kenter_pa() instead of pmap_enter(), so there's
no longer any need to special-case it.
- eliminate struct uvm_vnode by moving its fields into struct vnode.
- rewrite the pageout path. the pager is now responsible for handling the
high-level requests instead of only getting control after a bunch of work
has already been done on its behalf. this will allow us to UBCify LFS,
which needs tighter control over its pages than other filesystems do.
writing a page to disk no longer requires making it read-only, which
allows us to write wired pages without causing all kinds of havoc.
- use a new PG_PAGEOUT flag to indicate that a page should be freed
on behalf of the pagedaemon when it's unlocked. this flag is very similar
to PG_RELEASED, but unlike PG_RELEASED, PG_PAGEOUT can be cleared if the
pageout fails due to eg. an indirect-block buffer being locked.
this allows us to remove the "version" field from struct vm_page,
and together with shrinking "loan_count" from 32 bits to 16,
struct vm_page is now 4 bytes smaller.
- no longer use PG_RELEASED for swap-backed pages. if the page is busy
because it's being paged out, we can't release the swap slot to be
reallocated until that write is complete, but unlike with vnodes we
don't keep a count of in-progress writes so there's no good way to
know when the write is done. instead, when we need to free a busy
swap-backed page, just sleep until we can get it busy ourselves.
- implement a fast-path for extending writes which allows us to avoid
zeroing new pages. this substantially reduces cpu usage.
- encapsulate the data used by the genfs code in a struct genfs_node,
which must be the first element of the filesystem-specific vnode data
for filesystems which use genfs_{get,put}pages().
- eliminate many of the UVM pagerops, since they aren't needed anymore
now that the pager "put" operation is a higher-level operation.
- enhance the genfs code to allow NFS to use the genfs_{get,put}pages
instead of a modified copy.
- clean up struct vnode by removing all the fields that used to be used by
the vfs_cluster.c code (which we don't use anymore with UBC).
- remove kmem_object and mb_object since they were useless.
instead of allocating pages to these objects, we now just allocate
pages with no object. such pages are mapped in the kernel until they
are freed, so we can use the mapping to find the page to free it.
this allows us to remove splvm() protection in several places.

The sum of all these changes improves write throughput on my
decstation 5000/200 to within 1% of the rate of NetBSD 1.5
and reduces the elapsed time for "make release" of a NetBSD 1.5
source tree on my 128MB pc to 10% less than a 1.5 kernel took.
 1.16 15-Sep-2001  chs use pools for allocating most softdep datastructures. since we want to
allocate memory from kernel_map but some of the objects are freed from
interrupt context, we put objects on a queue instead of freeing them
immediately. then in softdep_process_worklist() (which is called at
least once per second from the syncer), we process that queue and
free all the objects. allocating from kernel_map instead of from kmem_map
allows us to have a much larger number of softdeps pending even in
configurations where kmem_map is relatively small.
 1.15 15-Sep-2001  chs add a new VFS op, vfs_reinit, which is called when desiredvnodes is
adjusted via sysctl. file systems that have hash tables which are
sized based on the value of this variable now resize those hash tables
using the new value. the max number of FFS softdeps is also recalculated.

convert various file systems to use the <sys/queue.h> macros for
their hash tables.
 1.14 30-Aug-2001  chs branches: 1.14.2;
min() -> MIN() (on general principles)
 1.13 10-Jan-2001  chs branches: 1.13.2; 1.13.6;
attach the softdep pagecache pseudo-buffers to the inode
so we can find them quickly in the softdep truncate path.
 1.12 13-Dec-2000  mycroft Patch from Kirk McKusick to fix an ordering problem in softdep_setup_freeblks()
that could cause an inode to be reused prematurely (possibly resulting in the
file containing garbage blocks).
 1.11 13-Dec-2000  chs fix bookkeeping for page cache dependency buffers.
 1.10 11-Dec-2000  chs in flush_inodedep_deps(), drop the big softdep lock while flushing pages.
 1.9 27-Nov-2000  chs allow building without SOFTDEP by adding the pageiodone hook to bio_ops.
 1.8 27-Nov-2000  chs Initial integration of the Unified Buffer Cache project.
 1.7 08-Nov-2000  ad branches: 1.7.2;
Update for hashinit() change.
 1.6 19-Sep-2000  fvdl Adapt for VOP_FSYNC parameter change.

Implement range fsync for FFS. Note: not yet implemented for the
SOFTDEP case.
 1.5 15-Aug-2000  fvdl Do not call MALLOC with M_WAITOK while holding the "lock". Thanks to
Ethan Solomita for the reminder.

Mark the parent vnode lock as recursive while flushing pagedeps. XXX.
Should fix kern/10564.
 1.4 28-Jun-2000  mrg <vm/vm.h> -> <uvm/uvm_extern.h>
 1.3 27-Jun-2000  pk We shouldn't be defining DEBUG and DIAGNOSTIC on our own; these may have
unwanted side-effects in the header files. For now, do the internal
#defines after including the headers.
 1.2 22-Jun-2000  fvdl branches: 1.2.2;
Moved here from gnu/sys/ufs/ffs
 1.1 19-Oct-1999  fvdl branches: 1.1.2;
file ffs_softdep.c was initially added on branch fvdl-softdep.
 1.1.2.6 03-Nov-1999  fvdl Give ufs_ihashget an extra argument: the flags passed to vget() for
locking. This way we can avoid locking against ourselves when
ufs_ihashget is called during the flushing of metadata. XXX

Also, comment out a VOP_FSYNC call that I think is now unneeded, and
put a diagnostic printf there to check if this still happens.
 1.1.2.5 26-Oct-1999  fvdl Merge changes in the trickle-sync and softdep code as done by Kirk McKusick
in FreeBSD since the version that we based the branch on. Merging mostly
done by Ethan Solomita <ethan@geocast.com>.

Also, make sure the syncer thread/process isn't active when we're
unmounting a filesystem. This could wreak havoc. XXX should be done
on a per-mountpoint basis, but especially the softdep code would
end up to be a big pile of vfs_busy() calls.
 1.1.2.4 21-Oct-1999  fvdl Add workaround hacks to enable the softdep code to call getnewvnode()
when a filesystem is being unmounted. The problem is that the softdep
code stores inode numbers in the worklist structures, and does not
use vnodes. So VFS_VGET must be used to get a vnode during the final
flush stages, and this can call getnewvnode(), resulting in
a vfs_busy() + MNT_UNMOUNT hang.

I've tried to make the softdep code use vnodes, but that's a pain,
since it gets called at points where vnode ops are dangerous (i.e.
interrupt context, and uncertainty whether a vnode is locked, etc).

This is all icky stuff, but it does get things much closer to a
working state.
 1.1.2.3 19-Oct-1999  soren Tell us which fs is being bold.
 1.1.2.2 19-Oct-1999  soren Fix compile with FFS_EI.
 1.1.2.1 19-Oct-1999  fvdl Bring in Kirk McKusick's FFS softdep code on a branch.
 1.2.2.6 21-Apr-2001  he Pull up revision 1.12 (via patch, requested by chs):
Fix an ordering problem in softdep_setup_freeblks() that could
cause premature reuse of an inode, possibly causing the file to
contain garbage.
 1.2.2.5 14-Dec-2000  he Pull up revision 1.6 (requested by fvdl):
Improve NFS performance, possibly with as much as 100% in
throughput. Please note: this implies a kernel interface change,
VOP_FSYNC gains two arguments.
 1.2.2.4 17-Aug-2000  fvdl pull up version 1.5:
Do not call MALLOC with M_WAITOK while holding the "lock". Thanks to
Ethan Solomita for the reminder.

Mark the parent vnode lock as recursive while flushing pagedeps. XXX.
Should fix kern/10564.
 1.2.2.3 29-Jun-2000  thorpej Pull up rev. 1.3:
We shouldn't be defining DEBUG and DIAGNOSTIC on our own; these may have
unwanted side-effects in the header files. For now, do the internal
#defines after including the headers.
 1.2.2.2 23-Jun-2000  fvdl Pull up moved version (from gnu/sys/ufs/ffs) as on the trunk.
 1.2.2.1 22-Jun-2000  fvdl file ffs_softdep.c was added on branch netbsd-1-5 on 2000-06-23 14:32:22 +0000
 1.7.2.7 18-Jan-2001  bouyer Sync with head (for UBC+NFS fixes, mostly).
 1.7.2.6 05-Jan-2001  bouyer Sync with HEAD
 1.7.2.5 13-Dec-2000  bouyer Sync with HEAD (for UBC fixes).
 1.7.2.4 08-Dec-2000  bouyer Sync with HEAD.
 1.7.2.3 22-Nov-2000  bouyer Sync with HEAD.
 1.7.2.2 20-Nov-2000  bouyer Update thorpej_scsipi to -current as of a month ago
 1.7.2.1 08-Nov-2000  bouyer file ffs_softdep.c was added on branch thorpej_scsipi on 2000-11-20 18:11:45 +0000
 1.13.6.7 10-Oct-2002  jdolecek sync kqueue with -current; this includes merge of gehenna-devsw branch,
merge of i386 MP branch, and part of autoconf rototil work
 1.13.6.6 06-Sep-2002  jdolecek sync kqueue branch with HEAD
 1.13.6.5 23-Jun-2002  jdolecek catch up with -current on kqueue branch
 1.13.6.4 16-Mar-2002  jdolecek Catch up with -current.
 1.13.6.3 11-Feb-2002  jdolecek Sync w/ -current.
 1.13.6.2 10-Jan-2002  thorpej Sync kqueue branch with -current.
 1.13.6.1 13-Sep-2001  thorpej Update the kqueue branch to HEAD.
 1.13.2.14 03-Jan-2003  thorpej Sync with HEAD.
 1.13.2.13 11-Dec-2002  thorpej Sync with HEAD.
 1.13.2.12 18-Oct-2002  nathanw Catch up to -current.
 1.13.2.11 27-Aug-2002  nathanw Catch up to -current.
 1.13.2.10 01-Aug-2002  nathanw Catch up to -current.
 1.13.2.9 15-Jul-2002  nathanw Whitespace.
 1.13.2.8 12-Jul-2002  nathanw No longer need to pull in lwp.h; proc.h pulls it in for us.
 1.13.2.7 24-Jun-2002  nathanw Curproc->curlwp renaming.

Change uses of "curproc->l_proc" back to "curproc", which is more like the
original use. Bare uses of "curproc" are now "curlwp".

"curproc" is now #defined in proc.h as ((curlwp) ? (curlwp)->l_proc : NULL)
so that it is always safe to reference curproc (*de*referencing curproc
is another story, but that's always been true).
 1.13.2.6 01-Apr-2002  nathanw Catch up to -current.
(CVS: It's not just a program. It's an adventure!)
 1.13.2.5 28-Feb-2002  nathanw Catch up to -current.
 1.13.2.4 08-Jan-2002  nathanw Catch up to -current.
 1.13.2.3 14-Nov-2001  nathanw Catch up to -current.
 1.13.2.2 21-Sep-2001  nathanw Catch up to -current.
 1.13.2.1 05-Mar-2001  nathanw Initial commit of scheduler activations and lightweight process support.
 1.14.2.1 01-Oct-2001  fvdl Catch up with -current.
 1.17.2.1 12-Nov-2001  thorpej Sync the thorpej-mips-cache branch with -current.
 1.31.6.1 05-Jan-2003  tron Pull up revision 1.38 (via patch, requested by chs in ticket #1053):
several bugs:
- move calls to softdep_setup_pagecache() (which can sleep to allocate
memory) outside the softdep lock.
- replace the softdep_flush_indir() hack (which tries to find another
vnode to fsync when we are holding lots of buffer-cache buffers locked
for long periods of time) with softdep_trackbufs() (which just kicks
the syncer and sleeps under the same circumstances). the former method
had a lock-ordering problem which would occasionally deadlock.
- relax the assertion in softdep_sync_metadata() which says that we should
never see D_ALLOCDIRECT deps for VREG vnodes. it's ok to see those
attached to indirect blocks.
also, there's no need to splbio() while allocating the buffer headers
to which pagecache dependencies are attached, so remove that.
fixes all the problems in PR 19288.
 1.31.4.2 29-Aug-2002  gehenna catch up with -current.
 1.31.4.1 15-Jul-2002  gehenna catch up with -current.
 1.50.2.12 10-Nov-2005  skrll Sync with HEAD. Here we go again...
 1.50.2.11 04-Mar-2005  skrll Sync with HEAD.

Hi Perry!
 1.50.2.10 04-Feb-2005  skrll Sync with HEAD.
 1.50.2.9 18-Dec-2004  skrll Sync with HEAD.
 1.50.2.8 30-Oct-2004  skrll Reduce diff to HEAD
 1.50.2.7 27-Oct-2004  skrll Fix various comments that describe the argument structures
 1.50.2.6 21-Sep-2004  skrll Fix the sync with head I botched.
 1.50.2.5 18-Sep-2004  skrll Sync with HEAD.
 1.50.2.4 03-Sep-2004  skrll Sync with HEAD
 1.50.2.3 24-Aug-2004  skrll Undo part of the ktrace/lwp changes. In particular:
* Remove the "lwp *" argument that was added to vget(). Turns out
that nothing actually used it!
* Remove the "lwp *" arguments that were added to VFS_ROOT(), VFS_VGET(),
and VFS_FHTOVP(); all they did was pass it to vget() (which, as noted
above, didn't use it).
* Remove all of the "lwp *" arguments to internal functions that were added
just to appease the above.
 1.50.2.2 03-Aug-2004  skrll Sync with HEAD
 1.50.2.1 02-Jul-2003  darrenr Apply the aborted ktrace-lwp changes to a specific branch. This is just for
others to review. I'm concerned that patch fuzziness may have resulted in some
errant code being generated but I'll look at that later by comparing the diff
from the base to the branch with the file I attempt to apply to it. This will,
at the very least, put the changes in a better context for others to review
them and attempt to tinker with removing passing of 'struct lwp' through
the kernel.
 1.61.4.2 19-Mar-2005  yamt sync with head. xen and whitespace. xen part is not finished.
 1.61.4.1 12-Feb-2005  yamt sync with head.
 1.61.2.1 29-Apr-2005  kent sync with -current
 1.63.2.2 21-Oct-2005  tron Pull up following revision(s) (requested by yamt in ticket #845):
sys/ufs/ffs/ffs_softdep.c: revision 1.70 via patch
- for pagecache dependency, track which page in the block
has been written or not individually by (ab)using b_resid
in pcbp as a bitmap.
- add a comment to explain why it's needed.
PR/15364. reviewed by Chuck Silvers.
 1.63.2.1 07-May-2005  tron Pull up revision 1.64 (requested by hannken in ticket #259):
flush_inodedep_deps(): If softdep_lookupvp() returns NULL it means the
inode has been reclaimed. Skip the VOP_PUTPAGES() in this case.
Reviewed by: Chuck Silvers <chs@netbsd.org>
 1.66.2.9 27-Feb-2008  yamt sync with head.
 1.66.2.8 21-Jan-2008  yamt sync with head
 1.66.2.7 07-Dec-2007  yamt sync with head
 1.66.2.6 15-Nov-2007  yamt sync with head.
 1.66.2.5 27-Oct-2007  yamt sync with head.
 1.66.2.4 03-Sep-2007  yamt sync with head.
 1.66.2.3 26-Feb-2007  yamt sync with head.
 1.66.2.2 30-Dec-2006  yamt sync with head.
 1.66.2.1 21-Jun-2006  yamt sync with head.
 1.70.2.2 29-Oct-2005  yamt use ffs_* directly rather than via ufs_ops.
suggested by Chuck Silvers.
 1.70.2.1 20-Oct-2005  yamt adapt ufs.
 1.73.12.1 24-May-2006  tron Merge 2006-05-24 NetBSD-current into the "peter-altq" branch.
 1.73.10.2 06-May-2006  christos - Move kauth_cred_t declaration to <sys/types.h>
- Cleanup struct ucred; forward declarations that are unused.
- Don't include <sys/kauth.h> in any header, but include it in the c files
that need it.

Approved by core.
 1.73.10.1 08-Mar-2006  elad Adapt to kernel authorization KPI.
 1.73.8.3 11-Aug-2006  yamt sync with head
 1.73.8.2 26-Jun-2006  yamt sync with head.
 1.73.8.1 24-May-2006  yamt sync with head.
 1.73.6.1 01-Jun-2006  kardel Sync with head.
 1.73.4.1 09-Sep-2006  rpaulo sync with head
 1.74.2.1 19-Jun-2006  chap Sync with head.
 1.76.6.2 10-Dec-2006  yamt sync with head.
 1.76.6.1 22-Oct-2006  yamt sync with head
 1.76.4.2 29-Dec-2006  ad Checkpoint work in progress.
 1.76.4.1 18-Nov-2006  ad Sync with head.
 1.81.4.1 03-Sep-2007  wrstuden Sync w/ NetBSD-4-RC_1
 1.81.2.1 05-Jun-2007  bouyer Pull up following revision(s) (requested by yamt in ticket #706):
sys/ufs/ffs/ffs_softdep.c: revision 1.90
flush_inodedep_deps: fix access after free. PR/29724.
 1.82.2.5 17-May-2007  yamt sync with head.
 1.82.2.4 15-Apr-2007  yamt sync with head.
 1.82.2.3 24-Mar-2007  yamt sync with head.
 1.82.2.2 12-Mar-2007  rmind Sync with HEAD.
 1.82.2.1 27-Feb-2007  yamt - sync with head.
- move sched_changepri back to kern_synch.c as it doesn't know PPQ anymore.
 1.86.2.29 24-Oct-2007  ad softdep_disk_write_complete: fix the test to return early if !softdep.
 1.86.2.28 19-Oct-2007  ad softdep_freefile: mark the inode modified so that it gets flushed in
ufs_reclaim, resolving any dependencies. Fixes "%s: unmount pending
error: blocks %d files %d".
 1.86.2.27 09-Oct-2007  ad Sync with head.
 1.86.2.26 16-Sep-2007  ad - Checkpoint work in progress on the vnode lifecycle and reference counting
stuff. This makes it work properly without kernel_lock and fixes a few
quite old bugs. See vfs_subr.c 1.283.2.17 for details.

- Fix some problems with softdep. Unfortunately our softdep code appears
to have some longstanding bugs that cause it to fail under stress testing.
 1.86.2.25 10-Sep-2007  ad softdep_disk_write_complete: return early if the buffer describes a read,
meaning we don't have to grab bufcache_lock.
 1.86.2.24 30-Aug-2007  ad What a pain in the neck. Just set bioops in softdep_initialize() and be
done with it.
 1.86.2.23 30-Aug-2007  ad Make softdep work.
 1.86.2.22 30-Aug-2007  ad Fix a NULL pointer deref and lock leak.
 1.86.2.21 30-Aug-2007  ad bufcache_lock is sufficient to inspect v_dirtyblkhd, vp->v_interlock is only
needed to modify.
 1.86.2.20 28-Aug-2007  yamt make this compile with DEBUG.
 1.86.2.19 24-Aug-2007  ad Sync with buffer cache locking changes. See buf.h/vfs_bio.c for details.
Some minor portions are incomplete and need to be verified as a whole.
 1.86.2.18 21-Aug-2007  ad Remove dup call to callout_init().
 1.86.2.17 20-Aug-2007  ad softdep locking improvements. It hangs looping in flush_inodedep_deps(),
more work required.
 1.86.2.16 19-Aug-2007  ad - Back out the biodone() changes.
- Eliminate B_ERROR (from HEAD).
 1.86.2.15 15-Jul-2007  ad Sync with head.
 1.86.2.14 15-Jul-2007  ad Sync with head.
 1.86.2.13 07-Jul-2007  ad Fix some locking issues.
 1.86.2.12 01-Jul-2007  ad - Adapt to callout API change.
- Acquire softdep_lock before calling wakeup().
 1.86.2.11 23-Jun-2007  ad - Lock v_cleanblkhd, v_dirtyblkhd, v_numoutput with the vnode's interlock.
Get rid of global_v_numoutput_lock. Partially incomplete as the buffer
cache locking doesn't work very well and needs an overhaul.
- Some changes to try and make softdep MP safe. Untested.
 1.86.2.10 17-Jun-2007  ad - Increase the number of thread priorities from 128 to 256. How the space
is set up is to be revisited.
- Implement soft interrupts as kernel threads. A generic implementation
is provided, with hooks for fast-path MD code that can run the interrupt
threads over the top of other threads executing in the kernel.
- Split vnode::v_flag into three fields, depending on how the flag is
locked (by the interlock, by the vnode lock, by the file system).
- Miscellaneous locking fixes and improvements.
 1.86.2.9 08-Jun-2007  ad Sync with head.
 1.86.2.8 13-May-2007  ad - Pass the error number and residual count to biodone(), and let it handle
setting error indicators. Prepare to eliminate B_ERROR.
- Add a flag argument to brelse() to be set into the buf's flags, instead
of doing it directly. Typically used to set B_INVAL.
- Add a "struct cpu_info *" argument to kthread_create(), to be used to
create bound threads. Change "bool mpsafe" to "int flags".
- Allow exit of LWPs in the IDL state when (l != curlwp).
- More locking fixes & conversion to the new API.
 1.86.2.7 12-Apr-2007  ad Make it build with DEBUG.
 1.86.2.6 10-Apr-2007  ad Update to handle LWPs.
 1.86.2.5 10-Apr-2007  ad Sync with head.
 1.86.2.4 09-Apr-2007  ad - Add two new arguments to kthread_create1: pri_t pri, bool mpsafe.
- Fork kthreads off proc0 as new LWPs, not new processes.
 1.86.2.3 21-Mar-2007  ad - Replace more simple_locks, and fix up in a few places.
- Use condition variables.
- LOCK_ASSERT -> KASSERT.
 1.86.2.2 13-Mar-2007  ad Pull in the initial set of changes for the vmlocking branch.
 1.86.2.1 13-Mar-2007  ad Sync with head.
 1.87.2.1 11-Jul-2007  mjf Sync with head.
 1.95.2.2 03-Sep-2007  skrll Sync with HEAD.
 1.95.2.1 15-Aug-2007  skrll Sync with HEAD.
 1.96.8.2 29-Jul-2007  ad It's not a good idea for device drivers to modify b_flags, as they don't
need to understand the locking around that field. Instead of setting
B_ERROR, set b_error instead. b_error is 'owned' by whoever completes
the I/O request.
 1.96.8.1 29-Jul-2007  ad file ffs_softdep.c was added on branch matt-mips64 on 2007-07-29 13:31:14 +0000
 1.96.6.4 23-Mar-2008  matt sync with HEAD
 1.96.6.3 09-Jan-2008  matt sync with HEAD
 1.96.6.2 08-Nov-2007  matt sync with -HEAD
 1.96.6.1 06-Nov-2007  matt sync with HEAD
 1.96.4.5 09-Dec-2007  jmcneill Sync with HEAD.
 1.96.4.4 27-Nov-2007  joerg Sync with HEAD. amd64 Xen support needs testing.
 1.96.4.3 11-Nov-2007  joerg Sync with HEAD.
 1.96.4.2 26-Oct-2007  joerg Sync with HEAD.

Follow the merge of pmap.c on i386 and amd64 and move
pmap_init_tmp_pgtbl into arch/x86/x86/pmap.c. Modify the ACPI wakeup
code to restore CR4 before jumping back into kernel space as the large
page option might cover that.
 1.96.4.1 03-Sep-2007  jmcneill Sync with HEAD.
 1.97.2.1 14-Oct-2007  yamt sync with head.
 1.99.4.4 18-Feb-2008  mjf Sync with HEAD.
 1.99.4.3 27-Dec-2007  mjf Sync with HEAD.
 1.99.4.2 08-Dec-2007  mjf Sync with HEAD.
 1.99.4.1 19-Nov-2007  mjf Sync with HEAD.
 1.99.2.1 13-Nov-2007  bouyer Sync with HEAD
 1.101.2.3 26-Dec-2007  ad Sync with head.
 1.101.2.2 19-Dec-2007  ad Get lfs mostly working.
 1.101.2.1 04-Dec-2007  ad Pull the vmlocking changes into a new branch.
 1.102.4.3 19-Jan-2008  bouyer Sync with HEAD
 1.102.4.2 08-Jan-2008  bouyer Sync with HEAD
 1.102.4.1 02-Jan-2008  bouyer Sync with HEAD
 1.108.6.3 17-Jan-2009  mjf Sync with HEAD.
 1.108.6.2 05-Jun-2008  mjf Sync with HEAD.

Also fix build.
 1.108.6.1 02-Jun-2008  mjf Sync with HEAD.
 1.109.4.2 04-May-2009  yamt sync with head.
 1.109.4.1 16-May-2008  yamt sync with head.
 1.109.2.2 04-Jun-2008  yamt sync with head
 1.109.2.1 18-May-2008  yamt sync with head.
 1.111.2.1 23-Jun-2008  wrstuden Sync w/ -current. 34 merge conflicts to follow.
 1.115.6.2 03-Mar-2009  skrll Sync with HEAD.
 1.115.6.1 19-Jan-2009  skrll Sync with HEAD.
 1.115.4.1 13-Dec-2008  haad Update haad-dm branch to haad-dm-base2.
 1.116.4.1 13-May-2009  jym Sync with HEAD.

Commit is split, to avoid a "too many arguments" protocol error.
 1.24 22-Feb-2009  ad PR kern/26878 FFSv2 + softdep = livelock (no free ram)
PR kern/16942 panic with softdep and quotas
PR kern/19565 panic: softdep_write_inodeblock: indirect pointer #1 mismatch
PR kern/26274 softdep panic: allocdirect_merge: ...
PR kern/26374 Long delay before non-root users can write to softdep partitions
PR kern/28621 1.6.x "vp != NULL" panic in ffs_softdep.c:4653 while unmounting a softdep (+quota) filesystem
PR kern/29513 FFS+Softdep panic with unfsck-able file-corruption
PR kern/31544 The ffs softdep code appears to fail to write dirty bits to disk
PR kern/31981 stopping scsi disk can cause panic (softdep)
PR kern/32116 kernel panic in softdep (assertion failure)
PR kern/32532 softdep_trackbufs deadlock
PR kern/37191 softdep: locking against myself
PR kern/40474 Kernel panic after remounting raid root with softdep

Retire softdep, pass 2. As discussed and later formally announced on the
mailing lists.
 1.23 31-May-2008  ad branches: 1.23.6; 1.23.12;
XXX softdep:

If the number of deletes in progress is getting too high, newdirrem()
requests the syncer to flush faster, and in some cases will block to
prevent deletes accumulating faster than the disk can service them.

The syncer will try to lock vnodes that the remover holds locked, leading
to the syncer and remover proceeding in lockstep and making very little
overall forward progress.

Put a hook into ufs_rmdir() and ufs_remove() so that the softdep code
can pace itself without holding vnode locks if the number of deletes is
running out of control.
 1.22 02-Jan-2008  ad branches: 1.22.6; 1.22.8; 1.22.10; 1.22.12;
Merge vmlocking2 to head.
 1.21 04-Mar-2007  christos branches: 1.21.2; 1.21.16; 1.21.22; 1.21.24; 1.21.28;
Kill caddr_t; there will be some MI fallout, but it will be fixed shortly.
 1.20 16-Nov-2006  christos branches: 1.20.4;
__unused removal on arguments; approved by core.
 1.19 13-Oct-2006  hannken Add __unused to unused function arguments.
 1.18 14-May-2006  elad branches: 1.18.8; 1.18.10;
integrate kauth.
 1.17 11-Dec-2005  christos branches: 1.17.4; 1.17.6; 1.17.8; 1.17.10; 1.17.12;
merge ktrace-lwp.
 1.16 02-Nov-2005  gdt Adjust signature of softdep_freefile (dummy stub which always panics
if called) to match ffs_extern.h so that kernels w/o softdep can compile.
 1.15 15-Jul-2005  thorpej Use ANSI function decls.
 1.14 26-Feb-2005  perry branches: 1.14.4;
nuke trailing whitespace
 1.13 10-Jan-2004  hannken branches: 1.13.8; 1.13.10;
Split out softdep_flushworklist() from softdep_flushfiles() so that
it can be used to clear the work queue.

Cleanup ffs_sync() which did not synchronously wait when MNT_WAIT
was specified. Clear the work queue when MNT_WAIT is specified.

Result is a clean on-disk file system after ffs_sync(.., MNT_WAIT, ..)

From FreeBSD.
 1.12 29-Jun-2003  fvdl branches: 1.12.2;
Back out the lwp/ktrace changes. They contained a lot of collateral damage,
and need to be examined and discussed more.
 1.11 28-Jun-2003  darrenr Pass lwp pointers throughout the kernel, as required, so that the lwpid can
be inserted into ktrace records. The general change has been to replace
"struct proc *" with "struct lwp *" in various function prototypes, pass
the lwp through and use l_proc to get the process pointer when needed.

Bump the kernel rev up to 1.6V
 1.10 02-Apr-2003  fvdl Add support for UFS2. UFS2 is an enhanced FFS, adding support for
64 bit block pointers, extended attribute storage, and a few
other things.

This commit does not yet include the code to manipulate the extended
storage (for e.g. ACLs), this will be done later.

Originally written by Kirk McKusick and Network Associates Laboratories for
FreeBSD.
 1.9 24-Jan-2003  fvdl Bump daddr_t to 64 bits. Replace it with int32_t in all places where
it was used on-disk, so that on-disk formats remain the same.
Remove ufs_daddr_t and ufs_lbn_t for the time being.
 1.8 18-Dec-2001  fvdl Bring over fixes from FreeBSD that weren't incorporated yet, mainly
from Kirk McKusick. They implement taking pending block/inode frees
into account for the sake of correct statfs() numbers, and adding
a new softdep type (newdirblk) to correctly handle newly allocated
directory blocks.

Minor additional changes: 1) swap the newly introduced fs_pendinginodes
and fs_pendingblock fields in ffs_sb_swap, and 2) declare lkt_held
in the debug version of the softdep lock structure volatile, as it
can be modified from interrupt context #ifdef DEBUG.
 1.7 30-Oct-2001  lukem add __KERNEL_RCSID()
 1.6 26-Oct-2001  lukem remove #include <ufs/ufs/quota.h> where it was just to appease
<ufs/ufs/inode.h>, since the latter now includes the former. leave the former
in source that obviously uses specific bits of it (for completeness.)
 1.5 16-Sep-2001  jdolecek branches: 1.5.2;
add softdep_reinitialize() stub
 1.4 10-Jan-2001  ad branches: 1.4.2; 1.4.6; 1.4.8;
RCS ID
 1.3 14-Feb-2000  fvdl branches: 1.3.6;
Fixes to the softdep code from Ethan Solomita <ethan@geocast.com>.
* Fix buffer ordering when it has dependencies.
* Alleviate memory problems.
* Deal with some recursive vnode locks (sigh).
* Fix other bugs.
 1.2 15-Nov-1999  fvdl branches: 1.2.2;
Add Kirk McKusick's soft updates code to the trunk. Not enabled by
default, as the copyright on the main file (ffs_softdep.c) is such
that it has been put into gnusrc. options SOFTDEP will pull this
in. This code also contains the trickle syncer.

Bump version number to 1.4O
 1.1 19-Oct-1999  fvdl branches: 1.1.2;
file ffs_softdep.stub.c was initially added on branch fvdl-softdep.
 1.1.2.2 26-Oct-1999  fvdl Merge changes in the trickle-sync and softdep code as done by Kirk McKusick
in FreeBSD since the version that we based the branch on. Merging mostly
done by Ethan Solomita <ethan@geocast.com>.

Also, make sure the syncer thread/process isn't active when we're
unmounting a filesystem. This could wreak havoc. XXX should be done
on a per-mountpoint basis, but especially the softdep code would
end up being a big pile of vfs_busy() calls.
 1.1.2.1 19-Oct-1999  fvdl Bring in Kirk McKusick's FFS softdep code on a branch.
 1.2.2.1 27-Dec-1999  wrstuden Pull up to last week's -current.
 1.3.6.3 18-Jan-2001  bouyer Sync with head (for UBC+NFS fixes, mostly).
 1.3.6.2 20-Nov-2000  bouyer Update thorpej_scsipi to -current as of a month ago
 1.3.6.1 14-Feb-2000  bouyer file ffs_softdep.stub.c was added on branch thorpej_scsipi on 2000-11-20 18:11:46 +0000
 1.4.8.1 01-Oct-2001  fvdl Catch up with -current.
 1.4.6.1 10-Jan-2002  thorpej Sync kqueue branch with -current.
 1.4.2.3 08-Jan-2002  nathanw Catch up to -current.
 1.4.2.2 14-Nov-2001  nathanw Catch up to -current.
 1.4.2.1 21-Sep-2001  nathanw Catch up to -current.
 1.5.2.1 12-Nov-2001  thorpej Sync the thorpej-mips-cache branch with -current.
 1.12.2.6 10-Nov-2005  skrll Sync with HEAD. Here we go again...
 1.12.2.5 04-Mar-2005  skrll Sync with HEAD.

Hi Perry!
 1.12.2.4 21-Sep-2004  skrll Fix the sync with head I botched.
 1.12.2.3 18-Sep-2004  skrll Sync with HEAD.
 1.12.2.2 03-Aug-2004  skrll Sync with HEAD
 1.12.2.1 02-Jul-2003  darrenr Apply the aborted ktrace-lwp changes to a specific branch. This is just for
others to review. I'm concerned that patch fuzziness may have resulted in some
errant code being generated but I'll look at that later by comparing the diff
from the base to the branch with the file I attempt to apply to it. This will,
at the very least, put the changes in a better context for others to review
them and attempt to tinker with removing passing of 'struct lwp' through
the kernel.
 1.13.10.1 19-Mar-2005  yamt sync with head. xen and whitespace. xen part is not finished.
 1.13.8.1 29-Apr-2005  kent sync with -current
 1.14.4.4 21-Jan-2008  yamt sync with head
 1.14.4.3 03-Sep-2007  yamt sync with head.
 1.14.4.2 30-Dec-2006  yamt sync with head.
 1.14.4.1 21-Jun-2006  yamt sync with head.
 1.17.12.1 24-May-2006  tron Merge 2006-05-24 NetBSD-current into the "peter-altq" branch.
 1.17.10.1 08-Mar-2006  elad Adapt to kernel authorization KPI.
 1.17.8.1 24-May-2006  yamt sync with head.
 1.17.6.1 01-Jun-2006  kardel Sync with head.
 1.17.4.1 09-Sep-2006  rpaulo sync with head
 1.18.10.2 10-Dec-2006  yamt sync with head.
 1.18.10.1 22-Oct-2006  yamt sync with head
 1.18.8.1 18-Nov-2006  ad Sync with head.
 1.20.4.1 12-Mar-2007  rmind Sync with HEAD.
 1.21.28.1 02-Jan-2008  bouyer Sync with HEAD
 1.21.24.1 04-Dec-2007  ad Pull the vmlocking changes into a new branch.
 1.21.22.1 18-Feb-2008  mjf Sync with HEAD.
 1.21.16.1 09-Jan-2008  matt sync with HEAD
 1.21.2.2 16-Sep-2007  ad - Checkpoint work in progress on the vnode lifecycle and reference counting
stuff. This makes it work properly without kernel_lock and fixes a few
quite old bugs. See vfs_subr.c 1.283.2.17 for details.

- Fix some problems with softdep. Unfortunately our softdep code appears
to have some longstanding bugs that cause it to fail under stress testing.
 1.21.2.1 20-Aug-2007  yamt fix builds without softdep.
 1.22.12.1 23-Jun-2008  wrstuden Sync w/ -current. 34 merge conflicts to follow.
 1.22.10.1 04-May-2009  yamt sync with head.
 1.22.8.1 04-Jun-2008  yamt sync with head
 1.22.6.1 02-Jun-2008  mjf Sync with HEAD.
 1.23.12.1 13-May-2009  jym Sync with HEAD.

Commit is split, to avoid a "too many arguments" protocol error.
 1.23.6.1 03-Mar-2009  skrll Sync with HEAD.
 1.54 07-Jan-2023  chs ufs: fixed signed/unsigned bugs affecting large file systems

Apply these commits from FreeBSD:

commit e870d1e6f97cc73308c11c40684b775bcfa906a2
Author: Kirk McKusick <mckusick@FreeBSD.org>
Date: Wed Feb 10 20:10:35 2010 +0000

This fix corrects a problem in the file system that treats large
inode numbers as negative rather than unsigned. For a default
(16K block) file system, this bug began to show up at a file system
size above about 16Tb.

To fully handle this problem, newfs must be updated to ensure that
it will never create a filesystem with more than 2^32 inodes. That
patch will be forthcoming soon.

Reported by: Scott Burns, John Kilburg, Bruce Evans
Followup by: Jeff Roberson
PR: 133980
MFC after: 2 weeks

commit 81479e688b0f643ffacd3f335b4b4bba460b769d
Author: Kirk McKusick <mckusick@FreeBSD.org>
Date: Thu Feb 11 18:14:53 2010 +0000

One last pass to get all the unsigned comparisons correct.


In addition to the changes from FreeBSD, this commit includes quite a few
related changes to appease -Wsign-compare.
 1.53 24-May-2022  andvar branches: 1.53.4;
fix various typos in comments, docs and log messages.
 1.52 21-Apr-2020  christos use %s/__func__ so that the strings can be shared.
 1.51 28-May-2019  kamil branches: 1.51.8;
Avoid unportable shift base -1 in ffs_subr.c

Cast the start variable to unsigned int before the modulo operation.

Detected with kUBSan.
 1.50 04-Jul-2018  kamil Avoid Undefined Behavior in ffs_clusteracct()

Change the type of 'bit' variable from int to unsigned int and use unsigned
values consistently.

sys/ufs/ffs/ffs_subr.c:336:10, shift exponent -1 is negative

Detected with Kernel Undefined Behavior Sanitizer.

Reported by <Harry Pantazis>
 1.49 07-May-2016  maxv branches: 1.49.16; 1.49.18;
Fix a use-after-free (uaf).
 1.48 20-Oct-2013  htodd branches: 1.48.6;
Define needswap where needed.
 1.47 14-Aug-2011  christos branches: 1.47.2; 1.47.12; 1.47.16;
fix sign-compare warnings
 1.46 06-Mar-2011  bouyer merge the bouyer-quota2 branch. This adds a new on-disk format
to store disk quota usage and limits, integrated with ffs
metadata. Usage is checked by fsck_ffs (no more quotacheck)
and is covered by the WAPBL journal. Enabled with kernel
option QUOTA2 (added where QUOTA was enabled in kernel config files),
turned on with tunefs(8) on a per-filesystem
basis. mount_mfs(8) can also turn quotas on.

See http://mail-index.netbsd.org/tech-kern/2011/02/19/msg010025.html
for details.
 1.45 03-Jun-2008  hannken branches: 1.45.20; 1.45.26; 1.45.28;
ufs/ffs: replace calls to getblk() with ffs_getblk(). Now all buffers
have been run through copy-on-write and async mounts work again.

Fixes PR kern/38820

Reviewed by: YAMAMOTO Takashi <yamt@netbsd.org>
 1.44 29-Jan-2007  hubertf branches: 1.44.40; 1.44.42; 1.44.44; 1.44.46;
Remove more duplicate headers.
Patch by Slava Semushin <slava.semushin@gmail.com>

Again, this was tested by comparing obj files from a pristine and a patched
source tree against an i386/ALL kernel, and also for src/sbin/fsck_ffs,
src/sbin/fsdb and src/usr.sbin/makefs. Only changes in assert() line numbers
were detected in 'objdump -d' output.
 1.43 16-Nov-2006  christos __unused removal on arguments; approved by core.
 1.42 12-Oct-2006  christos - sprinkle __unused on function decls.
- fix a couple of unused bugs
- no more -Wno-unused for i386
 1.41 14-Jan-2006  yamt branches: 1.41.18; 1.41.20;
- unify ffs_blkatoff and lfs_blkatoff.
- remove ufs_ops::uo_blkatoff.
- add directory read-ahead code. (disabled for now.)
 1.40 27-Dec-2005  chs branches: 1.40.2;
changes for making DIAGNOSTIC not change the kernel ABI:
- for structure fields that are conditionally present,
make those fields always present.
- for functions which are conditionally inline, make them never inline.
- remove some other functions which are conditionally defined but
don't actually do anything anymore.
- make a lock-debugging function conditional on only LOCKDEBUG.

as discussed on tech-kern some time back.
 1.39 11-Dec-2005  christos merge ktrace-lwp.
 1.38 02-Nov-2005  yamt merge yamt-vop branch. remove following VOPs.

VOP_BLKATOFF
VOP_VALLOC
VOP_BALLOC
VOP_REALLOCBLKS
VOP_VFREE
VOP_TRUNCATE
VOP_UPDATE
 1.37 12-Sep-2005  drochner branches: 1.37.2;
move the new ffs_itimes() to a better place -- ffs_subr.c is shared with
userland
 1.36 12-Sep-2005  christos Use nanotime() to update the time fields in filesystems. Convert the code
from macros to real functions. Original patch and review from chuq.
Note: ext2fs only keeps seconds in the on-disk inode, and msdosfs does not
have enough precision for all fields, so this is not very useful for those
two.
 1.35 30-Aug-2005  xtraeme * Remove __P()
* Use ANSI function declarations on ext2fs and mfs
 1.34 15-Jul-2005  thorpej Use ANSI function decls.
 1.33 26-Feb-2005  perry branches: 1.33.4;
nuke trailing whitespace
 1.32 30-Dec-2003  pk branches: 1.32.8; 1.32.10;
Replace the traditional buffer memory management -- based on fixed per buffer
virtual memory reservation and a private pool of memory pages -- by a scheme
based on memory pools.

This allows better utilization of memory because buffers can now be allocated
with a granularity finer than the system's native page size (useful for
filesystems with e.g. 1k or 2k fragment sizes). It also avoids fragmentation
of virtual to physical memory mappings (due to the former fixed virtual
address reservation) resulting in better utilization of MMU resources on some
platforms. Finally, the scheme is more flexible by allowing run-time decisions
on the amount of memory to be used for buffers.

On the other hand, the effectiveness of the LRU queue for buffer recycling
may be somewhat reduced compared to the traditional method since, due to the
nature of the pool based memory allocation, the actual least recently used
buffer may release its memory to a pool different from the one needed by a
newly allocated buffer. However, this effect will kick in only if the
system is under memory pressure.
 1.31 02-Dec-2003  dbj clarify comments, especially since ffs_isfreeblock is non-intuitive:
ffs_isblock:
check if a block is available
returns true if all the corresponding bits in the free map are 1
returns false if any corresponding bit in the free map is 0
ffs_isfreeblock:
check if a block is completely allocated
returns true if all the corresponding bits in the free map are 0
returns false if any corresponding bit in the free map is 1
 1.30 27-Oct-2003  lukem Overhaul how `build.sh tools' are used:

* Rename "config.h" to "nbtool_config.h" and
HAVE_CONFIG_H to HAVE_NBTOOL_CONFIG_H.
This makes it more obvious in the source when we're using
tools/compat/config.h versus "standard autoconf" config.h

* Consistently move the inclusion of nbtool_config.h to before
<sys/cdefs.h> so that the former can provide __RCSID() (et al),
and there's no need to protect those macros any more.

These changes should make it easier to "tool-ify" a program by adding:
#if HAVE_NBTOOL_CONFIG_H
#include "nbtool_config.h"
#endif
to the top of the source files (for the general case).
 1.29 07-Aug-2003  agc Move UCB-licensed code from 4-clause to 3-clause licence.

Patches provided by Joel Baker in PR 22364, verified by myself.
 1.28 02-Apr-2003  fvdl branches: 1.28.2;
Add support for UFS2. UFS2 is an enhanced FFS, adding support for
64 bit block pointers, extended attribute storage, and a few
other things.

This commit does not yet include the code to manipulate the extended
storage (for e.g. ACLs), this will be done later.

Originally written by Kirk McKusick and Network Associates Laboratories for
FreeBSD.
 1.27 25-Jan-2003  tron Use PRId64 instead of hard coding "%lld" to fix build problems under
LP64 ports.
 1.26 25-Jan-2003  tron Fix printf() format strings problems caused by "daddr_t" change.
 1.25 24-Jan-2003  fvdl Bump daddr_t to 64 bits. Replace it with int32_t in all places where
it was used on-disk, so that on-disk formats remain the same.
Remove ufs_daddr_t and ufs_lbn_t for the time being.
 1.24 01-Dec-2002  matt Add multiple inclusion protection for headers. Fix mismatched
variable declarations (missing const's) as needed.
 1.23 06-Jul-2002  fredette Fixed a printf argument type.
 1.22 10-Apr-2002  mycroft branches: 1.22.2;
Use blkstofrags() and fragstoblks(). Use &(NBBY-1) rather than %NBBY.
Switch off of fs_fragshift rather than fs_frag (generates better jump tables).
 1.21 31-Jan-2002  tv These sources are pulled into makefs(8), so we need config.h and protection
for __KERNEL_RCSID().
 1.20 09-Jan-2002  lukem Only pull in <sys/systm.h> #ifdef _KERNEL, since it's a kernel only header.
In the ! _KERNEL case, provide own prototype for panic() instead.
 1.19 30-Oct-2001  lukem add __KERNEL_RCSID()
 1.18 26-Oct-2001  lukem - pull in ufsmount.h after inode.h, because the latter pulls in
quota.h which the former needs, and this makes the usage consistent
with other files anyway
- expand the details in a few panic strings
 1.17 26-Oct-2001  lukem remove #include <ufs/ufs/quota.h> where it was just to appease
<ufs/ufs/inode.h>, since the latter now includes the former. leave the former
in source that obviously uses specific bits of it (for completeness.)
 1.16 09-Aug-2001  lukem branches: 1.16.4;
be consistent and use "u_char" instead of "unsigned char"
 1.15 30-Mar-2000  augustss branches: 1.15.6; 1.15.10;
Remove register declarations.
 1.14 15-Nov-1999  fvdl Add Kirk McKusick's soft updates code to the trunk. Not enabled by
default, as the copyright on the main file (ffs_softdep.c) is such
that it has been put into gnusrc. options SOFTDEP will pull this
in. This code also contains the trickle syncer.

Bump version number to 1.4O
 1.13 28-Jul-1998  drochner branches: 1.13.14; 1.13.16; 1.13.20;
The fragtbl[], inside[] and around[] variables are needed by "fsck",
so we can't put them inside "#ifdef _KERNEL".
Put declarations inside .c files where needed to preserve namespace.
 1.12 13-Jun-1998  kleink KNF, mostly of FFS_EI changes.
 1.11 18-Mar-1998  bouyer Add support for reading/writing FFS in non-native byte order, conditioned
to "options FFS_EI". The superblock and inodes (without blk addr) are
byteswapped at disk read/write time; other metadata are byteswapped
when used (as they are accessed directly in the buffer cache).
This required the addition of a "um_flags" field to struct ufsmount.
ffs_bswap.c contains superblock and inode byteswap routines also used
by userland utilities.
 1.10 01-Mar-1998  fvdl Merge with Lite2 + local changes
 1.9 12-Oct-1996  christos revert previous kprintf changes
 1.8 10-Oct-1996  christos printf -> kprintf, sprintf -> ksprintf
 1.7 20-Sep-1996  christos Make this compile cleanly from userland (fsck_ffs).
 1.6 17-Mar-1996  christos Fix printf format strings
 1.5 09-Feb-1996  christos ffs prototypes
 1.4 28-Mar-1995  jtc KERNEL -> _KERNEL
 1.3 20-Oct-1994  cgd update for new syscall args description mechanism, and deal safely
with wider types.
 1.2 29-Jun-1994  cgd New RCS IDs, take two. They're more aesthetically pleasant, and use 'NetBSD'
 1.1 08-Jun-1994  mycroft branches: 1.1.1;
Update to 4.4-Lite fs code, with local changes.
 1.1.1.2 01-Mar-1998  fvdl Import 4.4BSD-Lite2
 1.1.1.1 01-Mar-1998  fvdl Import 4.4BSD-Lite for reference
 1.13.20.2 27-Dec-1999  wrstuden Pull up to last week's -current.
 1.13.20.1 21-Dec-1999  wrstuden Initial commit of recent changes to make DEV_BSIZE go away.

Runs on i386, needs work on other arch's. Main kernel routines should be
fine, but a number of the stand programs need help.

cd, fd, ccd, wd, and sd have been updated. sd has been tested with non-512
byte block devices. vnd, raidframe, and lfs need work.

Non 2**n block support is automatic for LKM's and conditional for kernels
on "options NON_PO2_BLOCKS".
 1.13.16.1 19-Oct-1999  fvdl Bring in Kirk McKusick's FFS softdep code on a branch.
 1.13.14.1 20-Nov-2000  bouyer Update thorpej_scsipi to -current as of a month ago
 1.15.10.5 06-Sep-2002  jdolecek sync kqueue branch with HEAD
 1.15.10.4 23-Jun-2002  jdolecek catch up with -current on kqueue branch
 1.15.10.3 11-Feb-2002  jdolecek Sync w/ -current.
 1.15.10.2 10-Jan-2002  thorpej Sync kqueue branch with -current.
 1.15.10.1 25-Aug-2001  thorpej Merge Aug 24 -current into the kqueue branch.
 1.15.6.7 11-Dec-2002  thorpej Sync with HEAD.
 1.15.6.6 01-Aug-2002  nathanw Catch up to -current.
 1.15.6.5 17-Apr-2002  nathanw Catch up to -current.
 1.15.6.4 28-Feb-2002  nathanw Catch up to -current.
 1.15.6.3 11-Jan-2002  nathanw More catchup.
 1.15.6.2 14-Nov-2001  nathanw Catch up to -current.
 1.15.6.1 24-Aug-2001  nathanw Catch up with -current.
 1.16.4.1 12-Nov-2001  thorpej Sync the thorpej-mips-cache branch with -current.
 1.22.2.1 15-Jul-2002  gehenna catch up with -current.
 1.28.2.5 10-Nov-2005  skrll Sync with HEAD. Here we go again...
 1.28.2.4 04-Mar-2005  skrll Sync with HEAD.

Hi Perry!
 1.28.2.3 21-Sep-2004  skrll Fix the sync with head I botched.
 1.28.2.2 18-Sep-2004  skrll Sync with HEAD.
 1.28.2.1 03-Aug-2004  skrll Sync with HEAD
 1.32.10.1 19-Mar-2005  yamt sync with head. xen and whitespace. xen part is not finished.
 1.32.8.1 29-Apr-2005  kent sync with -current
 1.33.4.2 26-Feb-2007  yamt sync with head.
 1.33.4.1 21-Jun-2006  yamt sync with head.
 1.37.2.1 20-Oct-2005  yamt adapt ufs.
 1.40.2.1 15-Jan-2006  yamt sync with head.
 1.41.20.2 10-Dec-2006  yamt sync with head.
 1.41.20.1 22-Oct-2006  yamt sync with head
 1.41.18.2 01-Feb-2007  ad Sync with head.
 1.41.18.1 18-Nov-2006  ad Sync with head.
 1.44.46.1 23-Jun-2008  wrstuden Sync w/ -current. 34 merge conflicts to follow.
 1.44.44.1 04-May-2009  yamt sync with head.
 1.44.42.1 04-Jun-2008  yamt sync with head
 1.44.40.1 05-Jun-2008  mjf Sync with HEAD.

Also fix build.
 1.45.28.4 09-Feb-2011  bouyer Make it build without FFS_EI
 1.45.28.3 08-Feb-2011  bouyer for !_KERNEL case, always define FFS_EI.
Required for makefs, and maybe resize_ffs (it's not clear if
resize_ffs supports swapped byte order or not - swapped endian tests
are expected to fail but actually succeed :)
 1.45.28.2 08-Feb-2011  bouyer Sync with HEAD
 1.45.28.1 20-Jan-2011  bouyer Snapshot of work in progress on a modernised disk quota system:
- new quotactl syscall (versioned for backward compat), which takes
as parameter a path to a mount point, and a prop_dictionary
(in plistref format) describing commands and arguments.
For each command, status and data are returned as a prop_dictionary.
quota commands features will be added to take advantage of this,
exporting quota data or getting quota commands as plists.

- new on disk-format storage (all 64bit wide), integrated to metadata for
ffs (and playing nicely with wapbl).
Quotas are enabled on a ffs filesystem via superblock flags.
tunefs(8) can enable or disable quotas.
On a quota-enabled filesystem, fsck_ffs(8) will track per-uid/gid
block and inode usages, and will check and update quotas in Pass 6.
Quota usage and limits are stored in unlinked files (one for users,
one for groups); fsck_ffs(8) will create the files if needed, or
free them if needed. This means that after enabling or disabling
quotas on a filesystem, a fsck_ffs(8) run is required.
quotacheck(8) is no longer needed: after an unclean shutdown,
fsck or journal replay will take care of fixing quotas.
newfs(8) can create a ready-to-mount quota-enabled filesystem
(superblock flags are set and quota inodes are created).
Other new features or semantic changes:
- default quota data, applied to users or groups that don't already
have a quota entry
- per-user/group grace time (instead of a filesystem global one)
- 0 really means "nothing allowed at all", not "no limit".
If you want "no limit", set the limit to UQUAD_MAX (tools will
understand "unlimited" and "-")

A quota file is structured as follows:
it starts with a header, containing a few per-filesystem values,
and the default quota limits.
Quota entries are linked together as a simple list, each entry has a
pointer (as an offset within the file) to the next.
The header has a pointer to a list of free quota entries, and
a hash table of in-use entries. The size of the hash table depends
on the filesystem block size (header+hash table should fit in the
first block). The file is not sparse and is a multiple of
filesystem block size (when the free quota entry list is empty a new
filesystem block is allocated). Quota entries do not cross
filesystem block boundaries.

In memory, the kernel keeps a cache of recently used quota entries
as a reference to the block number, and offset within the block.
The quota entry itself is kept in the buf cache.

fsck_ffs(8), tunefs(8) and newfs(8) support is complete (with
related atf tests :)
The kernel can update disk usage and report it via quotactl(2).

Todo: enforce quota limits (limits are not checked by the kernel yet)
update repquota, edquota and rpc.rquotad to the new world
implement compat_50_quotactl ioctl.
update quotactl(2) man page

fsck_ffs required fixes so that allocating new blocks or inodes will
properly update the superblock and cg summaries. This was not an issue up
to now because superblock and cg summaries check happened last, but now
allocations or frees can happen in pass 6.
 1.45.26.1 06-Jun-2011  jruoho Sync with HEAD.
 1.45.20.1 21-Apr-2011  rmind sync with head
 1.47.16.1 18-May-2014  rmind sync with head
 1.47.12.2 03-Dec-2017  jdolecek update from HEAD
 1.47.12.1 20-Aug-2014  tls Rebase to HEAD as of a few days ago.
 1.47.2.1 22-May-2014  yamt sync with head.

for a reference, the tree before this commit was tagged
as yamt-pagecache-tag8.

this commit was split into small chunks to avoid
a limitation of cvs. ("Protocol error: too many arguments")
 1.48.6.1 29-May-2016  skrll Sync with HEAD
 1.49.18.2 21-Apr-2020  martin Sync with HEAD
 1.49.18.1 10-Jun-2019  christos Sync with HEAD
 1.49.16.1 28-Jul-2018  pgoyette Sync with HEAD
 1.51.8.1 25-Apr-2020  bouyer Sync with bouyer-xenpvh-base2 (HEAD)
 1.53.4.1 13-May-2023  martin Pull up following revision(s) (requested by chs in ticket #160):

usr.sbin/makefs/ffs/ffs_alloc.c: revision 1.31
sbin/tunefs/tunefs.c: revision 1.58
sbin/fsck_ffs/setup.c: revision 1.105
sbin/fsck_ffs/pass5.c: revision 1.56
usr.sbin/makefs/ffs.c: revision 1.74
usr.sbin/makefs/ffs/mkfs.c: revision 1.42
usr.sbin/makefs/Makefile: revision 1.40
sys/ufs/ffs/fs.h: revision 1.71
sbin/fsdb/fsdb.c: revision 1.54
sbin/resize_ffs/resize_ffs.c: revision 1.58
sbin/fsck_ffs/pass4.c: revision 1.29
usr.sbin/makefs/ffs/ffs_extern.h: revision 1.9
sbin/newfs/mkfs.c: revision 1.133
sys/ufs/ffs/ffs_alloc.c: revision 1.172
sbin/fsck_ffs/pass1b.c: revision 1.24
usr.sbin/dumpfs/dumpfs.c: revision 1.68
sys/ufs/ffs/ffs_extern.h: revision 1.88
usr.sbin/quotacheck/quotacheck.c: revision 1.51
sys/ufs/ffs/ffs_subr.c: revision 1.54
sbin/fsck_ffs/main.c: revision 1.91
sbin/fsck_ffs/pass1.c: revision 1.63

ufs: fixed signed/unsigned bugs affecting large file systems

Apply these commits from FreeBSD:
commit e870d1e6f97cc73308c11c40684b775bcfa906a2
Author: Kirk McKusick <mckusick@FreeBSD.org>
Date: Wed Feb 10 20:10:35 2010 +0000
This fix corrects a problem in the file system that treats large
inode numbers as negative rather than unsigned. For a default
(16K block) file system, this bug began to show up at a file system
size above about 16Tb.
To fully handle this problem, newfs must be updated to ensure that
it will never create a filesystem with more than 2^32 inodes. That
patch will be forthcoming soon.
Reported by: Scott Burns, John Kilburg, Bruce Evans
Followup by: Jeff Roberson
PR: 133980
MFC after: 2 weeks

commit 81479e688b0f643ffacd3f335b4b4bba460b769d
Author: Kirk McKusick <mckusick@FreeBSD.org>
Date: Thu Feb 11 18:14:53 2010 +0000
One last pass to get all the unsigned comparisons correct.

In addition to the changes from FreeBSD, this commit includes quite a few
related changes to appease -Wsign-compare.
 1.9 11-Dec-2005  christos merge ktrace-lwp.
 1.8 26-Feb-2005  perry nuke trailing whitespace
 1.7 27-Oct-2003  lukem branches: 1.7.8; 1.7.10;
Overhaul how `build.sh tools' are used:

* Rename "config.h" to "nbtool_config.h" and
HAVE_CONFIG_H to HAVE_NBTOOL_CONFIG_H.
This makes it more obvious in the source when we're using
tools/compat/config.h versus "standard autoconf" config.h

* Consistently move the inclusion of nbtool_config.h to before
<sys/cdefs.h> so that the former can provide __RCSID() (et al),
and there's no need to protect those macros any more.

These changes should make it easier to "tool-ify" a program by adding:
#if HAVE_NBTOOL_CONFIG_H
#include "nbtool_config.h"
#endif
to the top of the source files (for the general case).
 1.6 07-Aug-2003  agc Move UCB-licensed code from 4-clause to 3-clause licence.

Patches provided by Joel Baker in PR 22364, verified by myself.
 1.5 31-Jan-2002  tv branches: 1.5.16;
These sources are pulled into makefs(8), so we need config.h and protection
for __KERNEL_RCSID().
 1.4 30-Oct-2001  lukem add __KERNEL_RCSID()
 1.3 18-Jan-2001  jdolecek branches: 1.3.2; 1.3.6; 1.3.10;
constify
 1.2 29-Jun-1994  cgd branches: 1.2.34;
New RCS ID's, take two. They're more aesthetically pleasant, and use 'NetBSD'
 1.1 08-Jun-1994  mycroft branches: 1.1.1;
Update to 4.4-Lite fs code, with local changes.
 1.1.1.1 01-Mar-1998  fvdl Import 4.4BSD-Lite for reference
 1.2.34.1 11-Feb-2001  bouyer Sync with HEAD.
 1.3.10.1 12-Nov-2001  thorpej Sync the thorpej-mips-cache branch with -current.
 1.3.6.2 11-Feb-2002  jdolecek Sync w/ -current.
 1.3.6.1 10-Jan-2002  thorpej Sync kqueue branch with -current.
 1.3.2.2 28-Feb-2002  nathanw Catch up to -current.
 1.3.2.1 14-Nov-2001  nathanw Catch up to -current.
 1.5.16.4 04-Mar-2005  skrll Sync with HEAD.

Hi Perry!
 1.5.16.3 21-Sep-2004  skrll Fix the sync with head I botched.
 1.5.16.2 18-Sep-2004  skrll Sync with HEAD.
 1.5.16.1 03-Aug-2004  skrll Sync with HEAD
 1.7.10.1 19-Mar-2005  yamt sync with head. xen and whitespace. xen part is not finished.
 1.7.8.1 29-Apr-2005  kent sync with -current
 1.384 30-Dec-2024  hannken Protect test/clear fs->fs_fmod with um_lock like it is already
protected in ffs_alloc.c.

When writing to disk protect moving superblock to buffer with um_lock.

Setting/clearing fs->fs_fmod while mounting, updating a mount or unmounting
is safe as these operations run exclusively: either mounting creates
a new file system or the file system is suspended. Assert suspension
for update and unmount.

PR kern/58837 "ffs: Missing locking around fs_fmod/time"
 1.383 30-Dec-2024  hannken Remove comment "we are always called with the filesystem marked `MPBUSY'."
above some xxx_sync() operations. These operations get called without
any exclusive lock.

This comment appeared with "add quota support" on 1990-05-02.
On 1998/02/18 MNT_MPBUSY disappeared when vfs_busy() was changed from
an exclusive lock to a shared lock.

PR kern/58837 "ffs: Missing locking around fs_fmod/time"
 1.382 08-Sep-2023  riastradh branches: 1.382.6;
ffs_sync: Avoid unlocked access to v_numoutput/v_dirtyblkhd.

Found by lockdoc.

PR kern/57606
 1.381 15-Jun-2023  hannken Undo unlock/relock for VOP_IOCTL().

PR kern/57450 (unplugging hung USB disk triggers panic via _vstate_assert)
 1.380 05-Jun-2023  rin Make DEBUG_FFS_MOUNT compile again (with 64-bit ino_t).
 1.379 21-Dec-2022  chs ffs: fail mounts requesting ACLs for non-ea UFS2 file systems

For non-ea UFS2 file system, fail mounts that request ACLs rather than
letting the mount succeed only to reject all ACL operations later.

Also fix the messages about the on-disk fs flags conflicting with
the mount options for which type of ACLs to use, and about requesting
both types of ACLs.
 1.378 17-Nov-2022  chs branches: 1.378.2;
Restore backward compatibility of UFS2 with previous NetBSD releases by
disabling support in UFS2 for extended attributes (including ACLs).
Add a new variant of UFS2 called "UFS2ea" that does support extended attributes.
Add new fsck_ffs operations "-c ea" and "-c no-ea" to convert file systems
from UFS2 to UFS2ea and vice-versa (both of which delete all existing extended
attributes in the process).
 1.377 10-Nov-2022  hannken Some changes to "fs->fs_fmod" and "fs->fs_clean":
- clear "fs->fs_fmod" after reading the super block.
- assert we don't write a super block when mounted read-only.
- make sure "fs->fs_clean" is one of FS_ISCLEAN or FS_WASCLEAN.
- print "file system not clean" on every mount.

Should fix PR kern/57010: ffs: mounting unclean non-root fs read-only
causes spurious write to superblock
 1.376 16-Apr-2022  hannken Unlock vnode for VOP_IOCTL() and wapbl_flush().
 1.375 19-Mar-2022  hannken Remove now unused VV_LOCKSWORK, all file systems support locking.

Remove unused predicates vn_locked() and vn_anylocked().

Welcome to 9.99.95
 1.374 12-Mar-2022  riastradh ffs: Fix 64-bit inode integer truncation.

Reported-by: syzbot+1ae93e092d532582b809@syzkaller.appspotmail.com
 1.373 18-Sep-2021  christos Change the default for ACLs to be posix1e instead of nfsv4 to match FreeBSD.
Requested by chuq.
 1.372 20-Aug-2020  christos Don't cache id's for vnodes that have ACLs. ok chs@
 1.371 05-Jul-2020  christos simplify the acl setup, and fix reversed mask in the fs_flags code.
 1.370 18-May-2020  hannken Assert ufs_strategy() always gets used while current thread
holds a fstrans lock.
 1.369 16-May-2020  christos Add ACL support for FFS. From FreeBSD.
 1.368 12-May-2020  ad cache_enter_id(): give it a boolean parameter to indicate whether the cached
identity is valid.
 1.367 04-Apr-2020  ad Merge the remaining changes from the ad-namecache branch, affecting namei()
and getcwd():

- push vnode locking back as far as possible.
- do most lookups directly in the namecache, avoiding vnode locks & refs.
- don't block new refs to vnodes across VOP_INACTIVE().
- get shared locks for VOP_LOOKUP() if the file system supports it.
- correct lock types for VOP_ACCESS() / VOP_GETATTR() in a few places.

Possible future enhancements:

- make the lookups lockless.
- support dotdot lookups by being lockless and inferring absence of chroot.
- maybe make it work for layered file systems.
- avoid vnode references at the root & cwd.
 1.366 16-Mar-2020  pgoyette Use the module subsystem's ability to process SYSCTL_SETUP() entries to
automate installation of sysctl nodes.

Note that there are still a number of device and pseudo-device modules
that create entries tied to individual device units, rather than to the
module itself. These are not changed.
 1.365 27-Feb-2020  ad Tighten up the locking around vp->v_iflag a little more after the recent
split of vmobjlock & v_interlock.
 1.364 23-Feb-2020  ad UVM locking changes, proposed on tech-kern:

- Change the lock on uvm_object, vm_amap and vm_anon to be a RW lock.
- Break v_interlock and vmobjlock apart. v_interlock remains a mutex.
- Do partial PV list locking in the x86 pmap. Others to follow later.
 1.363 17-Jan-2020  ad VFS_VGET(), VFS_ROOT(), VFS_FHTOVP(): give them a "int lktype" argument, to
allow us to get shared locks (or no lock) on the returned vnode. Matches
FreeBSD.
 1.362 20-Jun-2019  pgoyette branches: 1.362.2; 1.362.4;
Split the ufs code out of the ffs module and into its own module.

Adapt chfs and ext2fs modules accordingly.
 1.361 01-Jan-2019  hannken Add "void *extra" argument to vcache_new() so a file system may
pass more information about the file to create.

Welcome to 8.99.30
 1.360 10-Dec-2018  jdolecek make UFS_WAPBL_JLOCK_ASSERT() #ifdef DIAGNOSTIC, same as the underlying
function KASSERT(), so that it actually does something; fix code using
it to actually pass correct params, so that it compiles

remove UFS_WAPBL_JUNLOCK_ASSERT(), as that is inherently racy (it's
okay in those places if the rwlock is held by another lwp); depend
on the RW_ASSERT()/LOCKDEBUG inside rw_enter() to catch the case
with wapbl rwlock held by current lwp
 1.359 10-Dec-2018  maxv Remove unused mbuf.h includes.
 1.358 18-Jul-2018  uwe ffs_superblock_validate - check fs_old_size too.

Now I can mount OpenWindows Version 3 CD from 1991.
 1.357 28-May-2018  chs branches: 1.357.2;
add a genfs method to allow a file system to limit the range of pages
that are given to a single GOP_WRITE() call. needed by ZFS.
 1.356 28-Jan-2018  hannken branches: 1.356.2;
Prevent use-after-free where genfs_node_destroy() would destroy
a lock residing in the just freed inode data.
 1.355 15-Nov-2017  christos PR/52728: Izumi Tsutsui: "mount -u /dev/ /" triggers kernel panic
Simplify the control flow of the mount code and make sure that the
mountfrom argument can be converted to a block device in the update
case.
XXX: pullup-8
 1.354 20-Aug-2017  maya print mode as octal for readability
 1.353 17-Apr-2017  hannken branches: 1.353.2; 1.353.4;
Remove unused argument "nextp" from vfs_busy() and vfs_unbusy().
Remove argument "keepref" from vfs_unbusy() and add vfs_ref() where needed.
 1.352 17-Apr-2017  hannken Add vfs_ref(mp) and vfs_rele(mp) to add or remove a reference to
struct mount. Rename vfs_destroy(mp) to vfs_rele(mp) and replace
incrementing mp->mnt_refcnt with vfs_ref(mp).
 1.351 01-Apr-2017  riastradh KASSERT(mutex_owned(vp->v_interlock)) in vnode iterator selector.
 1.350 10-Mar-2017  jdolecek slightly rearrange the code for IMNT_WANTRDONLY + MNT_UPDATE case for
better readability, no functional change
 1.349 06-Mar-2017  hannken Adapt the test "enable WAPBL on rw mounts only" to the recent change of
the protocol to update a mounted file.

Should fix PR kern/52031 (FFS mount update doesn't play nice with WAPBL)
 1.348 01-Mar-2017  hannken Bring back read-write to read-only mount update for ffs.
 1.347 01-Mar-2017  hannken Remove now redundant calls to fstrans_start()/fstrans_done().
 1.346 22-Feb-2017  hannken Enable fstrans on all file systems.

Welcome to 7.99.61
 1.345 17-Feb-2017  hannken Add generic genfs_suspendctl() and use it for all file systems.
Layered file systems need work.
 1.344 17-Feb-2017  hannken Untangle VFS_SYNC() from VFS_SUSPENDCTL().
 1.343 17-Feb-2017  hannken Flush the log to disk when ffs_sync() gets called with MNT_WAIT.
 1.342 27-Dec-2016  hannken branches: 1.342.2;
Fix a bug introduced with Rev. 1.294: use LK_NOWAIT when called with MNT_LAZY.
 1.341 20-Oct-2016  jdolecek add assertion to ensure ffs_cgupdate() is always called from
within a WAPBL transaction (if logging is on)
 1.340 28-Jul-2016  martin From Michael Plass:

The superblock field that distinguishes between 4.2BSD and 4.4BSD
inodes is really only relevant on a UFS1 file system. Make sure that
it is a UFS1 fs before using fs_old_inodefmt.

Note that the NetBSD newfs and mkfs utilities initialize fs_old_inodefmt
even for UFS2, so problems were apparent only on file systems created
by other operating systems, for example, FreeBSD.
 1.339 19-Jun-2016  christos branches: 1.339.2;
Relax the dup alloc tests to not include the on-disk data for ffsv2, since
nothing checks that the lazy-initialized inodes are correct and if they happen
to get corrupted, there is no way to fix them.
 1.338 23-Dec-2015  christos We need to check if the inode is initialized for ffsv2 when we translate a
filehandle to a vnode. This can come from nfs and it could be out of range.
In that case we read garbage from the disk, end up trying to free bogus data
when we put the vnode back and we crash.
XXX: pullup-7
 1.337 15-Nov-2015  pgoyette If file system ffs is built with WAPBL defined, make sure that the
module depends on the wapbl module.

No impact to users of built-in ffs file system code, as the WAPBL
#define will cause inclusion of the code in the kernel.

A standard build of the modular ffs file system code will #define
WAPBL, so the module will only work on a kernel which was also
built with WAPBL defined (or, once I commit it, with a dynamically-
loaded wapbl module).
 1.336 22-Oct-2015  maxv Fix PR 50070. From hannken@.
 1.335 24-Jul-2015  maxv Unused inits (harmless).

Found by Brainy.
 1.334 23-May-2015  maxv Add a missing goto.

(was here before my changes)

ok christos@
 1.333 19-May-2015  martin Cosmetics: fix netbsd.org spelling
 1.332 18-May-2015  martin Print all sizes as size_t
 1.331 18-May-2015  martin Make the recently added fs_cgsize test less strict, as it prevents existing
installs from booting.
Catch the common case and warn about it, pointing to a web page describing
the issue - but allow mounting. In all other cases, print more details about
the inconsistency and fail the mount.
 1.330 26-Apr-2015  maxv ffs_superblock_validate(): check the size of cylinder groups.
 1.329 22-Apr-2015  maxv Instead of duplicating code, create ffs_is_appleufs(): returns 1 if the
device is an AppleUFS FS, 0 otherwise.

This changes the behavior a bit: if the kernel cannot determine whether the
disk is an AppleUFS one or not, it now considers it as a normal UFS rather
than returning an error and not mounting/reloading it.

No particular comment on tech-kern@
 1.328 04-Apr-2015  maxv ffs_superblock_validate(): ensure fs_ncg!=0 and fs_maxbpg!=0 to prevent
several divisions by zero.
 1.327 28-Mar-2015  maxv Remove the 'cred' argument from bread(). Remove a now unused var in
ffs_snapshot.c. Update the man page accordingly.

ok hannken@
 1.326 27-Mar-2015  riastradh Disentangle buffer-cached I/O from page-cached I/O in UFS.

Page-cached I/O is used for regular files, and is initiated by VFS
users such as userland and NFS.

Buffer-cached I/O is used for directories and symlinks, and is issued
only internally by UFS.

New UFS routine ufs_bufio replaces vn_rdwr for internal use.
ufs_bufio is implemented by new UFS operations uo_bufrd/uo_bufwr,
which sit in ufs_readwrite.c alongside the VOP_READ/VOP_WRITE
implementations.

I preserved the code as much as possible and will leave further
simplification for future commits. I kept the ulfs_readwrite.c
copypasta close to ufs_readwrite.c in case we ever want to merge them
back; likewise ext2fs_readwrite.c.

No externally visible semantic change. All atf fs tests still pass.
 1.325 17-Mar-2015  hannken Change ffs to use vcache_new:
- Change ffs_valloc to return an inode number.
- Remove now obsolete UFS operations UFS_VALLOC and UFS_VFREE.
- Make ufs_makeinode private to ufs_vnops.c and pass vattr instead of mode.
 1.324 15-Mar-2015  maxv ffs_reload(): fix a bug that prevents Big Endian FSes from being reloaded.
'newfs' should be tagged as FS_SWAPPED, not 'fs'.

Was here before my changes.

While here, also KNF a bit.
 1.323 14-Mar-2015  maxv ffs_superblock_validate(): ensure fs_ipg and fs_fpg are != 0. Otherwise
division by zero in several places.
 1.322 10-Mar-2015  maxv ffs_superblock_validate(): check the number of inodes per block. Otherwise
a malformed value could panic the system.
 1.321 03-Mar-2015  maxv ffs_reload(): release 'bp' earlier
 1.320 03-Mar-2015  maxv ffs_reload(): the current implementation blindly assumes that critical fields
of the superblock didn't change. Add checks to ensure they didn't change
for real. This prevents several memory corruptions.
 1.319 23-Feb-2015  maxv Small changes:
- instead of always calling DPRINTF with __func__, put __func__ directly
in the macro
- ffs_mountfs(): rename fsblockloc -> fs_sblockloc, initialize fs_sbsize
to zero
No real functional change
 1.318 22-Feb-2015  maxv ffs_superblock_validate(): sanitize fs_fragshift, fs_bmask and fs_fmask.
 1.317 20-Feb-2015  maxv Style, and fix a DPRINTF

No functional change
 1.316 14-Feb-2015  maxv ffs_superblock_validate(): when checking the number of frag blocks, also
make sure it matches fs->fs_frag. This also prevents an infinite loop if
fs->fs_frag=0.
 1.315 14-Feb-2015  maxv ffs_superblock_validate(): compute fs_bshift and fs_fshift, and ensure
they are consistent with what is indicated in the superblock. This allows
us to safely use some ffs_ macros.
 1.314 14-Feb-2015  maxv In fact, we need to sanitize the superblock *after* swapping it. Therefore,
move the swap code inside the loop.

'fs->fs_sbsize' is swapped twice: the first time in order to get the
correct superblock size, and later when swapping the whole superblock
structure. As a result, we need to check 'fs->fs_sbsize' twice.

This:
- fixes my previous changes for swapped FSes
- allows the kernel to look for other superblock locations if the
current superblock is not validated

And now:
- ffs_superblock_validate() takes only one argument: the fs structure
- 'fs_bsize' is unused, so delete it

Add some comments to explain a bit what we are doing.
 1.313 14-Feb-2015  maxv ffs_superblock_validate(): sanitize the number of frag blocks.
 1.312 14-Feb-2015  maxv Currently, in ffs_reload(), we don't handle the possibility that the
superblock location may have changed. But that implies that we don't
handle the possibility that its size may have changed either.

Therefore: add a check to ensure the size hasn't changed. Otherwise the
mismatch leads to a memory corruption with kmem.
 1.311 14-Feb-2015  maxv Style. No functional change.
 1.310 14-Feb-2015  maxv ffs_reload(): call ffs_superblock_validate() with the new superblock.
 1.309 13-Feb-2015  maxv ffs_superblock_validate(): ensure fs->fs_cssize!=0, otherwise the kernel
panics with kmem_alloc(0).
 1.308 13-Feb-2015  maxv Add some checks in ffs_superblock_validate():
- fs_bsize < MINBSIZE
- !powerof2(fs_bsize)
- !powerof2(fs->fs_fsize)
- fs_bsize < fs->fs_fsize

Based on makefs/ffs.
 1.307 13-Feb-2015  maxv Add a new function: ffs_superblock_validate(). And add a new check to
ensure fs_size!=0; otherwise the kernel panics with a division by zero.
 1.306 13-Feb-2015  maxv Make this a bit more readable. No functional change.
 1.305 16-Jan-2015  christos PR/39371: Tobias Nygren: Don't fail mounting root if WAPBL log is corrupt.
Patch from Sergio L. Pascual.
XXX: pullup-7
 1.304 14-Dec-2014  christos Restore apple ufs error handling.
 1.303 14-Dec-2014  christos - Add debugging for mount...
- Merge some error returns
- Check more errors
 1.302 14-Nov-2014  manu branches: 1.302.2;
Fix use-after-free on failed unmount with extended attribute enabled

When unmount failed, for instance because the mount is still busy,
UFS1 extended attributes structures were left freed while the kernel
assumes extended attributes were still enabled. This led to using
UFS1 extended attributes structures after free. With LOCKDEBUG, this
quickly triggers a panic.

The problem is fixed by:
1) clear MNT_EXTATTR flag after extended attributes structures are freed
2) attempt to restart extended attributes after failed unmount
3) set MNT_EXTATTR correctly after extended attributes restart

As a side effect, extended attribute structures are now only initialized
when extended attributes are started for the filesystem.
 1.301 30-Oct-2014  maxv Limit the superblock size to SBLOCKSIZE, not MAXBSIZE. Otherwise memcpy
will read beyond the allocated buffer.

Discussed a bit on tech-kern@.
 1.300 24-Oct-2014  njoly One semicolon is enough.
 1.299 24-May-2014  christos branches: 1.299.2;
Introduce a selector function to the vfs vnode iterator so that we don't
need to vget() vnodes that we are not interested in, and optimize locking
a bit. Iterator changes reviewed by Hannken (thanks), the rest of the bugs
are mine.
 1.298 08-May-2014  hannken Add a global vnode cache:

- vcache_get() retrieves a referenced and initialised vnode / fs node pair.
- vcache_remove() removes a vnode / fs node pair from the cache.

On cache miss vcache_get() calls new vfs operation vfs_loadvnode() to
initialise a vnode / fs node pair. This call is guaranteed exclusive,
no other thread will try to load this vnode / fs node pair.

Convert ufs/ext2fs, ufs/ffs and ufs/mfs to use this interface.

Remove now unused ufs/ufs_ihash

Discussed on tech-kern.

Welcome to 6.99.41
 1.297 16-Apr-2014  maxv An (un)privileged user can easily make the kernel dereference a NULL
pointer.

The kernel allows 'data' to be NULL; it's the fs's responsibility to
ensure that it isn't NULL (if the fs actually needs data).

ok christos@
 1.296 01-Apr-2014  christos branches: 1.296.2;
Check for bread errors before we do the size check. Otherwise we de-reference
NULL...
 1.295 23-Mar-2014  hannken Change all vfsops to use C99 designated initializers.

No functional changes intended.
 1.294 17-Mar-2014  hannken Change ffs_sync() to use vfs_vnode_iterator.
 1.293 05-Mar-2014  hannken Current support for iterating over mnt_vnodelist is rudimentary. Every
caller has to care about list and vnode mutexes, reference count being zero,
intermediate vnode states like VI_CLEAN, VI_XLOCK, VI_MARKER and so on.

Add an interface to iterate over a vnode list:

void vfs_vnode_iterator_init(struct mount *mp, struct vnode_iterator **marker)
void vfs_vnode_iterator_destroy(struct vnode_iterator *marker)
bool vfs_vnode_iterator_next(struct vnode_iterator *marker, struct vnode **vpp)

vfs_vnode_iterator_next() returns either "false / *vpp == NULL" when done
or "true / *vpp != NULL" to return the next referenced vnode from the list.

To make vrecycle() work in this environment change it to

bool vrecycle(struct vnode *vp)

where "vp" is a referenced vnode to be destroyed if this is the last reference.

Discussed on tech-kern.

Welcome to 6.99.34
 1.292 25-Feb-2014  pooka Ensure that the top level sysctl nodes (kern, vfs, net, ...) exist before
the sysctl link sets are processed, and remove redundancy.

Shaves >13kB off of an amd64 GENERIC, not to mention >1k duplicate
lines of code.
 1.291 23-Nov-2013  christos change the mountlist CIRCLEQ into a TAILQ
 1.290 29-Oct-2013  hannken Vnode API cleanup pass 1.

- Make these defines and functions private to vfs_vnode.c:

VC_MASK, VC_LOCK, DOCLOSE, VI_IANCTREDO and VI_INACTNOW
vclean() and vrelel()

- Remove the long time unused lwp argument from vrecycle().

- Remove vtryget(), it is responsible for ugly hacks and doesn't
look that effective.

Presented on tech-kern.

Welcome to 6.99.25
 1.289 30-Sep-2013  hannken Replace macro v_specmountpoint with two functions spec_node_getmountedfs()
and spec_node_setmountedfs() to manage the file system mounted on a device.
Assert the device is a block device.

Welcome to 6.99.24

Discussed on tech-kern@ some time ago.

Reviewed by: David Holland <dholland@netbsd.org>
 1.288 16-Sep-2013  hannken Function ffs_reload() works on a read-only mount, so remove the call
to ffs_snapshot_mount() as it would panic later with "already on list"
when remounting read-write.

Should fix PR kern/48211 (Unclean shutdown with active snapshot causes
panic during reboot)
 1.287 11-Aug-2013  dholland Kill off uo_unmark_vnode/UFS_UNMARK_VNODE as it's now a leftover.
 1.286 23-Jun-2013  dholland branches: 1.286.2;
Stick ffs_ in front of the following macros:
fragstoblks()
blkstofrags()
fragnum()
blknum()

to finish the job of distinguishing them from the lfs versions, which
Christos renamed the other day.

I believe this is the last of the overtly ambiguous exported symbols
from ffs... or at least, the last of the ones that conflicted with lfs.
ffs still pollutes the C namespace very broadly (as does ufs) and this
needs quite a bit more cleanup.

XXX: boo on macros with lowercase names. But I'm not tackling that just yet.
 1.285 23-Jun-2013  dholland fsbtodb() -> FFS_FSBTODB(), EXT2_FSBTODB(), or MFS_FSBTODB()
dbtofsb() -> FFS_DBTOFSB() or EXT2_DBTOFSB()

(Christos already did the lfs ones a few days back)
 1.284 16-Jun-2013  hannken Add an UFS_SNAPGONE() ufs op replacing the calls
to ffs_snapgone() in ufs_lookup.c.

Ok: David Holland <dholland@netbsd.org>

Welcome to 6.99.22
 1.283 09-Jun-2013  dholland Stick UFS_ in front of these symbols:
DIRBLKSIZ
DIRECTSIZ
DIRSIZ
OLDDIRFMT
NEWDIRFMT

Part of PR 47909.
 1.282 22-Jan-2013  dholland Stuff UFS_ in front of a few of ufs's symbols to reduce namespace
pollution. Specifically:
ROOTINO -> UFS_ROOTINO
WINO -> UFS_WINO
NXADDR -> UFS_NXADDR
NDADDR -> UFS_NDADDR
NIADDR -> UFS_NIADDR
MAXSYMLINKLEN -> UFS_MAXSYMLINKLEN
MAXSYMLINKLEN_UFS[12] -> UFS[12]_MAXSYMLINKLEN (for consistency)

Sort out ext2fs's misuse of NDADDR and NIADDR; fortunately, these have
the same values in ext2fs and ffs.

No functional change intended.
 1.281 20-Dec-2012  hannken Change bread() and breadn() to never return a buffer on
error and modify all callers to not brelse() on error.

Welcome to 6.99.16

PR kern/46282 (6.0_BETA crash: msdosfs_bmap -> pcbmap -> bread -> bio_doread)
 1.280 26-Nov-2012  drochner allow enabling ffs "discard" through update mounts; make the flag visible
to userland
 1.279 19-Oct-2012  drochner Implement experimental support to pass notifications that a file
was deleted from the filesystem to the disk driver, commonly
known as "discard" or "trim".
fs/driver support is in ffs and ata wd for now.
This is what was posted here:
http://mail-index.netbsd.org/tech-kern/2012/02/28/msg012813.html
with minor cleanup, and the global switch replaced by a mount option.
 1.278 10-Sep-2012  manu branches: 1.278.2;
Stop extended attributes at the appropriate place so that unmount
does not fail with EBUSY on file systems with extended attributes enabled.
 1.277 29-Apr-2012  chs change vflushbuf() to take the full FSYNC_* flags.
translate FSYNC_LAZY into PGO_LAZY for VOP_PUTPAGES() so that
genfs_do_io() can set the appropriate io priority for the I/O.
this is the first part of addressing PR 46325.
 1.276 13-Mar-2012  elad Replace the remaining KAUTH_GENERIC_ISSUSER authorization calls with
something meaningful. All relevant documentation has been updated or
written.

Most of these changes were brought up in the following messages:

http://mail-index.netbsd.org/tech-kern/2012/01/18/msg012490.html
http://mail-index.netbsd.org/tech-kern/2012/01/19/msg012502.html
http://mail-index.netbsd.org/tech-kern/2012/02/17/msg012728.html

Thanks to christos, manu, njoly, and jmmv for input.

Huge thanks to pgoyette for spinning these changes through some build
cycles and ATF.
 1.275 29-Jan-2012  nonaka branches: 1.275.2;
use FS_UFS[12]_MAGIC_SWAPPED instead of bswap32(FS_UFS[12]_MAGIC).
 1.274 28-Jan-2012  rmind pool_page_alloc, pool_page_alloc_meta: avoid extra compare, use const.
ffs_mountfs,sys_swapctl: replace memset with kmem_zalloc.
sys_swapctl: move kmem_free outside the lock path.
uvm_init: fix comment, remove pointless numeration of steps.
uvm_map_enter: remove meflagval variable.
Fix some indentation.
 1.273 27-Jan-2012  para convert readdir in ffs and ext2fs from malloc(9) to kmem(9);
while there, allocate ufs mount structs from kmem(9) too,
in preparation for the kmem-vmem-pool patch

releng@ acknowledged
 1.272 03-Jan-2012  pgoyette Display current mount point, rather than previous one, when printing
the "replaying log to disk" message.

OK dholland@

Fixes PR kern/39609
 1.271 14-Nov-2011  hannken branches: 1.271.4;
VOP_OPEN() needs a locked vnode. All these copy-and-pasted xxxfs_mount()
implementations need more review.
 1.270 13-Nov-2011  christos use getdiskinfo()
 1.269 07-Oct-2011  hannken branches: 1.269.2;
As vnalloc() always allocates with PR_WAITOK there is no longer the need
to test its result for NULL.
 1.268 17-Jun-2011  manu Add mount -o extattr option to enable extended attributes (currently only
for UFS1).
Remove kernel option for EA backing store autocreation and do it by
default. Add a sysctl so that the autocreated attribute size can be modified.
 1.267 12-Jun-2011  rmind Welcome to 5.99.53! Merge rmind-uvmplock branch:

- Reorganize locking in UVM and provide extra serialisation for pmap(9).
New lock order: [vmpage-owner-lock] -> pmap-lock.

- Simplify locking in some pmap(9) modules by removing P->V locking.

- Use lock object on vmobjlock (and thus vnode_t::v_interlock) to share
the locks amongst UVM objects where necessary (tmpfs, layerfs, unionfs).

- Rewrite and optimise x86 TLB shootdown code, make it simpler and cleaner.
Add TLBSTATS option for x86 to collect statistics about TLB shootdowns.

- Unify /dev/mem et al in MI code and provide required locking (removes
kernel-lock on some ports). Also, avoid cache-aliasing issues.

Thanks to Andrew Doran and Joerg Sonnenberger, as their initial patches
formed the core changes of this branch.
 1.266 27-Apr-2011  hannken branches: 1.266.2;
Cleanup ffs fsync and make devices on wapbl enabled file systems work here:

- Replace the ugly sync loop in ffs_full_fsync() and ffs_vfs_fsync() with
vflushbuf(). This loop is a relic of softdeps and not needed anymore.

- Add ffs_spec_fsync() for device nodes on ffs file systems that calls
spec_fsync() like all other file systems do and then updates the ctime.

Discussed on tech-kern.

Should fix PRs:
PR #41192 wapbl diagnostic panic during cgdconfig
PR #41977 kernel diagnostic assertion "rw_lock_held(&wl->wl_rwlock)" failed
PR #42149 wapbl locking panic if watching DVD
PR #42551 Lockdebug assert in wapbl when running zpool
 1.265 27-Mar-2011  mlelstv Don't abort when APPLE_UFS autodetection cannot read the apple ufs label
due to sector size or alignment problems. Autodetection is only a safety
measure, you should mark the filesystem type in the BSD disklabel.
 1.264 06-Mar-2011  bouyer merge the bouyer-quota2 branch. This adds a new on-disk format
to store disk quota usage and limits, integrated with ffs
metadata. Usage is checked by fsck_ffs (no more quotacheck)
and is covered by the WAPBL journal. Enabled with kernel
option QUOTA2 (added where QUOTA was enabled in kernel config files),
turned on with tunefs(8) on a per-filesystem
basis. mount_mfs(8) can also turn quotas on.

See http://mail-index.netbsd.org/tech-kern/2011/02/19/msg010025.html
for details.
 1.263 27-Dec-2010  hannken branches: 1.263.2; 1.263.4;
Extend the range of fstrans transactions to a sequence of vnode operations
on a locked vnode. This leaves a suspended file system and therefore a
snapshot with either all or no operations of such a sequence done.
 1.262 09-Aug-2010  pooka add a linefeed to the previous
 1.261 09-Aug-2010  pooka Return error if we try to mount a file system with block size > MAXBSIZE.

Note: there are a billion ways to make the kernel panic by trying
to mount a garbage file system and I don't imagine we'll ever get
close to fixing even half of them. However, for this one failing
gracefully is a bonus since Xen DomU only does 32k MAXBSIZE and
the 64k MAXBSIZE file systems are out there (PR port-xen/43727).

Tested by compiling sys/rump with CPPFLAGS+=-DMAXPHYS=32768 (all
tests in tests/fs still pass). I don't know how we're going to
translate this into an easy regression test, though. Maybe with
a hacked newfs?
 1.260 21-Jul-2010  hannken Make holding v_interlock mandatory for callers of vget().

Announced some time ago on tech-kern.
 1.259 24-Jun-2010  hannken Clean up vnode lock operations pass 2:

VOP_UNLOCK(vp, flags) -> VOP_UNLOCK(vp): Remove the unneeded flags argument.

Welcome to 5.99.32.

Discussed on tech-kern.
 1.258 11-Feb-2010  mlelstv branches: 1.258.2;
There is no code left that uses disk size data, so don't query it.
This also failed when querying the simulated block device from mfs.
Fixes PR kern/42782.
 1.257 05-Feb-2010  mlelstv branches: 1.257.2;
Correct addressing of superblock updates.
 1.256 31-Jan-2010  mlelstv Fix block shift to work with different device block sizes.

Unlike other filesystems this has some side issues because
the shift values are stored in the superblock and because
userland utilities share the same fsbtodb macros.

-> the kernel now ignores the value stored in the superblock.
-> the macro adaptation is only done for defined(_KERNEL) code.
 1.255 31-Jan-2010  mlelstv Replace individual queries for partition information with
new helper function.
 1.254 08-Jan-2010  pooka The VATTR_NULL/VREF/VHOLD/HOLDRELE() macros lost their will to live
years ago when the kernel was modified to not alter ABI based on
DIAGNOSTIC, and now just call the respective function interfaces
(in lowercase). Plenty of mix'n'match upper/lowercase has crept
into the tree since then. Nuke the macros and convert all callsites
to lowercase.

no functional change
 1.253 04-Nov-2009  hannken Now that softdep has left the tree the only place needing the ffs_lock()
hack is ffs_sync().

- Use the generic lock operations for ffs.
- Change ffs_sync() to omit the vnode lock while suspending.

Reviewed by: Antti Kantee <pooka@netbsd.org>
 1.252 13-Sep-2009  bouyer If the WAPBL journal can't be read (ffs_wapbl_replay_start() fails),
mount the filesystem anyway if MNT_FORCE is present.
This still allows a system with a corrupted WAPBL journal on / to boot
single-user, giving a chance to run fsck to fix it.
http://mail-index.netbsd.org/tech-kern/2009/08/17/msg005896.html
and followups.
 1.251 13-Sep-2009  tsutsui Move declaration of ufs_hashlock into <ufs/ufs_extern.h> from each c source.
 1.250 31-Jul-2009  pooka Don't free extattr resources until it is certain that unmount
succeeds. Also, "unmount system call" -> "unmount vfs operation"
in comment just so that our comments aren't 15+ years outdated.
 1.249 23-Jul-2009  pooka Restore error behaviour bulldozed in rev 1.246.

might fix PR kern/41769
 1.248 06-Jul-2009  christos Fix bug introduced in revision 1.174 where a NULL fspec with an MNT_UPDATE
command would always return EINVAL. This broke fsck on root, where fsck'ing
a dirty root would always return an error, causing rc to resort to a reboot.
 1.247 29-Jun-2009  dholland Convert 67 namei call sites to use namei_simple, in these functions:

check_console, veriexecclose, veriexec_delete, veriexec_file_add,
emul_find_root, coff_load_shlib (sh3 version), coff_load_shlib,
compat_20_sys_statfs, compat_20_netbsd32_statfs,
ELFNAME2(netbsd32,probe_noteless), darwin_sys_statfs,
ibcs2_sys_statfs, ibcs2_sys_statvfs, linux_sys_uselib,
osf1_sys_statfs, sunos_sys_statfs, sunos32_sys_statfs,
ultrix_sys_statfs, do_sys_mount, fss_create_files (3 of 4),
adosfs_mount, cd9660_mount, coda_ioctl, coda_mount, ext2fs_mount,
ffs_mount, filecore_mount, hfs_mount, lfs_mount, msdosfs_mount,
ntfs_mount, sysvbfs_mount, udf_mount, union_mount, sys_chflags,
sys_lchflags, sys_chmod, sys_lchmod, sys_chown, sys_lchown,
sys___posix_chown, sys___posix_lchown, sys_link, do_sys_pstatvfs,
sys_quotactl, sys_revoke, sys_truncate, do_sys_utimes, sys_extattrctl,
sys_extattr_set_file, sys_extattr_set_link, sys_extattr_get_file,
sys_extattr_get_link, sys_extattr_delete_file,
sys_extattr_delete_link, sys_extattr_list_file, sys_extattr_list_link,
sys_setxattr, sys_lsetxattr, sys_getxattr, sys_lgetxattr,
sys_listxattr, sys_llistxattr, sys_removexattr, sys_lremovexattr

All have been scrutinized (several times, in fact) and compile-tested,
but not all have been explicitly tested in action.

XXX: While I haven't (intentionally) changed the use or nonuse of
XXX: TRYEMULROOT in any of these places, I'm not convinced all the
XXX: uses are correct; an audit might be desirable.
 1.246 25-Apr-2009  elad Add genfs_can_mount() and use it to prevent some more code duplication of
the security checks when mounting a device (VOP_ACCESS() + kauth(9) call).

Proposed with no objections on tech-kern@:

http://mail-index.netbsd.org/tech-kern/2009/04/20/msg004859.html

The vnode is always expected to be locked, so no locking is done outside
the file-system code.
 1.245 29-Mar-2009  ad fsync:

- atime updates were not being synced.

ffs_sync:

- In some cases the sync vnode was acting like the now-dead /usr/sbin/update.
It was examining vnodes that it should have ignored.

- It would find dirty inodes and try to flush them. Often ffs_fsync()
cheerfully ignored the flush request due to the fsync bug. Such inodes
remained dirty and were repeatedly re-examined by the syncer until
vnode reclaim or system shutdown.

- We were marking our place in the per-mount vnode list even though in
most cases there was no flush to perform. While not a bug, this wasted
CPU cycles because a TAILQ_NEXT would have sufficed.
 1.244 21-Mar-2009  ad ffs_sync: ensure that we *do* flush atime updates periodically.
ffs_update() was eating the flag.
 1.243 22-Feb-2009  ad PR kern/26878 FFSv2 + softdep = livelock (no free ram)
PR kern/16942 panic with softdep and quotas
PR kern/19565 panic: softdep_write_inodeblock: indirect pointer #1 mismatch
PR kern/26274 softdep panic: allocdirect_merge: ...
PR kern/26374 Long delay before non-root users can write to softdep partitions
PR kern/28621 1.6.x "vp != NULL" panic in ffs_softdep.c:4653 while unmounting a softdep (+quota) filesystem
PR kern/29513 FFS+Softdep panic with unfsck-able file-corruption
PR kern/31544 The ffs softdep code appears to fail to write dirty bits to disk
PR kern/31981 stopping scsi disk can cause panic (softdep)
PR kern/32116 kernel panic in softdep (assertion failure)
PR kern/32532 softdep_trackbufs deadlock
PR kern/37191 softdep: locking against myself
PR kern/40474 Kernel panic after remounting raid root with softdep

Retire softdep, pass 2. As discussed and later formally announced on the
mailing lists.
 1.242 22-Feb-2009  ad PR kern/39564 wapbl performance issues with disk cache flushing
PR kern/40361 WAPBL locking panic in -current
PR kern/40470 WAPBL corrupts ext2fs
PR kern/40562 busy loop in ffs_sync when unmounting a file system
PR kern/40525 panic: ffs_valloc: dup alloc

- A fix for an issue that can lead to "ffs_valloc: dup" due to dirty cg
buffers being invalidated. Problem discovered and patch by dholland@.

- If the syncer fails to lazily sync a vnode due to lock contention,
retry 1 second later instead of 30 seconds later.

- Flush inode atime updates every ~10 seconds (this makes most sense with
logging). Previously they did not hit the disk for read-only files or
devices until the file system was unmounted. It would be better to trickle
the updates out but that would require more extensive changes.

- Fix issues with file system corruption, busy looping and other nasty
problems when logging and non-logging file systems are intermixed,
with one being the root file system.

- For logging, do not flush metadata on an inode-at-a-time basis if the sync
has been requested by ioflush. Previously, we could try hundreds of log
sync operations a second due to inode update activity, causing the syncer
to fall behind and metadata updates to be serialized across the entire
file system. Instead, burst out metadata and log flushes at a minimum
interval of every 10 seconds on an active file system (happens more often
if the log becomes full). Note this does not change the operation of
fsync() etc.

- With the flush issue fixed, re-enable concurrent metadata updates in
vfs_wapbl.c.
 1.241 13-Nov-2008  ad branches: 1.241.4;
Remove #ifdef LFS from the ufs code.
 1.240 10-Nov-2008  joerg Reduce internals of WAPBL exposed to the rest of the system.
 1.239 30-Oct-2008  joerg branches: 1.239.2;
Fix indentation.
 1.238 10-Oct-2008  hannken branches: 1.238.2;
Break a deadlock where one thread has a wapbl transaction, calls VOP_GETPAGES
and wants to busy a page while another thread calls VOP_PUTPAGES on the same
vnode, takes pages busy and wants to start a wapbl transaction.

Reviewed by: Jason Thorpe <thorpej@netbsd.org>
 1.237 23-Sep-2008  pooka Remove some of my debugging code which was not meant to be committed
in the wapbl merge.
 1.236 21-Sep-2008  freza Revert previous, pooka@ points out it's wrong.
 1.235 21-Sep-2008  freza WAPBL: in '%s: replaying log to disk' message use the path we're
trying to mount on instead of the misleading last-mounted-on
path. Reported by jmcneill.
 1.234 22-Aug-2008  hannken Add snapshot support for logging ffs file systems.

- Add UFS_WAPBL_BEGIN() / UFS_WAPBL_END() where needed.

- Expunge WAPBL log inodes from snapshots.

- ffs_copyonwrite() and ffs_snapblkfree() must run inside a WAPBL transaction.

- Add ffs_gop_write() as a wrapper around genfs_gop_write() that makes sure
genfs_gop_write() always gets called inside a WAPBL transaction.

- Add VOP_PUTPAGES() flag PGO_JOURNALLOCKED to tag calls to VOP_PUTPAGES()
inside a WAPBL transaction.

Reviewed by: Simon Burge <simonb@netbsd.org>, Greg Oster <oster@netbsd.org>

PGO_JOURNALLOCKED / ffs_gop_write() part presented on tech-kern@.
 1.233 15-Aug-2008  hannken ffs_suspendctl: make sure everything is on disk and the on disk log is empty.
 1.232 31-Jul-2008  hannken Ffs snapshots don't work (yet) with WAPBL:
- no snapshot creation on logging file systems.
- refuse to mount logging file systems with persistent snapshots.

Ok: Simon Burge <simonb@netbsd.org>
 1.231 31-Jul-2008  simonb Merge the simonb-wapbl branch. From the original branch commit:

Add Wasabi System's WAPBL (Write Ahead Physical Block Logging)
journaling code. Originally written by Darrin B. Jewell while
at Wasabi and updated to -current by Antti Kantee, Andy Doran,
Greg Oster and Simon Burge.

OK'd by core@, releng@.
 1.230 28-Jun-2008  rumble branches: 1.230.2;
Create sysctl entries during module initialisation and destroy them
appropriately.

Many of these file systems are now ready for modularisation.
 1.229 03-Jun-2008  hannken branches: 1.229.2;
ufs/ffs: replace calls to getblk() with ffs_getblk(). Now all buffers
have been run through copy-on-write and async mounts work again.

Fixes PR kern/38820

Reviewed by: YAMAMOTO Takashi <yamt@netbsd.org>
 1.228 16-May-2008  hannken Make sure all cached buffers with valid, not yet written data have been
run through copy-on-write. Call fscow_run() with valid data where possible.

The LP_UFSCOW hack is no longer needed to protect ffs_copyonwrite() against
endless recursion.

- Add a flag B_MODIFY to bread(), breada() and breadn(). If set the caller
intends to modify the buffer returned.

- Always run copy-on-write on buffers returned from ffs_balloc().

- Add new function ffs_getblk() that gets a buffer, assigns a new blkno,
may clear the buffer and runs copy-on-write. Process possible errors
from getblk() or fscow_run(). Part of PR kern/38664.

Welcome to 4.99.63

Reviewed by: YAMAMOTO Takashi <yamt@netbsd.org>
 1.227 10-May-2008  rumble Convert file systems to dynamically attach with the new module interface.
Make VFS hooks dynamic while we're here and say farewell to VFS_ATTACH and
VFS_HOOKS_ATTACH linksets.

As a consequence, most of the file systems can now be loaded as new style
modules.

Quick sanity check by ad@.
 1.226 06-May-2008  ad branches: 1.226.2;
PR kern/38141 lookup/vfs_busy acquire rwlock recursively

Simplify the mount locking. Remove all the crud to deal with recursion on
the mount lock, and crud to deal with unmount as another weirdo lock.

Hopefully this will once and for all fix the deadlocks with this. With this
commit there are two locks on each mount:

- krwlock_t mnt_unmounting. This is used to prevent unmount across critical
sections like getnewvnode(). It's only ever read locked with rw_tryenter(),
and is only ever write locked in dounmount(). A write hold can't be taken
on this lock if the current LWP could hold a vnode lock.

- kmutex_t mnt_updating. This is taken by threads updating the mount, for
example when going r/o -> r/w, and is only present to serialize updates.
In order to take this lock, a read hold must first be taken on
mnt_unmounting, and the two need to be held across the operation.

One effect of this change: previously if an unmount failed, we would make a
half-hearted attempt to back out of it gracefully, but that was unlikely to
work in a lot of cases. Now while an unmount that will be aborted is in
progress, new file operations within the mount will fail instead of being
delayed. That is unlikely to be a problem though, because if the admin
requests unmount of a file system then s(he) has made a decision to deny
access to the resource.
 1.225 30-Apr-2008  ad PR kern/38135 vfs_busy/vfs_trybusy confusion

The previous fix worked, but it opened a window where mounts could have
disappeared from mountlist while the caller was traversing it using
vfs_trybusy(). Fix that.
 1.224 29-Apr-2008  ad PR kern/38057 ffs makes assumptions about devvp file system
PR kern/33406 softdeps get stuck in endless loop

Introduce VFS_FSYNC() and call it when syncing a block device, if it
has a mounted file system.
 1.223 17-Apr-2008  hannken branches: 1.223.2; 1.223.4;
Replace get/setspecific with a void pointer in struct ufsmount. Use explicit
initialization/finalization of snapshot private data on creation/deletion
of struct ufsmount.
Snapshot mounts can no longer fail silently because kmem_alloc() fails.

Welcome to 4.99.60

Ok: Andrew Doran <ad@netbsd.org>
 1.222 30-Jan-2008  ad branches: 1.222.6;
PR kern/37706 (forced unmount of file systems is unsafe):

- Do reference counting for 'struct mount'. Each vnode associated with a
mount takes a reference, and in turn the mount takes a reference to the
vfsops.
- Now that mounts are reference counted, replace the overcomplicated mount
locking inherited from 4.4BSD with a recursable rwlock.
 1.221 28-Jan-2008  dholland Fix some race conditions in rename.
Introduce a per-FS rename lock and new vfsops to manipulate it.
Get this lock while renaming. Also add another relookup() in do_sys_rename,
which is a hack to kludge around some of the worst deficiencies of
ufs_rename.
reviewed-by: pooka (and an earlier rev by ad)
posted on tech-kern with no objections.
 1.220 25-Jan-2008  pooka Destroy extattr lock when destroying extattrs associated with the
mountpoint. Make stopping extattrs always successful to facilitate
always being able to free resources.
 1.219 24-Jan-2008  ad specfs changes for PR kern/37717 (raidclose() is no longer called on
shutdown). There are still problems with device access and a PR will be
filed.

- Kill checkalias(). Allow multiple vnodes to reference a single device.

- Don't play dangerous tricks with block vnodes to ensure that only one
vnode can describe a block device. Instead, prohibit concurrent opens of
block devices. As a bonus remove the unreliable code that prevents
multiple file system mounts on the same device. It's no longer needed.

- Track opens by vnode and by device. Issue cdev_close() when the last open
goes away, instead of abusing vnode::v_usecount to tell if the device is
open.
 1.218 09-Jan-2008  ad Fix hangs on 'biolock' when creating a directory under / with softdep.
 1.217 07-Jan-2008  ad Fix 'panic: softdep_update_inodeblock: update failed'.
 1.216 03-Jan-2008  ad Use pool_cache.
 1.215 03-Jan-2008  pooka valloc -> vnalloc, vfree -> vnfree
Avoids collision with userland valloc(3).

no functional change
ad ok
 1.214 02-Jan-2008  ad Merge vmlocking2 to head.
 1.213 20-Dec-2007  dyoung Call genfs_node_init a little earlier to avoid vput()ing an
uninitialized node, later, which leads to a kernel panic. Patch
by Antti Kantee.
 1.212 08-Dec-2007  pooka branches: 1.212.4;
Remove cn_lwp from struct componentname. curlwp should be used
from now on. The NDINIT() macro no longer takes the lwp parameter and
associates the credentials of the calling thread with the namei
structure.
 1.211 26-Nov-2007  pooka branches: 1.211.2;
Remove the "struct lwp *" argument from all VFS and VOP interfaces.
The general trend is to remove it from all kernel interfaces and
this is a start. In case the calling lwp is desired, curlwp should
be used.

quick consensus on tech-kern
 1.210 10-Oct-2007  ad branches: 1.210.4;
Merge from vmlocking:

- Split vnode::v_flag into three fields, depending on field locking.
- simple_lock -> kmutex in a few places.
- Fix some simple locking problems.
 1.209 08-Oct-2007  ad Merge ffs locking & brelse changes from the vmlocking branch.
 1.208 09-Aug-2007  hannken branches: 1.208.2; 1.208.4;
Move snapshot per-mount data from struct ufsmount to mount specific data.
No functional changes.

Welcome to 4.99.28 (struct ufsmount changed size)
 1.207 31-Jul-2007  pooka branches: 1.207.2; 1.207.4;
* nuke the nameidata parameter from VFS_MOUNT(). Nobody on tech-kern
knew what it was supposed to be used for and wrstuden gave a go-ahead
* while rototilling, convert file systems which went easily to
use VFS_PROTOS() instead of manually prototyping the methods
 1.206 20-Jul-2007  pooka In sync, skip over vnodes based on if they are clean rather than
if they have pages.
 1.205 17-Jul-2007  pooka branches: 1.205.2;
Make set_statvfs_info() take a parameter for the vfs name instead
of always retrieving it from mp->mnt_op->vfs_name

christos ok
 1.204 12-Jul-2007  dsl Change the VFS_MOUNT() interface so that the 'data' buffer passed to the
fs code is a kernel buffer, pass though the length of the buffer as well.
Since the length of the userspace buffer isn't (yet) passed through the mount
system call, add a field to the vfsops structure containing the default length.
Split sys_mount() for calls from compat code.
Ride one of the recent kernel version changes - old fs LKMs will load, but
sys_mount() will reject any attempt to use them.
 1.203 10-Jul-2007  hannken Move `struct dquot' and its supporting functions from quota.h to ufs_quota.c.

- Make quota-internal functions static.
- Clean up declarations in quota.h and ufs_extern.h. quota.h now has the
description of quota criterions, on-disk structure, user-kernel interface and
declaration of init/done functions. All ufs quota related function
prototypes go to ufs_extern.h.
- New functions ufsquota_init() and ufsquota_free() create or destroy the
quota fields of `struct inode'.
- chkdq() and chkiq() always update the quota fields of `struct inode' first.
- Only ufs_access() explicitly calls getinoquota().

No objections on tech-kern@
 1.202 30-Jun-2007  pooka Using POOL_INIT here makes no sense, since file systems always have
an init method. So get rid of it and #ifdef _LKM and just always
init in the init method. Give malloc types the same treatment.
Makes file systems nicer to work with in linksetless environments
and fixes a few LKM discrepancies.
 1.201 29-May-2007  tsutsui Fix inconsistent changes in rev 1.153 and 1.154:
Adjust fs->fs_maxfilesize instead of ump->um_maxfilesize
in ffs_oldfscompat_read() because the latter is overridden
by the former after ffs_oldfscompat_read() returned.

Fixes EFBIG errors on read(2) and "exec /sbin/init: error 8"
problem on mac68k after mountroot() on old 4.3BSD UFS created
by the Mkfs tool for MacOS (reported and confirmed on port-mac68k).
 1.200 28-May-2007  ad Fix lock order inversion between vnode locks and ufs_hashlock. Addresses
kern/36331 (MP deadlock between ufs_ihashget() and VOP_LOOKUP()) for ffs,
other file systems to follow. Reported by perseant@, debugged by Sverre
Froyen, patch posted/tested by Blair Sadewitz.
 1.199 17-May-2007  hannken Fstrans_start() always returns zero, so change its type to void.
 1.198 07-Apr-2007  hannken Remove calls to now obsolete vn_start_write() and vn_finished_write().
 1.197 12-Mar-2007  ad branches: 1.197.2;
Pass an ipl argument to pool_init/POOL_INIT to be used when initializing
the pool's lock.
 1.196 16-Feb-2007  hannken branches: 1.196.2; 1.196.6;
Make fstrans(9) the default helper for file system suspension.
Replaces the now obsolete vn_start_write()/vn_finished_write().
 1.195 15-Feb-2007  ad Replace some uses of lockmgr() / simplelocks.
 1.194 29-Jan-2007  hannken Change fstrans enum types to upper case.
No functional change.

From Antti Kantee <pooka@netbsd.org>
 1.193 19-Jan-2007  hannken New file system suspension API to replace vn_start_write and vn_finished_write.
The suspension helpers are now put into file system specific operations.
This means every file system not supporting these helpers cannot be suspended
and therefore snapshots are no longer possible.

Implemented for file systems of type ffs.

The new API is enabled on a kernel option NEWVNGATE. This option is
not enabled by default in any kernel config.

Presented and discussed on tech-kern with much input from
Bill Studenmund <wrstuden@netbsd.org> and YAMAMOTO Takashi <yamt@netbsd.org>.

Welcome to 4.99.9 (new vfs op vfs_suspendctl).
 1.192 07-Jan-2007  isaki Correct indent.
 1.191 04-Jan-2007  elad Consistent usage of KAUTH_GENERIC_ISSUSER.
 1.190 16-Nov-2006  christos branches: 1.190.2; 1.190.4;
__unused removal on arguments; approved by core.
 1.189 25-Oct-2006  reinoud Revisit mnt_vnodelist TAILQ patch. Remove all suspicious TAILQ_FOREACH()
loops where vnodes can get removed or added during the loops. This could
lead to panics on unmount since nodes are skipped or otherwise
TAILQ_NEXT(0xdeadbeef, ...) was dereferenced.
 1.188 20-Oct-2006  reinoud Replace the LIST structure mp->mnt_vnodelist with a TAILQ structure, since all
vnodes were synced and processed backwards. This meant that the last
accessed node was processed first and the earliest last.

An extra benefit is the removal of the ugly hack from the Berkeley days on
LFS.

In the process, I've also replaced the various hand-written loop variations
with the TAILQ_FOREACH() macro.
 1.187 12-Oct-2006  christos - sprinkle __unused on function decls.
- fix a couple of unused bugs
- no more -Wno-unused for i386
 1.186 21-Sep-2006  jld Change ffs_mount, in MNT_UPDATE case, to check dev_t's for equality
instead of just vnode pointers. Fixes erroneous "does not match mounted
device" errors from mount(8) in the presence of MFS /dev, init.root, &c.

No objections on tech-kern.
 1.185 30-Aug-2006  christos branches: 1.185.2; 1.185.4;
fix missing initializers
 1.184 23-Jul-2006  ad Use the LWP cached credentials where sane.
 1.183 13-Jul-2006  martin Fix alignment problems for fhandle_t, exposed by gcc4.1.

While touching all vptofh/fhtovp functions, get rid of VFS_MAXFIDSIZ,
version the getfh(2) syscall and explicitly pass the size available in
the filehandle from userland.

Discussed on tech-kern, with lots of help from yamt (thanks!).
 1.182 07-Jun-2006  kardel branches: 1.182.2;
merge FreeBSD timecounters from branch simonb-timecounters
- struct timeval time is gone
time.tv_sec -> time_second
- struct timeval mono_time is gone
mono_time.tv_sec -> time_uptime
- access to time via
{get,}{micro,nano,bin}time()
get* versions are fast but less precise
- support NTP nanokernel implementation (NTP API 4)
- further reading:
Timecounter Paper: http://phk.freebsd.dk/pubs/timecounter.pdf
NTP Nanokernel: http://www.eecis.udel.edu/~mills/ntp/html/kern.html
 1.181 14-May-2006  elad branches: 1.181.2;
integrate kauth.
 1.180 21-Feb-2006  thorpej branches: 1.180.2; 1.180.4; 1.180.6;
Use device_class() instead of accessing dv_class directly.
 1.179 14-Jan-2006  yamt branches: 1.179.2; 1.179.4;
- unify ffs_blkatoff and lfs_blkatoff.
- remove ufs_ops::uo_blkatoff.
- add directory read-ahead code. (disabled for now.)
 1.178 23-Dec-2005  rpaulo branches: 1.178.2;
Convert UFS_EXTATTR to struct lwp.
 1.177 11-Dec-2005  christos merge ktrace-lwp.
 1.176 02-Nov-2005  yamt merge yamt-vop branch. remove following VOPs.

VOP_BLKATOFF
VOP_VALLOC
VOP_BALLOC
VOP_REALLOCBLKS
VOP_VFREE
VOP_TRUNCATE
VOP_UPDATE
 1.175 27-Sep-2005  yamt branches: 1.175.2;
introduce "ufs_ops" and use it for ITIMES.
 1.174 23-Sep-2005  jmmv Apply the NFS exports list rototill patch:

- Remove all NFS related stuff from file system specific code.
- Drop the vfs_checkexp hook and generalize it in the new nfs_check_export
function, thus removing redundancy from all file systems.
- Move all NFS export-related stuff from kern/vfs_subr.c to the new
file sys/nfs/nfs_export.c. The former was becoming large and its code
is always compiled, regardless of the build options. Using the latter,
the code is only compiled in when NFSSERVER is enabled. While doing this,
also make some functions in nfs_subs.c conditional to NFSSERVER.
- Add a new command in nfssvc(2), called NFSSVC_SETEXPORTSLIST, that takes a
path and a set of export entries. At the moment it can only clear the
exports list or append entries, one by one, but it is done in a way that
allows setting the whole set of entries atomically in the future (see the
comment in mountd_set_exports_list or in doc/TODO).
- Change mountd(8) to use the nfssvc(2) system call instead of mount(2) so
that it becomes file system agnostic. In fact, this whole thing was
done to remove an 'XXX' block from this utility!
- Change the mount*, newfs and fsck* userland utilities to not deal with NFS
exports initialization; done internally by the kernel when initializing
the NFS support for each file system.
- Implement an interface for VFS (called VFS hooks) so that several kernel
subsystems can run arbitrary code upon receipt of specific VFS events.
At the moment, this only provides support for unmount and is used to
destroy NFS exports lists from the file systems being unmounted, though it
has room for extension.

Thanks go to yamt@, chs@, thorpej@, wrstuden@ and others for their comments
and advice in the development of this patch.
 1.173 22-Sep-2005  rpaulo Fix bogus if-clause introduced in previous revision.
 1.172 22-Sep-2005  rpaulo In ffs_unmount(), detect EOPNOTSUPP errno returned from
ufs_extattr_stop().

From FreeBSD.
 1.171 12-Sep-2005  christos - access the ffs and ext2fs itimes functions through a pointer, so that
the kernel still links if the filesystem is not compiled in. Probably
a better solution is to use weak symbols.
- move the filesystem-specific itime macros to the filesystem header files.
 1.170 28-Aug-2005  thorpej Experimental support for extended attributes on UFS1 file systems, using a
backing file per attribute type indexed by inode number to hold the extended
attributes.

This is working pretty well on my test systems, except for the "autostart"
feature. I need someone with a better handle on the VFS locking protocol
to go over that.

This is a work-in-progress. There are parts of this that could be re-factored
allowing this approach to be used on other types of file systems.

Adapted from FreeBSD.
 1.169 23-Aug-2005  christos Don't overload MAXNAMLEN, use a separate constant for each filesystem type.
 1.168 25-Jul-2005  drochner fix crash in mount error handling: don't free storage which was not
malloc'd
 1.167 23-Jul-2005  yamt update file timestamps for nfsd loaned-read and mmap.
PR/25279. discussed on tech-kern@.
 1.166 15-Jul-2005  thorpej Use ANSI function decls.
 1.165 28-Jun-2005  yamt branches: 1.165.2;
- constify genfs_ops.
- use member designators.
 1.164 29-May-2005  christos - sprinkle const
- avoid shadow variables.
 1.163 29-Mar-2005  thorpej - Define a VFS_ATTACH() macro that places a reference to a vfsops structure
into the "vfsops" link set.
- Use VFS_ATTACH() where vfsops are declared for individual file systems.
- In vfsinit(), traverse the "vfsops" link set, rather than vfs_list_initial[].
 1.162 04-Mar-2005  christos branches: 1.162.2;
PR/26823: Michael L. Hitch: The endianness flag was not preserved in the compat
superblock read routine.
 1.161 26-Feb-2005  perry nuke trailing whitespace
 1.160 11-Jan-2005  mycroft branches: 1.160.2; 1.160.4;
Rearrange some code slightly to avoid uninitialized variable warnings.
 1.159 09-Jan-2005  mycroft Rework the mountroot interface so that vfs_mountroot() opens the root device
and just passes it on to the file system functions. This avoids opening and
closing the device several times.

Mentioned on tech-kern some time ago, IIRC. I've been running this for a
long time.
 1.158 02-Jan-2005  thorpej Add the system call and VFS infrastructure for file system extended
attributes.

From FreeBSD.
 1.157 26-Dec-2004  dbj remove opt_compat_netbsd.h, afaict it is no longer needed.
i think it was previously used to pull in COMPAT_09 for ffs_statfs
 1.156 21-Nov-2004  jdolecek allow changes of the sysctl values
 1.155 21-Sep-2004  thorpej Add a new VNODE_LOCKDEBUG option, which enables checks in the VOP_*()
calls to ensure that the vnode lock state is as expected when the VOP
call is made. Modify vnode_if.src to set the expected state according
to the documenting lock table for each VOP. Modify vnode_if.sh to emit
the checks.

Notes:
- The checks are only performed if the vnode has the VLOCKSWORK bit
set. Some file systems (e.g. specfs) don't even bother with vnode
locks, so of course the checks will fail.
- We can't actually run with VNODE_LOCKDEBUG because there are so many
vnode locking problems, not the least of which is the "use SHARED for
VOP_READ()" issue, which screws things up for the entire call chain.

Inspired by similar changes in OpenBSD, but implemented differently.
 1.154 19-Sep-2004  yamt um_maxfilesize should be set after
ffs_oldfscompat_read adjusted fs_maxfilesize.
 1.153 15-Aug-2004  mycroft Fixing age old cruft:
* Rather than using mnt_maxsymlinklen to indicate that a file system returns
d_type fields(!), add a new internal flag, IMNT_DTYPE.

Add 3 new elements to ufsmount:
* um_maxsymlinklen, replaces mnt_maxsymlinklen (which never should have existed
in the first place).
* um_dirblksiz, which tracks the current directory block size, eliminating the
FS-specific checks littered throughout the code. This may be used later to
make the block size variable.
* um_maxfilesize, which is the maximum file size, possibly adjusted lower due
to implementation issues.

Sync some bug fixes from FFS into ext2fs, particularly:
* ffs_lookup.c 1.21, 1.28, 1.33, 1.48
* ffs_inode.c 1.43, 1.44, 1.45, 1.66, 1.67
* ffs_vnops.c 1.84, 1.85, 1.86

Clean up some crappy pointer frobnication.
 1.152 14-Aug-2004  mycroft Add a new flag, IN_MODIFY. This is like IN_UPDATE|IN_CHANGE, but unlike
setting those flags, it does not cause the inode to be written in the periodic
sync. This is used for writes to special files (devices and named pipes) and
FIFOs.

Do not preemptively sync updates to access times and modification times. They
are now updated in the inode only opportunistically, or when the file or device
is closed. (Really, it should be delayed beyond close, but this is enough to
help substantially with device nodes.)

And the most amusing part:
Trickle sync was broken on both FFS and ext2fs, in different ways. In FFS, the
periodic call to VFS_SYNC(MNT_LAZY) was still causing all file data to be
synced. In ext2fs, it was causing the metadata to *not* be synced. We now
only call VOP_UPDATE() on the node if we're doing MNT_LAZY. I've confirmed
that we do in fact trickle correctly now.
 1.151 05-Jul-2004  pk Call inittodr() from main(). Let file system code set the recorded `last
update' time (if any) through the new function setrootfstime().
 1.150 27-May-2004  hannken Fixup last commit. fs->fs_active must be initialized.
 1.149 25-May-2004  hannken Add ffs internal snapshots. Written by Marshall Kirk McKusick for FreeBSD.

- Not enabled by default. Needs kernel option FFS_SNAPSHOT.
- Change parameters of ffs_blkfree.
- Let the copy-on-write functions return an error so spec_strategy
may fail if the copy-on-write fails.
- Change genfs_*lock*() to use vp->v_vnlock instead of &vp->v_lock.
- Add flag B_METAONLY to VOP_BALLOC to return indirect block buffer.
- Add a function ffs_checkfreefile needed for snapshot creation.
- Add special handling of snapshot files:
Snapshots may not be opened for writing and the attributes are read-only.
Use the mtime as the time this snapshot was taken.
Deny mtime updates for snapshot files.
- Add function transferlockers to transfer any waiting processes from
one lock to another.
- Add vfsop VFS_SNAPSHOT to take a snapshot and make it accessible through
a vnode.
- Add snapshot support to ls, fsck_ffs and dump.

Welcome to 2.0F.

Approved by: Jason R. Thorpe <thorpej@netbsd.org>
 1.148 25-May-2004  atatat Sysctl descriptions under vfs subtree
 1.147 20-May-2004  atatat Explicitly call pool_init() (and pool_destroy()) when being built as
an _LKM.

This adds pools to the list of things that lkms must do manually
because they're set up with link sets. Not that there's anything
wrong with link sets, but that we need to try harder to remember that
lkms are second class citizens. Of a sort.
 1.146 26-Apr-2004  simonb Unwrap a not-too-long line.
 1.145 25-Apr-2004  dbj remove botched superblock upgrade warnings.
there are now alternate non-kernel checks and fixes for this problem.
relevant PRs include:
bin/17910 kern/21283 kern/21404 port-macppc/23925 port-macppc/23926
install/25138
 1.144 25-Apr-2004  simonb Initialise (most) pools from a link set instead of explicit calls
to pool_init. Untouched pools are those that are either in arch-specific
code or not initialised during initial system startup.

Convert struct session, ucred and lockf to pools.
 1.143 21-Apr-2004  christos Replace the statfs() family of system calls with statvfs().
Retain binary compatibility.
 1.142 18-Apr-2004  dbj remove code that attempts to correct superblock location. this
enforces an unnecessary restriction that the superblock be in the
particular expected locations. Also, the compatibility case is
handled in ffs_oldfscompat_read.
 1.141 18-Apr-2004  dbj when enabling ffs compatibility in ffs_reload, use
sblockloc that superblock was read from
also note XXX that ffs_reload doesn't handle superblock moving
 1.140 27-Mar-2004  dsl branches: 1.140.2;
Rework previous so that FS_FLAGS_UPDATED is only looked at for ffsv1
 1.139 24-Mar-2004  atatat Tango on sysctl_createv() and flags. The flags have all been renamed,
and sysctl_createv() now uses more arguments.
 1.138 21-Mar-2004  dsl Rework superblock validation logic to make adding validity tests easier.
Ensure that we don't use the first alternate superblock of a ffsv1
filesystem with 64k blocks (it is in the same place as an ffsv2 sb).
Fixes part of PR kern/24809
 1.137 11-Mar-2004  dbj quiet tls. change botched superblock warning to use -b 16
 1.136 10-Mar-2004  keihan s/netbsd.org/NetBSD.org/g
 1.135 22-Feb-2004  jdolecek make sblock_try[] const
 1.134 12-Jan-2004  dbj change the updating note to say you may need 'fsck_ffs -b 32 -c 4'
 1.133 12-Jan-2004  dbj add checks for a couple of botched superblock upgrade cases
and report a warning with repair references.
 1.132 10-Jan-2004  hannken Split out softdep_flushworklist() from softdep_flushfiles() so that
it can be used to clear the work queue.

Cleanup ffs_sync() which did not synchronously wait when MNT_WAIT
was specified. Clear the work queue when MNT_WAIT is specified.

Result is a clean on-disk file system after ffs_sync(.., MNT_WAIT, ..)

From FreeBSD.
 1.131 09-Jan-2004  dbj never upgrade the superblock or set FS_FLAGS_UPDATED in fs_old_flags
add compatibility for filesystems created before FFSv2 integration
these patches are from pr port-macppc/23926 and should also fix
problems discussed in pr kern/21404 and pr kern/21283
 1.130 04-Dec-2003  atatat Dynamic sysctl.

Gone are the old kern_sysctl(), cpu_sysctl(), hw_sysctl(),
vfs_sysctl(), etc, routines, along with sysctl_int() et al. Now all
nodes are registered with the tree, and nodes can be added (or
removed) easily, and I/O to and from the tree is handled generically.

Since the nodes are registered with the tree, the mapping from name to
number (and back again) can now be discovered, instead of having to be
hard coded. Adding new nodes to the tree is likewise much simpler --
the new infrastructure handles almost all the work for simple types,
and just about anything else can be done with a small helper function.

All existing nodes are where they were before (numerically speaking),
so all existing consumers of sysctl information should notice no
difference.

PS - I'm sorry, but there's a distinct lack of documentation at the
moment. I'm working on sysctl(3/8/9) right now, and I promise to
watch out for buses.
 1.129 01-Dec-2003  dbj in ffs_unmount, ignore error returned by VOP_CLOSE(devvp)
this fixes a problem where device close error would cause
unmount to fail but structures to be left partially deallocated
 1.128 08-Nov-2003  dbj fix minor memory leaks in error paths of ffs_mountfs
 1.127 05-Nov-2003  hannken Clean up the usage of vn_start_write(). At least one occurrence clobbered
previous error conditions.
If "(flags & (V_WAIT|V_PCATCH)) == V_WAIT" the return value is always zero.
Ignore the return value in these cases.

From Darrin B. Jewell.
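The mask test quoted in the entry above can be illustrated with a minimal C sketch. The flag values below are hypothetical stand-ins, not NetBSD's actual V_* definitions; the sketch only shows that the expression selects callers that asked to wait without also asking for V_PCATCH, the case in which the return value can be ignored.

```c
/* Hypothetical flag values for illustration only; the real V_* constants
 * are defined in the NetBSD kernel headers and may differ. */
#define V_WAIT   0x0001
#define V_NOWAIT 0x0002
#define V_PCATCH 0x0004

/* Nonzero when the caller asked to wait but not to be interrupted by
 * signals: the case in which vn_start_write() always returns zero. */
static int
waits_uninterruptibly(int flags)
{
	return (flags & (V_WAIT | V_PCATCH)) == V_WAIT;
}
```

For example, `waits_uninterruptibly(V_WAIT)` is true, while `waits_uninterruptibly(V_WAIT | V_PCATCH)` is false because a signal could still make the call fail.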
 1.126 30-Oct-2003  simonb Remove some assigned-to but otherwise unused variables.
 1.125 15-Oct-2003  hannken Add the gating of system calls that cause modifications to the underlying
file system.
The function vfs_write_suspend stops all new write operations to a file
system, allows any file system modifying system calls already in progress
to complete, then sync's the file system to disk and returns. The
function vfs_write_resume allows the suspended write operations to
complete.

From FreeBSD with slight modifications.

Approved by: Frank van der Linden <fvdl@netbsd.org>
 1.124 14-Oct-2003  dbj add mnt_iflag field to struct mount for internal flags
mv MNT_GONE, MNT_UNMOUNT and MNT_WANTRDWR to this field
additionally add mnt_writeopcountupper and mnt_writeopcountlower fields
in preparation for pending write suspension support work
bump kernel version to 1.6ZD
 1.123 25-Sep-2003  enami In ffs_sbupdate(), swap the sblock after ffs_oldfscompat_write() is
applied rather than the original.
 1.122 17-Sep-2003  enami Fix a recently introduced bug which prevents csum totals being copied
when an old ffs filesystem is first mounted (as a result, df reports disk
full on old ffs filesystem or mfs created by old binary). Problem first
noticed by onoe san.
 1.121 13-Sep-2003  bouyer make sure to not get flags which are for internal use only from the on-disk
superblock.
Proposed in http://mail-index.netbsd.org/tech-kern/2003/09/06/0005.html
 1.120 13-Sep-2003  bouyer Commit changes proposed in
http://mail-index.netbsd.org/tech-kern/2003/09/06/0001.html
http://mail-index.netbsd.org/tech-kern/2003/09/06/0006.html
to avoid compat problems with old ffsv1 by reuse of the old FS_SWAPPED
value for FS_FLAGS_UPDATED, and use of new, larger fields:
- Don't use FS_FLAGS_UPDATED to see if we need to update new fields from
old fields in ffsv1 case.
- when writing back the superblock, copy back the flags to the old location
if only old flags are set (FS_FLAGS_UPDATED won't be set in this case)
in ffsv1 case.
 1.119 07-Aug-2003  agc Move UCB-licensed code from 4-clause to 3-clause licence.

Patches provided by Joel Baker in PR 22364, verified by myself.
 1.118 29-Jun-2003  fvdl branches: 1.118.2;
Back out the lwp/ktrace changes. They contained a lot of collateral damage,
and need to be examined and discussed more.
 1.117 29-Jun-2003  thorpej Undo part of the ktrace/lwp changes. In particular:
* Remove the "lwp *" argument that was added to vget(). Turns out
that nothing actually used it!
* Remove the "lwp *" arguments that were added to VFS_ROOT(), VFS_VGET(),
and VFS_FHTOVP(); all they did was pass it to vget() (which, as noted
above, didn't use it).
* Remove all of the "lwp *" arguments to internal functions that were added
just to appease the above.
 1.116 28-Jun-2003  darrenr Pass lwp pointers throughout the kernel, as required, so that the lwpid can
be inserted into ktrace records. The general change has been to replace
"struct proc *" with "struct lwp *" in various function prototypes, pass
the lwp through and use l_proc to get the process pointer when needed.

Bump the kernel rev up to 1.6V
 1.115 12-Jun-2003  fvdl OS X still seems to use the old nrpos field in the superblock, and gets
unhappy after NetBSD wrote an Apple UFS filesystem. Just set it to 0
in this case.
 1.114 03-May-2003  christos make sure we update fs_fsmnt.
 1.113 16-Apr-2003  christos PR/1796: John Kohl: statfs misbehaves under chrooted environments.

- Under chroot it displays only the visible filesystems with appropriate paths.
- The statfs f_mntonname gets adjusted to contain the real path from root.
- While I was there, fixed a bug in ext2fs, locking problems with vfs_getfsstat(),
and factored out some of the vfsop statfs() code to copy_statfs_info(). This
fixes the problem where some filesystems forgot to set fsid.
- Made coda look more like a normal fs.
 1.112 12-Apr-2003  fvdl Don't cache buffers used when finding the superblock, it can lead to
seeing bogus data for the first cg with certain block/frag sizes.
From enami tsugutomo.
 1.111 05-Apr-2003  fvdl * Use the old and new time fields in the superblock as well as a few others
to determine if this filesystem was mounted by an older kernel after
having been mounted by a newer one, to avoid some summary mismatches.
* Reinstate support for 4.2 cylinder groups (read-only, as it was before).
 1.110 02-Apr-2003  fvdl Add support for UFS2. UFS2 is an enhanced FFS, adding support for
64 bit block pointers, extended attribute storage, and a few
other things.

This commit does not yet include the code to manipulate the extended
storage (for e.g. ACLs), this will be done later.

Originally written by Kirk McKusick and Network Associates Laboratories for
FreeBSD.
 1.109 31-Mar-2003  fvdl The modified flag must be cleared before the last sbupdate call in
unmount, because ffs_flushfiles or softdep_flushfiles may have
modified the filesystem (despite VFS_SYNC having been called first).
 1.108 21-Mar-2003  dsl Use 'void *' instead of 'caddr_t' in prototypes of VOP_IOCTL, VOP_FCNTL
and VOP_ADVLOCK, delete casts from callers (and some to copyin/out).
 1.107 17-Feb-2003  perseant Add code to UBCify LFS. This is still behind "#ifdef LFS_UBC" for now
(there are still some details to work out) but expect that to go
away soon. To support these basic changes (creation of lfs_putpages,
lfs_gop_write, mods to lfs_balloc) several other changes were made, to
wit:

* Create a writer daemon kernel thread whose purpose is to handle page
writes for the pagedaemon, but which also takes over some of the
functions of lfs_check(). This thread is started the first time an
LFS is mounted.

* Add a "flags" parameter to GOP_SIZE. Current values are
GOP_SIZE_READ, meaning that the call should return the size of the
in-core version of the file, and GOP_SIZE_WRITE, meaning that it
should return the on-disk size. One of GOP_SIZE_READ or
GOP_SIZE_WRITE must be specified.

* Instead of using malloc(...M_WAITOK) for everything, reserve enough
resources to get by and use malloc(...M_NOWAIT), using the reserves if
necessary. Use the pool subsystem for structures small enough that
this is feasible. This also obsoletes LFS_THROTTLE.

And a few that are not strictly necessary:

* Moves the LFS inode extensions off onto a separately allocated
structure; getting closer to LFS as an LKM. "Welcome to 1.6O."

* Unified GOP_ALLOC between FFS and LFS.

* Update LFS copyright headers to correct values.

* Actually cast to unsigned in lfs_shellsort, like the comment says.

* Keep track of which segments were empty before the previous
checkpoint; any segments that pass two checkpoints both dirty and
empty can be summarily cleaned. Do this. Right now lfs_segclean
still works, but this should be turned into an effectless
compatibility syscall.
 1.106 24-Jan-2003  fvdl Bump daddr_t to 64 bits. Replace it with int32_t in all places where
it was used on-disk, so that on-disk formats remain the same.
Remove ufs_daddr_t and ufs_lbn_t for the time being.
 1.105 01-Dec-2002  matt Add multiple inclusion protection for headers. Fix mismatched
variable declarations (missing const's) as needed.
 1.104 24-Nov-2002  scw Quell an uninitialised variable warning.
 1.103 28-Sep-2002  dbj Add support for the Apple UFS variation on ffs
This is the bulk of PR #17345

The general approach is to use a run-time determinable value
for DIRBLKSIZ. Additional allowances are included for using
MAXSYMLINKLEN with FS_42INODEFMT and a shift in the cylinder group
cluster summary count array. Support is added for managing
the Apple UFS volume label.
 1.102 21-Sep-2002  christos MNT_GETARGS support
 1.101 06-Sep-2002  gehenna Merge the gehenna-devsw branch into the trunk.

This merge changes the device switch tables from static array to
dynamically generated by config(8).

- All device switches are defined as a constant structure in device drivers.

- The new grammar ``device-major'' is introduced to ``files''.

device-major <prefix> char <num> [block <num>] [<rules>]

- All device major numbers must be listed in the port-dependent majors.<arch>
using this grammar.

- Added the new naming convention.
The name of the device switch must be <prefix>_[bc]devsw for auto-generation
of device switch tables.

- The backward compatibility of loading block/character device
switch by LKM framework is broken. This is necessary to convert
from block/character device major to device name in runtime and vice versa.

- The restriction to assign device major by LKM is completely removed.
We don't need to reserve LKM entries for dynamic loading of device switch.

- At compile time, the list of device major numbers is packed into the kernel and
the LKM framework will refer to it to assign device major numbers dynamically.
 1.100 30-Jul-2002  soren Die, qaddr_t, die! - mnt_data in struct mount is already effectively
a void *, so stop pretending otherwise.
 1.99 09-Jun-2002  chs allow read-only mounts even if we can't read the last fragment of the fs.
this enables one to recover data from a failing disk (where the read failure
is a hardware problem) while avoiding corrupting the fs further (in the case
where the read failure is due to a misconfiguration).
 1.98 10-Apr-2002  mycroft branches: 1.98.2; 1.98.4;
Use blkstofrags() and fragstoblks(). Use &(NBBY-1) rather than %NBBY.
Switch off of fs_fragshift rather than fs_frag (generates better jump tables).
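The two tricks in the entry above can be shown in a small C sketch. The `struct toy_fs` type and its fields are hypothetical stand-ins for the FFS superblock, and the two conversion macros mirror the shift-based form of `blkstofrags()`/`fragstoblks()` under the assumption that `fs_frag` is a power of two with `fs_fragshift = log2(fs_frag)`.

```c
/* NBBY (bits per byte) is 8 on NetBSD; defined here so the sketch stands alone. */
#define NBBY 8

/* For unsigned x and a power-of-two divisor, x & (NBBY - 1) equals
 * x % NBBY; the mask does not rely on the compiler strength-reducing
 * the division (for signed operands, % also needs extra code to handle
 * negative values). */
static unsigned int
bit_in_byte(unsigned int x)
{
	return x & (NBBY - 1);
}

/* Hypothetical stand-in for the relevant superblock fields. */
struct toy_fs {
	int fs_frag;       /* fragments per block: 1, 2, 4 or 8 */
	int fs_fragshift;  /* log2(fs_frag): 0, 1, 2 or 3 */
};

/* Block <-> fragment conversions become shifts instead of multiply/divide. */
#define blkstofrags(fs, blks)  ((blks) << (fs)->fs_fragshift)
#define fragstoblks(fs, frags) ((frags) >> (fs)->fs_fragshift)

/* Switching on fs_fragshift gives dense case values 0..3, which a
 * compiler can lower to a compact jump table; fs_frag's values
 * 1, 2, 4, 8 are sparse. */
static const char *
frag_layout(const struct toy_fs *fs)
{
	switch (fs->fs_fragshift) {
	case 0: return "1 frag/block";
	case 1: return "2 frags/block";
	case 2: return "4 frags/block";
	case 3: return "8 frags/block";
	default: return "invalid";
	}
}
```

With `fs_frag = 8` (`fs_fragshift = 3`), 5 blocks map to 40 fragments, and fragment 43 lies in block 5.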
 1.97 01-Apr-2002  enami Hold an extra reference if updating and args.fspec == NULL.
 1.96 01-Apr-2002  christos Fixes from enami:

- If VOP_ACCESS fails when updating mount, we will vrele() twice.

- The check for update-only flags in mp->mnt_flag when not updating
case is bogus. If we really want to check, we need to see flags in
ufs_args, but I'm not sure if it is really necessary.

- The credential passed to ffs_reload was credential of when looking
up mount point, but now it is credential of when looking up device
node. Anyway, it may be current process's credential.
 1.95 31-Mar-2002  christos PR/16136: Chris Jepeway: Bogus entry in /etc/fstab can panic kernel.
 1.94 17-Mar-2002  chs when mounting a filesystem, read the last block in the filesystem
to verify that the device is at least as big as the superblock claims
the filesystem is supposed to be, and if it's not then fail the mount.
this should help reduce the type of confusion reported in PR 13228.
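The size check described in the entry above, together with the read-only relaxation from revision 1.99, can be sketched in userland C with POSIX `pread()`. This is a minimal, hypothetical illustration and not the kernel code: `check_last_fragment()` is an invented name, and the EINVAL error value is an assumption. `fs_size` is the file system size in fragments and `fsize` is the fragment size in bytes, as in the FFS superblock.

```c
#include <errno.h>
#include <stdint.h>
#include <sys/types.h>
#include <unistd.h>

/*
 * Hypothetical sketch: read the last fragment the superblock claims the
 * file system has, and fail if the device is shorter.  As in revision
 * 1.99, a read-only mount is allowed to proceed anyway.
 */
static int
check_last_fragment(int fd, int64_t fs_size, int32_t fsize, int rdonly)
{
	char buf[65536];
	off_t off;
	ssize_t n;

	if (fs_size <= 0 || fsize <= 0 || (size_t)fsize > sizeof(buf))
		return EINVAL;
	off = (off_t)(fs_size - 1) * fsize;
	n = pread(fd, buf, (size_t)fsize, off);
	if (n == (ssize_t)fsize)
		return 0;
	/* Short read or I/O error: the device is smaller than claimed. */
	return rdonly ? 0 : EINVAL;
}
```

On a 4096-byte device with 1024-byte fragments, a superblock claiming 4 fragments passes, while one claiming 5 fails unless the mount is read-only.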
 1.93 08-Mar-2002  thorpej Pool deals fairly well with physical memory shortage, but it doesn't
deal with shortages of the VM maps where the backing pages are mapped
(usually kmem_map). Try to deal with this:

* Group all information about the backend allocator for a pool in a
separate structure. The pool references this structure, rather than
the individual fields.
* Change the pool_init() API accordingly, and adjust all callers.
* Link all pools using the same backend allocator on a list.
* The backend allocator is responsible for waiting for physical memory
to become available, but will still fail if it cannot allocate KVA
space for the pages. If this happens, carefully drain all pools using
the same backend allocator, so that some KVA space can be freed.
* Change pool_reclaim() to indicate if it actually succeeded in freeing
some pages, and use that information to make draining easier and more
efficient.
* Get rid of PR_URGENT. There was only one use of it, and it could be
dealt with by the caller.

From art@openbsd.org.
 1.92 28-Feb-2002  pooka Don't add fs->fs_pendingblocks to f_bavail twice. It's already included
in f_bfree, which is added to f_bavail.

Fixes problem with statfs reporting too much free space for filesystems
which have files pending to be freed by softdeps.
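The accounting rule in the entry above (pending frees belong in f_bfree, and f_bavail is derived from f_bfree) can be sketched in C. `struct toy_space` is a hypothetical, simplified view; the real superblock keeps these counts in fields such as fs_cstotal, fs_pendingblocks and fs_minfree, with unit conversions this sketch omits. All values are in fragments.

```c
#include <stdint.h>

/* Hypothetical, simplified free-space counters, all in fragments. */
struct toy_space {
	int64_t free_frags;      /* free fragments on disk */
	int64_t pending_frags;   /* fragments softdep will free later */
	int64_t reserved_frags;  /* minfree reserve withheld from users */
};

/* Pending frees are counted exactly once, here in bfree ... */
static int64_t
toy_bfree(const struct toy_space *s)
{
	return s->free_frags + s->pending_frags;
}

/* ... and bavail is derived from bfree, so adding pending_frags again
 * here would double-count it (the bug fixed in this revision). */
static int64_t
toy_bavail(const struct toy_space *s)
{
	return toy_bfree(s) - s->reserved_frags;
}
```

With 1000 free, 200 pending and 100 reserved fragments, bfree is 1200 and bavail is 1100. The old double count would have reported 1300.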
 1.91 30-Dec-2001  fvdl XXXX temporary measure: in the case of a softdep 'unmount pending error',
do not mark the filesystem clean, as this will mean that one or more
files were likely not completely removed (will show up as unconnected
in fsck). Prevents filesystems from being marked clean while they're
not until this problem has been figured out.
 1.90 19-Dec-2001  fvdl ffs_reload may be called after an old fsck has run, and the pending*
fields may not be zero. Just reset them silently, it's not an error.
 1.89 18-Dec-2001  fvdl Bring over fixes from FreeBSD that weren't incorporated yet, mainly
from Kirk McKusick. They implement taking pending block/inode frees
into account for the sake of correct statfs() numbers, and adding
a new softdep type (newdirblk) to correctly handle newly allocated
directory blocks.

Minor additional changes: 1) swap the newly introduced fs_pendinginodes
and fs_pendingblocks fields in ffs_sb_swap, and 2) declare lkt_held
in the debug version of the softdep lock structure volatile, as it
can be modified from interrupt context #ifdef DEBUG.
 1.88 30-Oct-2001  lukem add __KERNEL_RCSID()
 1.87 15-Sep-2001  chs branches: 1.87.2;
a whole bunch of changes to improve performance and robustness under load:

- remove special treatment of pager_map mappings in pmaps. this is
required now, since I've removed the globals that expose the address range.
pager_map now uses pmap_kenter_pa() instead of pmap_enter(), so there's
no longer any need to special-case it.
- eliminate struct uvm_vnode by moving its fields into struct vnode.
- rewrite the pageout path. the pager is now responsible for handling the
high-level requests instead of only getting control after a bunch of work
has already been done on its behalf. this will allow us to UBCify LFS,
which needs tighter control over its pages than other filesystems do.
writing a page to disk no longer requires making it read-only, which
allows us to write wired pages without causing all kinds of havoc.
- use a new PG_PAGEOUT flag to indicate that a page should be freed
on behalf of the pagedaemon when it's unlocked. this flag is very similar
to PG_RELEASED, but unlike PG_RELEASED, PG_PAGEOUT can be cleared if the
pageout fails due to e.g. an indirect-block buffer being locked.
this allows us to remove the "version" field from struct vm_page,
and together with shrinking "loan_count" from 32 bits to 16,
struct vm_page is now 4 bytes smaller.
- no longer use PG_RELEASED for swap-backed pages. if the page is busy
because it's being paged out, we can't release the swap slot to be
reallocated until that write is complete, but unlike with vnodes we
don't keep a count of in-progress writes so there's no good way to
know when the write is done. instead, when we need to free a busy
swap-backed page, just sleep until we can get it busy ourselves.
- implement a fast-path for extending writes which allows us to avoid
zeroing new pages. this substantially reduces cpu usage.
- encapsulate the data used by the genfs code in a struct genfs_node,
which must be the first element of the filesystem-specific vnode data
for filesystems which use genfs_{get,put}pages().
- eliminate many of the UVM pagerops, since they aren't needed anymore
now that the pager "put" operation is a higher-level operation.
- enhance the genfs code to allow NFS to use the genfs_{get,put}pages
instead of a modified copy.
- clean up struct vnode by removing all the fields that used to be used by
the vfs_cluster.c code (which we don't use anymore with UBC).
- remove kmem_object and mb_object since they were useless.
instead of allocating pages to these objects, we now just allocate
pages with no object. such pages are mapped in the kernel until they
are freed, so we can use the mapping to find the page to free it.
this allows us to remove splvm() protection in several places.

The sum of all these changes improves write throughput on my
decstation 5000/200 to within 1% of the rate of NetBSD 1.5
and reduces the elapsed time for "make release" of a NetBSD 1.5
source tree on my 128MB pc to 10% less than a 1.5 kernel took.
 1.86 15-Sep-2001  chs add a new VFS op, vfs_reinit, which is called when desiredvnodes is
adjusted via sysctl. file systems that have hash tables which are
sized based on the value of this variable now resize those hash tables
using the new value. the max number of FFS softdeps is also recalculated.

convert various file systems to use the <sys/queue.h> macros for
their hash tables.
 1.85 06-Sep-2001  lukem branches: 1.85.2;
Incorporate the enhanced ffs_dirpref() by Grigoriy Orlov, as found in
FreeBSD (three commits; the initial work, man page updates, and a fix
to ffs_reload()), with the following differences:
- Be consistent between newfs(8) and tunefs(8) as to the options which
set and control the tuning parameters for this work (avgfilesize & avgfpdir)
- Use u_int16_t instead of u_int8_t to keep track of the number of
contiguous directories (suggested by Chuck Silvers)
- Work within our FFS_EI framework
- Ensure that fs->fs_maxclusters and fs->fs_contigdirs don't point to
the same area of memory

The new algorithm has a marked performance increase, especially when
performing tasks such as untarring pkgsrc.tar.gz, etc.

The original FreeBSD commit messages are attached:

=====
mckusick 2001/04/10 01:39:00 PDT
Directory layout preference improvements from Grigoriy Orlov <gluk@ptci.ru>.
His description of the problem and solution follow. My own tests show
speedups on typical filesystem intensive workloads of 5% to 12% which
is very impressive considering the small amount of code change involved.

------

One day I noticed that some file operations run much faster on
small file systems than on big ones. I've looked at the ffs
algorithms, thought about them, and redesigned the dirpref algorithm.

First I want to describe the results of my tests. These results are old
and I have improved the algorithm after these tests were done. Nevertheless
they show how big the performance speedup may be. I have done two file/directory
intensive tests on two OpenBSD systems with the old and new dirpref algorithms.
The first test is "tar -xzf ports.tar.gz", the second is "rm -rf ports".
The ports.tar.gz file is the ports collection from the OpenBSD 2.8 release.
It contains 6596 directories and 13868 files. The test systems are:

1. Celeron-450, 128Mb, two IDE drives, the system at wd0, file system for
test is at wd1. Size of test file system is 8 Gb, number of cg=991,
size of cg is 8m, block size = 8k, fragment size = 1k OpenBSD-current
from Dec 2000 with BUFCACHEPERCENT=35

2. PIII-600, 128Mb, two IBM DTLA-307045 IDE drives at i815e, the system
at wd0, file system for test is at wd1. Size of test file system is 40 Gb,
number of cg=5324, size of cg is 8m, block size = 8k, fragment size = 1k
OpenBSD-current from Dec 2000 with BUFCACHEPERCENT=50

You can get more info about the test systems and methods at:
http://www.ptci.ru/gluk/dirpref/old/dirpref.html

Test Results

           tar -xzf ports.tar.gz            rm -rf ports
           old        new                   old        new
mode       dirpref    dirpref    speedup    dirpref    dirpref    speedup
First system
normal     667        472        1.41       477        331        1.44
async      285        144        1.98       130        14         9.29
sync       768        616        1.25       477        334        1.43
softdep    413        252        1.64       241        38         6.34
Second system
normal     329        81         4.06       263.5      93.5       2.81
async      302        25.7       11.75      112        2.26       49.56
sync       281        57.0       4.93       263        90.5       2.9
softdep    341        40.6       8.4        284        4.76       59.66

"old dirpref" and "new dirpref" columns give the test time in seconds.
"speedup" is the speedup factor, i.e. old dirpref / new dirpref.
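The speedup column can be recomputed from the timings with a one-line C helper. The name `speedup()` is illustrative, not taken from the original. For example, the first-system async `rm -rf` row gives 130 / 14, which is about 9.29.

```c
/* Speedup factor as defined in the table notes: old time / new time. */
static double
speedup(double old_secs, double new_secs)
{
	return old_secs / new_secs;
}
```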

------

Algorithm description

The old dirpref algorithm is described in comments:

/*
* Find a cylinder to place a directory.
*
* The policy implemented by this algorithm is to select from
* among those cylinder groups with above the average number of
* free inodes, the one with the smallest number of directories.
*/
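The old policy quoted above can be sketched in a few lines of C. `struct toy_cg` is a hypothetical per-cylinder-group summary (FFS keeps the equivalent counts in struct csum as cs_nifree and cs_ndir). The comment says "above the average", but this sketch assumes the comparison is "at least the (integer) average", which guarantees that some group always qualifies.

```c
/* Hypothetical per-cylinder-group summary. */
struct toy_cg {
	long nifree;  /* free inodes in this cylinder group */
	long ndir;    /* directories in this cylinder group */
};

/*
 * Old dirpref policy: among cylinder groups whose free-inode count is
 * at least the average, pick the one with the fewest directories.
 * Returns the chosen index, or -1 if ncg <= 0.
 */
static int
old_dirpref(const struct toy_cg *cg, int ncg)
{
	long total = 0, avgifree;
	int i, best = -1;

	if (ncg <= 0)
		return -1;
	for (i = 0; i < ncg; i++)
		total += cg[i].nifree;
	avgifree = total / ncg;
	for (i = 0; i < ncg; i++) {
		if (cg[i].nifree < avgifree)
			continue;  /* below-average free inodes: skip */
		if (best == -1 || cg[i].ndir < cg[best].ndir)
			best = i;
	}
	return best;
}
```

Because the choice ignores where the parent directory lives, consecutive directories tend to land in different cylinder groups. This is the spreading effect described in the next paragraph.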

A new directory is allocated in a different cylinder group than its
parent directory, resulting in a directory tree that is spread across
all the cylinder groups. This spreading out results in non-optimal
access to the directories and files. When we have a small filesystem
it is not a problem, but when the filesystem is big the performance
degradation becomes very apparent.

What do I mean by a big file system?

1. A big filesystem is a filesystem which occupies 20-30 or more percent
of total drive space, i.e. its first and last cylinders are physically
located relatively far from each other.
2. It has a relatively large number of cylinder groups, for example
more cylinder groups than 50% of the buffers in the buffer cache.

The first results in long access times, while the second results in
many buffers being used by metadata operations. Such operations use
cylinder group blocks and on-disk inode blocks. The cylinder group
block (fs->fs_cblkno) contains struct cg, inode and block bit maps.
It is 2k in size for the default filesystem parameters. If new and
parent directories are located in different cylinder groups then the
system performs more input/output operations and uses more buffers.
On filesystems with many cylinder groups, lots of cache buffers are
used for metadata operations.

My solution for this problem is very simple. I allocate many directories
in one cylinder group. I also take some precautions so that the new
allocation method does not cause excessive fragmentation and so that
directory inodes are not located far from their files' inodes and data.
The algorithm is:
/*
* Find a cylinder group to place a directory.
*
* The policy implemented by this algorithm is to allocate a
* directory inode in the same cylinder group as its parent
* directory, but also to reserve space for its files inodes
* and data. Restrict the number of directories which may be
* allocated one after another in the same cylinder group
* without intervening allocation of files.
*
* If we allocate a first level directory then force allocation
* in another cylinder group.
*/
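The core of the new policy for a non-top-level directory can be sketched as a wrap-around scan that starts at the parent's cylinder group. This is a simplified, hypothetical illustration: new_dirpref(), struct cg_summary and the minifree/minbfree thresholds are assumptions (the caller is assumed to derive the thresholds from file-system averages), and the forced placement of first-level directories elsewhere is not shown.

```c
#include <stdint.h>

/* Hypothetical per-cylinder-group summary. */
struct cg_summary {
    int32_t ndir;    /* directories in this cylinder group */
    int32_t nifree;  /* free inodes */
    int32_t nbfree;  /* free blocks */
};

/*
 * Starting at the parent's cylinder group and wrapping around, return the
 * first group that has not reached the maxcontigdirs limit and still has
 * at least minifree free inodes and minbfree free blocks. Returns -1 if
 * no group qualifies (a real allocator would fall back to another search).
 */
static int
new_dirpref(const struct cg_summary *cs, const uint8_t *contigdirs,
    int ncg, int parent_cg, int maxcontigdirs,
    int32_t minifree, int32_t minbfree)
{
    for (int i = 0; i < ncg; i++) {
        int cg = (parent_cg + i) % ncg;
        if (contigdirs[cg] < maxcontigdirs &&
            cs[cg].nifree >= minifree &&
            cs[cg].nbfree >= minbfree)
            return cg;
    }
    return -1;
}
```

Preferring the parent's group keeps related directories together, while the contigdirs check and the free-space thresholds are what stop one group from filling up with directories alone.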

My early versions of dirpref gave good results for a wide range of
file operations and different filesystem capacities except in one case:
applications that create their entire directory structure first
and only later fill this structure with files.

My solution for such and similar cases is to limit the number of
directories which may be created one after another in the same cylinder
group without intervening file creations. For this purpose, I allocate
an array of counters at mount time. This array is linked to the superblock
as fs->fs_contigdirs[cg]. Each time a directory is created the counter
increases, and each time a file is created the counter decreases. A 60Gb
filesystem with 8mb/cg requires 10kb of memory for the counters array.
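The counter bookkeeping described above can be sketched as follows. The names struct contigdirs, contigdirs_init(), note_dir_alloc() and note_file_alloc() are hypothetical, and the one-byte saturating counter per cylinder group is an assumption (FreeBSD's fs_contigdirs is, to my understanding, a byte array). Under that assumption a 60 GB filesystem with 8 MB groups has 60 * 1024 / 8 = 7680 groups and needs about 7.5 KB, the same order of magnitude as the figure quoted above.

```c
#include <stdint.h>
#include <stdlib.h>

/* Hypothetical stand-in for the per-mount counter array. */
struct contigdirs {
    int      ncg;
    uint8_t *count;  /* one saturating byte counter per cylinder group */
};

/* Allocate zeroed counters at mount time; returns 0 on success. */
static int
contigdirs_init(struct contigdirs *cd, int ncg)
{
    cd->ncg = ncg;
    cd->count = calloc((size_t)ncg, sizeof(uint8_t));
    return cd->count != NULL ? 0 : -1;
}

/* A directory inode was allocated in cylinder group cg. */
static void
note_dir_alloc(struct contigdirs *cd, int cg)
{
    if (cd->count[cg] < UINT8_MAX)
        cd->count[cg]++;
}

/* A regular file inode was allocated in cylinder group cg. */
static void
note_file_alloc(struct contigdirs *cd, int cg)
{
    if (cd->count[cg] > 0)
        cd->count[cg]--;
}
```

Saturating at both ends keeps the counters meaningful without any overflow handling, at the cost of losing exact counts beyond 255.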

maxcontigdirs is the maximum number of directories which may be created
without an intervening file creation. I found in my tests that the best
performance occurs when I restrict the number of directories in one cylinder
group such that all their files may be located in the same cylinder group.
Performance may deteriorate if all the file inodes are in the same
cylinder group as their containing directory but their data partially
resides in a different cylinder group. The maxcontigdirs value is
calculated to try to prevent this condition. Since there is no way to
know how many files and directories will be allocated later, I added two
optimization parameters in superblock/tunefs. They are:

int32_t fs_avgfilesize; /* expected average file size */
int32_t fs_avgfpdir; /* expected # of files per directory */

These parameters have reasonable defaults but may be tweaked for special
uses of a filesystem. They are only necessary in rare cases, such as
tuning a filesystem that stores a squid cache.
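One way to turn fs_avgfilesize and fs_avgfpdir into a maxcontigdirs limit is sketched below. It is loosely modeled on FreeBSD's ffs_dirpref(), but the exact formula there may differ, so treat compute_maxcontigdirs() and its parameters as illustrative assumptions. The idea: estimate how much space one directory's files need, take the larger of that estimate and the space existing directories actually use, and allow only as many directories as fit into the average free space and the group's inodes.

```c
#include <stdint.h>

/*
 * Inputs are per-cylinder-group averages: cgsize and bsize in bytes,
 * avgbfree in blocks, avgndir directories, ipg inodes per group.
 */
static int
compute_maxcontigdirs(int64_t cgsize, int64_t avgbfree, int64_t bsize,
    int64_t avgndir, int64_t ipg, int32_t avgfilesize, int32_t avgfpdir)
{
    /* Space a "typical" directory is expected to need for its files. */
    int64_t dirsize = (int64_t)avgfilesize * avgfpdir;
    /* Space existing directories actually occupy, on average. */
    int64_t curdirsize = avgndir ? (cgsize - avgbfree * bsize) / avgndir : 0;
    if (dirsize < curdirsize)
        dirsize = curdirsize;

    /* How many such directories fit into the average free space... */
    int64_t max = dirsize > 0 ? (avgbfree * bsize) / dirsize : 255;
    if (max > 255)
        max = 255;              /* counters are assumed to be one byte */
    /* ...and into the inodes of one cylinder group. */
    if (avgfpdir > 0 && ipg / avgfpdir < max)
        max = ipg / avgfpdir;
    if (max == 0)
        max = 1;                /* always allow at least one directory */
    return (int)max;
}
```

For example, with hypothetical values of an 8 MB group, 8k blocks, 4 MB free, 4 existing directories, 2048 inodes per group, fs_avgfilesize = 16384 and fs_avgfpdir = 64, the expected directory size is 1 MB and the limit works out to 4.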

I have been using this algorithm for about 3 months. I have done
a lot of testing on filesystems with different capacities, average
file sizes, average numbers of files per directory, and so on. I think
this algorithm has no negative impact on filesystem performance. It
works better than the default one in all cases. The new dirpref
will greatly improve untarring/removing/copying of big directories,
decrease load on cvs servers and much more. The new dirpref doesn't
speed up compilation, but it doesn't slow it down either.

Obtained from: Grigoriy Orlov <gluk@ptci.ru>
=====

=====
iedowse 2001/04/23 17:37:17 PDT
Pre-dirpref versions of fsck may zero out the new superblock fields
fs_contigdirs, fs_avgfilesize and fs_avgfpdir. This could cause
panics if these fields were zeroed while a filesystem was mounted
read-only, and then remounted read-write.

Add code to ffs_reload() which copies the fs_contigdirs pointer
from the previous superblock, and reinitialises fs_avgf* if necessary.

Reviewed by: mckusick
=====
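The ffs_reload() fix described in the entry above amounts to carrying one in-core pointer across the reload and repairing zeroed tunables. A minimal sketch, assuming hypothetical names (struct sb_subset, reload_fixup(), EX_AVFILESIZ, EX_AFPDIR); the default values 16384 and 64 are assumed to match FreeBSD's AVFILESIZ and AFPDIR.

```c
#include <stdint.h>

#define EX_AVFILESIZ 16384 /* assumed default expected average file size */
#define EX_AFPDIR    64    /* assumed default expected files per directory */

/* Hypothetical subset of the in-core superblock. */
struct sb_subset {
    uint8_t *fs_contigdirs;  /* in-core only; must survive a reload */
    int32_t  fs_avgfilesize;
    int32_t  fs_avgfpdir;
};

/*
 * After the superblock has been re-read from disk into newfs, keep the
 * in-core counter array from oldfs and restore defaults for tunables
 * that a pre-dirpref fsck may have zeroed.
 */
static void
reload_fixup(struct sb_subset *newfs, const struct sb_subset *oldfs)
{
    newfs->fs_contigdirs = oldfs->fs_contigdirs;
    if (newfs->fs_avgfilesize <= 0)
        newfs->fs_avgfilesize = EX_AVFILESIZ;
    if (newfs->fs_avgfpdir <= 0)
        newfs->fs_avgfpdir = EX_AFPDIR;
}
```

Values set deliberately with tunefs are left alone; only non-positive (zeroed) fields are replaced.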

=====
nik 2001/04/10 03:36:44 PDT
Add information about the new options to newfs and tunefs which set the
expected average file size and number of files per directory. Could do
with some fleshing out.
=====
 1.84 02-Sep-2001  lukem Incorporate fix by iedowse @ FreeBSD to allow disks with large numbers of
cylinder groups to work correctly, with minor modifications by me to work
with our FFS_EI code. From the FreeBSD commit message:

The ffs superblock includes a 128-byte region for use by temporary
in-core pointers to summary information. An array in this region
(fs_csp) could overflow on filesystems with a very large number of
cylinder groups (~16000 on i386 with 8k blocks). When this happens,
other fields in the superblock get corrupted, and fsck refuses to
check the filesystem.

Solve this problem by replacing the fs_csp array in 'struct fs'
with a single pointer, and add padding to keep the length of the
128-byte region fixed. Update the kernel and userland utilities
to use just this single pointer.

With this change, the kernel no longer makes use of the superblock
fields 'fs_csshift' and 'fs_csmask'. Add a comment to newfs/mkfs.c
to indicate that these fields must be calculated for compatibility
with older kernels.

Reviewed by: mckusick
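The "~16000 cylinder groups" limit can be reproduced with simple arithmetic: each pointer in the 128-byte region addresses one block of per-cylinder-group summary records, and struct csum holds four 32-bit counters (16 bytes). The sketch below assumes, as an upper bound, that the whole region held such pointers; old_csp_capacity() and struct csum_example are hypothetical names.

```c
#include <stdint.h>

/* Per-cylinder-group summary record: four 32-bit counters, 16 bytes. */
struct csum_example {
    int32_t cs_ndir, cs_nbfree, cs_nifree, cs_nffree;
};

/*
 * Upper bound on the cylinder groups a fixed pointer array can describe:
 * (pointers that fit in the region) * (summary records per block).
 */
static long
old_csp_capacity(long region_bytes, long ptr_bytes, long bsize)
{
    long nptrs = region_bytes / ptr_bytes;
    long per_block = bsize / (long)sizeof(struct csum_example);
    return nptrs * per_block;
}
```

With 4-byte pointers (i386) and 8k blocks this gives 32 * 512 = 16384 groups, consistent with the figure in the commit message; with 8-byte pointers the bound halves to 8192, which is why a single pointer to a dynamically sized array is the more robust layout.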
 1.83 17-Aug-2001  lukem remove third argument (`int ns') from ffs_sb_swap(), and let ffs_sb_swap()
determine the endianness of the `struct fs *o' superblock from o->fs_magic
and set needswap as necessary, rather than trusting the caller to get
it right. invariably, almost every caller of ffs_sb_swap() was calling it
with ns set to the wrong value for ns anyway!
ansi KNF ffs_bswap.c declarations whilst here.

this fixes all sorts of problems when trying to use other-endian file systems,
notably the kernel trying to access memory *way* off, possibly corrupting or
panicking, and userland programs SEGVing and/or corrupting things (e.g,
"fsck_ffs -B" to swap a file system endianness).

whilst the previous rev of ffs_bswap.c (1.10, 2000/12/23) made this problem
worse, i suspect that the problem was always there and previous versions
just happened not to trash things at the wrong time.

FFS_EI should now be a lot more stable.
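Letting the superblock itself say whether it needs swapping can be sketched by comparing its magic number with the native and byte-swapped forms of the UFS1 magic (0x011954). This is an illustration of the idea, not ffs_sb_swap() itself; swap32(), sb_needswap() and EX_FS_MAGIC are hypothetical names.

```c
#include <stdint.h>

#define EX_FS_MAGIC 0x011954u  /* UFS1 superblock magic number */

/* Reverse the byte order of a 32-bit value. */
static uint32_t
swap32(uint32_t x)
{
    return (x >> 24) | ((x >> 8) & 0x0000ff00u) |
        ((x << 8) & 0x00ff0000u) | (x << 24);
}

/*
 * Decide from the superblock's own magic number whether it must be
 * byte-swapped: 0 = native order, 1 = other-endian, -1 = not a superblock.
 */
static int
sb_needswap(uint32_t fs_magic)
{
    if (fs_magic == EX_FS_MAGIC)
        return 0;
    if (swap32(fs_magic) == EX_FS_MAGIC)
        return 1;
    return -1;
}
```

Deriving the answer from the data removes the caller's opportunity to pass the wrong flag, which is the class of bug the entry above describes.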
 1.82 26-Jul-2001  lukem if printing the value of fs_clean, say 'fs_clean' instead of 'fs_flags' ...
 1.81 30-May-2001  mrg branches: 1.81.4;
use _KERNEL_OPT
 1.80 07-Feb-2001  chs branches: 1.80.2;
remove debug code that was left in by accident.
 1.79 22-Jan-2001  jdolecek make filesystem vnodeop, specop, fifoop and vnodeopv_* arrays const
 1.78 10-Jan-2001  mycroft On a RW->RO transition, explicitly clear fs_fmod after the cgupdate/sbupdate,
to prevent spurious writebacks and whinging about the (correct!) clean flag.
(Why this isn't done in ffs_sbupdate(), I dunno...)
 1.77 10-Jan-2001  chs attach the softdep pagecache pseudo-buffers to the inode
so we can find them quickly in the softdep truncate path.
 1.76 09-Jan-2001  mycroft ffs_reload(): Copy fs_ronly into the new superblock, too, as it may have been
modified on disk (e.g. by fsck(8)). This flag should really be elsewhere.
 1.75 04-Dec-2000  chs in ffs_sync(), don't skip vnodes which have (potentially dirty) pages.
 1.74 03-Dec-2000  fvdl In addition to setting the softdep flag in the superblock when
mounting with softdeps, also explicitly clear it when we don't,
so that a leftover setting after a crash will be cleared.
 1.73 27-Nov-2000  chs Initial integration of the Unified Buffer Cache project.
 1.72 13-Oct-2000  simonb There is no need to explicitly include <uvm/uvm_extern.h> for
<sys/sysctl.h> anymore.
 1.71 19-Sep-2000  fvdl Adapt for VOP_FSYNC parameter change.

Implement range fsync for FFS. Note: not yet implemented for the
SOFTDEP case.
 1.70 28-Jun-2000  mrg <vm/vm.h> -> <uvm/uvm_extern.h>
 1.69 27-Jun-2000  fvdl Due to popular demand, change vinsheadfree to ungetnewvnode to make
the name clearer. No functional change.
 1.68 27-Jun-2000  fvdl In ffs_vget, do not hold ufs_hashlock across the call to getnewvnode.
We may sleep in it, or even recurse, with softdeps. Instead, grab
the lock later, but check if no one else has beaten us to the VFS_VGET
operation, and if so, roll back getnewvnode using vinsheadfree, and
just return.
 1.67 16-Jun-2000  perseant branches: 1.67.2;
make it compile (fix typo)
 1.66 16-Jun-2000  matt ignore the softdep flags when mounting and there's no softdep in the kernel.
 1.65 15-Jun-2000  fvdl Allow MNT_SOFTDEP to be passed in via the mount(2) system call, do not
require it to be set via tunefs(8). Silently ignore it when doing
an update mount of a writeable filesystem, the FFS/softdep code isn't ready
for this yet.
 1.64 29-May-2000  mycroft Use LIST_{FIRST,NEXT,EMPTY}().
 1.63 29-May-2000  mycroft Add a new inode flag called IN_ACCESSED. This is used in place of IN_MODIFIED
to record that the atime was updated. In ffs_update(), we only do synchronous
writes if something *other* than the atime was changed.
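The rule in the entry above can be sketched as a single predicate. The flag values and update_should_be_sync() are hypothetical; the real IN_* constants live in the UFS inode header and their values are not reproduced here.

```c
#include <stdbool.h>

/* Hypothetical flag values for illustration only. */
#define EX_IN_ACCESSED 0x01  /* only the access time changed */
#define EX_IN_MODIFIED 0x02  /* some other on-disk field changed */
#define EX_IN_CHANGE   0x04
#define EX_IN_UPDATE   0x08

/*
 * The caller asked for a synchronous update (waitfor). Honour that only
 * if some flag other than EX_IN_ACCESSED is set; an atime-only change is
 * written back asynchronously.
 */
static bool
update_should_be_sync(int flags, bool waitfor)
{
    return waitfor && (flags & ~EX_IN_ACCESSED) != 0;
}
```

The benefit is that merely reading a file no longer forces a synchronous inode write.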
 1.62 04-Apr-2000  jdolecek branches: 1.62.2;
Add a new sysctl variable vfs.ffs.log_changeopt - if this is true,
an optimization strategy change is logged to syslog. Default
is 0 (to not log). This replaces the recent not quite "right"
change to only log the change if kernel is compiled with DEBUG.
 1.61 30-Mar-2000  augustss Remove register declarations.
 1.60 30-Mar-2000  simonb Delete redundant decls of rootvp - it's in <sys/systm.h>.
Delete redundant decl of ffs_sbupdate() - it's in <ufs/ffs/ffs_extern.h>.
 1.59 16-Mar-2000  jdolecek Add new VFS op routine - vfs_done and call it on filesystem detach
in vfs_detach(). vfs_done may free global filesystem's resources,
typically those allocated in respective filesystem's init function.
Needed so those filesystems which went in via LKM have a chance to
clean after themselves before unloading.

For each leaf filesystem, add appropriate vfs_done routine.

Also remember how many times ffs_init() was called and do
the appropriate initialization on first call only. In ffs_done(),
destroy the resources when called by the last user of ffs code.
Change mfs to call ffs_init()/ffs_done() appropriately.
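The "initialize on first call, destroy on last call" scheme can be sketched with a simple use count. The names are hypothetical stand-ins for whatever ffs_init() actually sets up, and the sketch ignores concurrency, which a kernel implementation would have to handle.

```c
#include <stdbool.h>

/* Hypothetical stand-ins for the shared resources ffs_init() creates. */
static int  ffs_initcount;        /* active users of the ffs code (FFS, MFS) */
static bool ffs_resources_ready;

static void
example_ffs_init(void)
{
    if (ffs_initcount++ > 0)
        return;                   /* already set up by an earlier user */
    ffs_resources_ready = true;   /* e.g. create pools and hash tables */
}

static void
example_ffs_done(void)
{
    if (--ffs_initcount > 0)
        return;                   /* another user still needs them */
    ffs_resources_ready = false;  /* last user: tear everything down */
}
```

Usage: if both FFS and MFS call example_ffs_init(), the resources survive until both have called example_ffs_done().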
 1.58 16-Mar-2000  fvdl Initialize the fs variable struct a little earlier to avoid referencing
a bad pointer in a printf. Problem reported by Krister Walfridsson.
 1.57 14-Feb-2000  fvdl Fixes to the softdep code from Ethan Solomita <ethan@geocast.com>.
* Fix buffer ordering when it has dependencies.
* Alleviate memory problems.
* Deal with some recursive vnode locks (sigh).
* Fix other bugs.
 1.56 10-Dec-1999  drochner Call ffs_oldfscompat() before all the consistency checks, to avoid the
use of uninitialized data in the checks if the filesystem is an old one.
 1.55 15-Nov-1999  fvdl Add Kirk McKusick's soft updates code to the trunk. Not enabled by
default, as the copyright on the main file (ffs_softdep.c) is such
that it has been put into gnusrc. options SOFTDEP will pull this
in. This code also contains the trickle syncer.

Bump version number to 1.4O
 1.54 20-Oct-1999  enami Check that the device node's type isn't VBAD before touching v_specinfo. If
the device vnode is revoked, the field is NULL and touching it causes a null
pointer dereference.
 1.53 16-Oct-1999  wrstuden branches: 1.53.2; 1.53.4;
In spec_close(), if we're not doing a non-blocking close and VXLOCK is
not set, unlock the vnode before calling the device's close routine and
relock it after it returns. tty close routines will sleep waiting for
buffers to drain, which won't happen often times as the other side needs
to grab the vnode lock first.

Make all unmount routines lock the device vnode before calling VOP_CLOSE().
 1.52 03-Aug-1999  drochner branches: 1.52.2;
clean up inclusion of "opt_ffs.h" and use of "FFS_EI" a bit
 1.51 17-Jul-1999  wrstuden Adjust mountroot routines to vrele rootvp in case of mount error. Closes
PR 7977 by Neil Carson, <neil@brini.com>.
 1.50 08-Jul-1999  wrstuden Modify file systems to deal with struct lock in struct vnode. All leaf
fs's other than nfs use genfs_lock() for locking.

Modify lookup routines to set PDIRUNLOCK when they unlock the parent.
 1.49 05-Mar-1999  bouyer branches: 1.49.2; 1.49.4;
Don't check fs_bsize before the superblock has been swapped if needed.
Check value of sbsize before allocating memory with this value.
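The ordering required by the entry above (swap first, then validate, then allocate) can be sketched as follows. struct sb_hdr, alloc_sb_copy(), EX_FS_MAGIC and the 8192-byte upper bound EX_MAX_SBSIZE are hypothetical, chosen for illustration.

```c
#include <stdint.h>
#include <stdlib.h>

#define EX_FS_MAGIC   0x011954u  /* UFS1 superblock magic number */
#define EX_MAX_SBSIZE 8192       /* assumed upper bound for sbsize */

/* Reverse the byte order of a 32-bit value. */
static uint32_t
swap32(uint32_t x)
{
    return (x >> 24) | ((x >> 8) & 0x0000ff00u) |
        ((x << 8) & 0x00ff0000u) | (x << 24);
}

/* Hypothetical: just the superblock fields this sketch needs. */
struct sb_hdr {
    uint32_t magic;
    int32_t  sbsize;
    int32_t  bsize;
};

/* Returns a buffer of sb->sbsize bytes, or NULL if the header is unusable. */
static void *
alloc_sb_copy(struct sb_hdr *sb)
{
    if (sb->magic != EX_FS_MAGIC) {
        if (swap32(sb->magic) != EX_FS_MAGIC)
            return NULL;                      /* not a superblock at all */
        sb->magic = EX_FS_MAGIC;              /* other-endian: swap first */
        sb->sbsize = (int32_t)swap32((uint32_t)sb->sbsize);
        sb->bsize = (int32_t)swap32((uint32_t)sb->bsize);
    }
    /* Only now are the size fields meaningful; validate before malloc(). */
    if (sb->sbsize <= 0 || sb->sbsize > EX_MAX_SBSIZE)
        return NULL;
    return malloc((size_t)sb->sbsize);
}
```

Checking sizes before swapping would compare byte-reversed values, and allocating before checking would let a corrupted superblock request an arbitrary amount of memory.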
 1.48 26-Feb-1999  wrstuden Modify vfsops to separate vfs_fhtovp() into two routines. vfs_fhtovp() now
only handles the file handle to vnode conversion, and a new call,
vfs_checkexp(), performs the export verification.
 1.47 10-Feb-1999  bouyer Make sure a buffer obtained from bread() is always brelse()'d in case of
error. Closes PR kern/1448 from Wolfgang Solfrank.
 1.46 04-Dec-1998  bouyer Sanity check a few values in the superblock, to avoid mallocing huge
memory area if we try to mount a corrupted filesystem. Fixes kern/3933.
 1.45 12-Nov-1998  thorpej defopt FFS_EI
 1.44 23-Oct-1998  thorpej branches: 1.44.2;
Use DINODE_SIZE rather than pointer arithmetic.
 1.43 01-Sep-1998  thorpej Use the pool allocator and the "nointr" pool page allocator for FFS inodes.

XXX MFS also comes in here for inodes, and used a different malloc type,
but the structure is the same, so we just use the FFS inode pool.
 1.42 09-Aug-1998  perry bzero->memset, bcopy->memcpy, bcmp->memcmp
 1.41 05-Jul-1998  jonathan * defopt COMPAT_{09,10,11,12,13} and COMPAT_NOMID.
TODO: revisit interaction between native compat and emul compat usage.
 1.40 24-Jun-1998  sommerfe Always include fifos; "not an option any more".
 1.39 22-Jun-1998  sommerfe defopt for options FIFO
 1.38 13-Jun-1998  kleink KNF, mostly of FFS_EI changes.
 1.37 09-Jun-1998  scottr Protect various config(8)-generated files from inclusion while
building LKMs. Fixes PR 5557.
 1.36 08-Jun-1998  scottr Use the newly-defined opt_quota.h.
 1.35 05-Jun-1998  kleink Convert fsync vnode operator implementations and usage from the old `waitfor'
argument and MNT_WAIT/MNT_NOWAIT to `flags' and FSYNC_WAIT.
 1.34 18-Mar-1998  bouyer Add support for reading/writing FFS in non-native byte order, conditioned
to "options FFS_EI". The superblock and inodes (without blk addr) are
byteswapped at disk read/write time; other metadata are byteswapped
when used (as they are accessed directly in the buffer cache).
This required the addition of a "um_flags" field to struct ufsmount.
ffs_bswap.c contains superblock and inode byteswap routines also used
by userland utilities.
 1.33 01-Mar-1998  fvdl Merge with Lite2 + local changes
 1.32 18-Feb-1998  thorpej Place a pointer to an array of our vnodeopv_desc *'s in our vfsops
structure, for use by vfs_attach().
 1.31 16-Oct-1997  mjacob In calculating the f_bavail field, don't take 32 bit quantities and
multiply them by 90 (to be divided by 100) and expect them to be sane
for very large values (I was getting a negative 'avail' count).
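The overflow described above is easy to reproduce. The sketch uses unsigned 32-bit arithmetic so that the wrap-around is well defined (the original signed computation could go negative instead); both function names are hypothetical, and 90/100 stands for the default 10% free-space reserve mentioned in the entry.

```c
#include <stdint.h>

/* 32-bit intermediate product: wraps once blocks * 90 exceeds 2^32 - 1. */
static uint32_t
ninety_percent_32(uint32_t blocks)
{
    return (uint32_t)(blocks * 90u) / 100u;
}

/* Widen before multiplying so the intermediate product cannot overflow. */
static int64_t
ninety_percent_64(uint32_t blocks)
{
    return (int64_t)blocks * 90 / 100;
}
```

For 50,000,000 blocks the 32-bit product 4,500,000,000 wraps to 205,032,704, so the first function returns 2,050,327 instead of the correct 45,000,000 that the widened version computes.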
 1.30 22-Jul-1997  fvdl Fix messed up RCS Id.
 1.29 07-Jul-1997  fvdl Get locking around inode hashing right.
 1.28 07-Jul-1997  fvdl Oops, I messed up the lock. Reverting it until I have time to fix it,
to avoid people getting trouble after the supscan hits.
 1.27 06-Jul-1997  fvdl Put lock around inode hashing, because getnewvnode or MALLOC might block,
creating race conditions.
 1.26 12-Jun-1997  mrg remove swap configuration.
 1.25 11-Jun-1997  bouyer Add support for ext2fs, this needed a few modifications to ufs/ufs/inode.h:
- added a "union inode_ext" to struct inode, for the per-fs extensions.
For now only ext2fs uses it.
- i_din is now a union:
union {
struct dinode ffs_din; /* 128 bytes of the on-disk dinode. */
struct ext2fs_dinode e2fs_din; /* 128 bytes of the on-disk dinode. */
} i_din;
Added a lot of #define i_ffs_* and i_e2fs_* to access the fields.
- Added two macros: FFS_ITIMES and EXT2FS_ITIMES. ITIMES calls the right
macro, depending on the type of the inode. ITIMES is used where necessary,
FFS_ITIMES and EXT2FS_ITIMES in other places.
 1.24 10-Mar-1997  mycroft Just increment the generation count. Using the time is bogus and defeats
fsirand(8).
 1.23 31-Jan-1997  thorpej branches: 1.23.4;
- Add ffs_mountroot to ffs_vfsops.
- Only attempt to mount a root FFS on a DV_DISK class device.
 1.22 22-Dec-1996  cgd branches: 1.22.2;
Change the second and third args to struct vfsops' (*vfs_mount)() to
'const char *', and 'void *', respectively. The second arg is taken directly
from user arguments, and is const there, so must be const in the prototypes
and functions. The third arg is also taken directly from user arguments.
It doesn't have to be changed, but since it's cleaner to keep the type
the same as the user arg's type, and I'm already making the 'const char *'
change...
 1.21 12-Oct-1996  christos revert previous kprintf changes
 1.20 10-Oct-1996  christos printf -> kprintf, sprintf -> ksprintf
 1.19 09-Feb-1996  christos ffs prototypes
 1.18 19-Dec-1995  cgd Fix from Lite-2: when reloading the file system, save fs_maxcluster and
the old summary structure pointers, and recalculate cluster per cyl. grp.
information.
 1.17 11-Nov-1995  mycroft ffs -> ufs
 1.16 18-Jun-1995  cgd branches: 1.16.2;
don't assume the f_fsnamelen is nul-truncated or longer than MFSNAMELEN
 1.15 12-Apr-1995  mycroft Make use of the `fs_clean' field. If it was set when the file system was
mounted or upgraded to r-w, then clear it and set it again later when the
file system is unmounted or downgraded.
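The fs_clean handling described above can be sketched as a tiny state machine. struct mount_state, rw_mount() and rw_unmount() are hypothetical names, and the on-disk flag is reduced to a boolean for illustration.

```c
#include <stdbool.h>

/* Hypothetical in-core state for one mounted file system. */
struct mount_state {
    bool ondisk_clean;  /* fs_clean as currently stored on disk */
    bool was_clean;     /* remembered at read-write mount time */
};

/* Mounting read-write, or upgrading to read-write. */
static void
rw_mount(struct mount_state *m)
{
    m->was_clean = m->ondisk_clean;
    if (m->was_clean)
        m->ondisk_clean = false;  /* mark dirty while writable */
}

/* Unmounting, or downgrading to read-only. */
static void
rw_unmount(struct mount_state *m)
{
    if (m->was_clean)
        m->ondisk_clean = true;   /* restore the clean mark */
}
```

A file system that was already dirty at mount time is never marked clean by this path; that is left to fsck.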
 1.14 09-Mar-1995  mycroft copy*str() should use size_t.
 1.13 08-Mar-1995  cgd size for copyinstr should be u_long
 1.12 18-Jan-1995  mycroft Clean up the code to frob mnt_stat a bit.
 1.11 18-Jan-1995  mycroft Turn mountlist into a CIRCLEQ, and handle setting and checking of MNT_ROOTFS
differently.
 1.10 15-Dec-1994  mycroft Call foo_statfs() from a common place when mounting.
 1.9 14-Dec-1994  mycroft Sync with CSRG.
 1.8 28-Oct-1994  mycroft This is not my day.
 1.7 28-Oct-1994  mycroft Fix typo.
 1.6 28-Oct-1994  mycroft For now, limit the maxfilesize to 2^31*bsize-1 in core. This is temporary.
 1.5 28-Oct-1994  mycroft Fix a couple of types in the compatibility code.
 1.4 29-Jun-1994  cgd branches: 1.4.2;
New RCS IDs, take two. They're more aesthetically pleasant, and use 'NetBSD'
 1.3 28-Jun-1994  mycroft Reload mnt_maxsymlinklen, for `fsck -c2'.
 1.2 22-Jun-1994  mycroft Add a couple of missing casts.
 1.1 08-Jun-1994  mycroft branches: 1.1.1;
Update to 4.4-Lite fs code, with local changes.
 1.1.1.2 01-Mar-1998  fvdl Import 4.4BSD-Lite2
 1.1.1.1 01-Mar-1998  fvdl Import 4.4BSD-Lite for reference
 1.4.2.1 23-Nov-1994  cgd from mycroft, for patch_05
 1.16.2.2 26-Dec-1995  mycroft Pull in ffs_reload() fix from trunk.
 1.16.2.1 01-Nov-1995  jtc complete ufs -> ffs change (From John Kohl; PR #1403)
 1.22.2.1 14-Jan-1997  thorpej Snapshot of work-in-progress, committed to private branch.

These changes implement machine-independent root device and file system
selection. Notable features:

- All ports behave in a consistent manner regarding root
device selection.
- No more "options GENERIC"; all kernels have the ability
to boot with RB_ASKNAME to select root device and file system
type.
- Root file system type can be wildcarded; a machine-independent
function will try all possible file systems for the selected
root device until one succeeds.
- If the root file system fails to mount, the operator will
be given the chance to select a new root device and file
system type, rather than having the machine simply panic.
- nfs_mountroot() no longer panics if any part of the NFS
mount process fails; it now returns an error, giving the
operator a chance to recover.
- New, more consistent, config(8) grammar. The constructs:

config netbsd swap generic
config netbsd root on nfs

have been replaced with:

config netbsd root on ? type ?
config netbsd root on ? type nfs

Additionally, the operator may select or wildcard root file
system type in the kernel configuration file:

config netbsd root on cd0a type cd9660

config(8) now requires that a "root" specification be
made. "root" may be wired down or wildcarded. "swap" and
"dump" specifications are optional, and follow previous
semantics.

- config(8) has a new "file-system" keyword, used to configure
file systems into the kernel. Eventually, this will be used
to generate the default vfssw[].

- "options NFSCLIENT" is obsolete, and is replaced by
"file-system NFS". "options NFSSERVER" still exists, since
NFS server support is independent of the NFS file system
client.

- sys/arch/<foo>/<foo>/swapgeneric.c is no longer used, and
will be removed; all information is now generated by config(8).

As of this commit, all ports except arm32 have been updated to use
the new setroot(). Only SPARC, i386, and Alpha ports have been
tested at this time. Port masters should test these changes on their
ports, and report any problems back to me.

More changes are on their way, including RB_ASKNAME support in
nfs_mountroot() (to prompt for server address and path) and, potentially,
the ability to select rarp/bootparam or bootp in nfs_mountroot().
 1.23.4.1 12-Mar-1997  is Merge in changes from Trunk
 1.44.2.1 30-May-1999  chs there's a new rule that all vnodes must call uvm_vnp_setsize()
before anyone can possibly access them, so do this in ffs_vget().
 1.49.4.3 02-Aug-1999  thorpej Update from trunk.
 1.49.4.2 04-Jul-1999  chs initialize new struct mount fields in ffs_mountfs().
 1.49.4.1 07-Jun-1999  chs merge everything from chs-ubc branch.
 1.49.2.2 20-Dec-1999  he Pull up revision 1.56 (via patch, requested by drochner):
Fix the use of an uninitialized variable. This could be triggered
if the file system to be mounted is a pre-BSD4.4 one (which can
result in the old file system being rejected).
 1.49.2.1 18-Oct-1999  cgd pull up rev 1.53 from trunk (requested by wrstuden):
In spec_close(), call the device's close routine with the vnode
unlocked if the call might block. Force a non-blocking close if
VXLOCK is set. This eliminates a potential deadlock situation, and
should eliminate the dirty buffers on reboot issue.
 1.52.2.2 27-Dec-1999  wrstuden Pull up to last week's -current.
 1.52.2.1 21-Dec-1999  wrstuden Initial commit of recent changes to make DEV_BSIZE go away.

Runs on i386, needs work on other arch's. Main kernel routines should be
fine, but a number of the stand programs need help.

cd, fd, ccd, wd, and sd have been updated. sd has been tested with non-512
byte block devices. vnd, raidframe, and lfs need work.

Non 2**n block support is automatic for LKM's and conditional for kernels
on "options NON_PO2_BLOCKS".
 1.53.4.3 15-Nov-1999  fvdl Sync with -current
 1.53.4.2 03-Nov-1999  fvdl Give ufs_ihashget an extra argument: the flags passed to vget() for
locking. This way we can avoid locking against ourselves when
ufs_ihashget is called during the flushing of metadata. XXX

Also, comment out a VOP_FSYNC call that I think is now unneeded, and
put a diagnostic printf there to check if this still happens.
 1.53.4.1 19-Oct-1999  fvdl Bring in Kirk McKusick's FFS softdep code on a branch.
 1.53.2.5 11-Feb-2001  bouyer Sync with HEAD.
 1.53.2.4 18-Jan-2001  bouyer Sync with head (for UBC+NFS fixes, mostly).
 1.53.2.3 08-Dec-2000  bouyer Sync with HEAD.
 1.53.2.2 20-Nov-2000  bouyer Update thorpej_scsipi to -current as of a month ago
 1.53.2.1 20-Oct-1999  thorpej Sync /w trunk.
 1.62.2.1 22-Jun-2000  minoura Sync w/ netbsd-1-5-base.
 1.67.2.8 06-Oct-2003  itojun make sure to not get flags which are for internal use only from the on-disk
superblock.
Proposed in http://mail-index.netbsd.org/tech-kern/2003/09/06/0005.html
[ticket #80, bouyer]
 1.67.2.7 25-Nov-2001  he Pull up revision 1.85 (requested by lukem):
Pull in enhanced ffs_dirpref() algorithm, which provides a
substantial performance improvement through better locality
between parent/child directories and their files, and by easing
the pressure on the buffer cache for metadata operations.
 1.67.2.6 25-Nov-2001  he Pull up revision 1.84 (requested by lukem):
Change fs_csp[] from being a fixed size to being an array sized
as required. This allows file systems with more than about 15500
cylinder groups (on 32-bit systems) to be used.
 1.67.2.5 25-Nov-2001  he Pull up revision 1.83 (requested by lukem):
Call ffs_sb_swap() with the correct arguments. Fixes problems
with using other-endian file systems.
 1.67.2.4 25-Nov-2001  he Pull up revision 1.82 (requested by lukem):
Correctly refer to fs_clean in error message.
 1.67.2.3 25-Nov-2001  he Pull up revisions 1.76,1.78 (requested by lukem):
In ffs_reload(), copy fs_ronly to the new superblock too.
Clear fs_fmod on rw->ro transition.
 1.67.2.2 14-Dec-2000  he Pull up revision 1.71 (requested by fvdl):
Improve NFS performance, possibly by as much as 100% in
throughput. Please note: this implies a kernel interface change,
VOP_FSYNC gains two arguments.
 1.67.2.1 03-Jul-2000  fvdl pullup from trunk:

Fix a "locking against myself" problem; holding ufs_hashlock
across getnewvnode() could cause a recursive lock if it resulted in
recycling a vnode that was using softdeps.
 1.80.2.15 11-Dec-2002  thorpej Sync with HEAD.
 1.80.2.14 18-Oct-2002  nathanw Catch up to -current.
 1.80.2.13 17-Sep-2002  nathanw Catch up to -current.
 1.80.2.12 01-Aug-2002  nathanw Catch up to -current.
 1.80.2.11 12-Jul-2002  nathanw No longer need to pull in lwp.h; proc.h pulls it in for us.
 1.80.2.10 24-Jun-2002  nathanw Curproc->curlwp renaming.

Change uses of "curproc->l_proc" back to "curproc", which is more like the
original use. Bare uses of "curproc" are now "curlwp".

"curproc" is now #defined in proc.h as ((curlwp) ? (curlwp)->l_proc : NULL)
so that it is always safe to reference curproc (*de*referencing curproc
is another story, but that's always been true).
 1.80.2.9 20-Jun-2002  nathanw Catch up to -current.
 1.80.2.8 17-Apr-2002  nathanw Catch up to -current.
 1.80.2.7 01-Apr-2002  nathanw Catch up to -current.
(CVS: It's not just a program. It's an adventure!)
 1.80.2.6 08-Jan-2002  nathanw Catch up to -current.
 1.80.2.5 14-Nov-2001  nathanw Catch up to -current.
 1.80.2.4 21-Sep-2001  nathanw Catch up to -current.
 1.80.2.3 24-Aug-2001  nathanw Catch up with -current.
 1.80.2.2 21-Jun-2001  nathanw Catch up to -current.
 1.80.2.1 05-Mar-2001  nathanw Initial commit of scheduler activations and lightweight process support.
 1.81.4.8 10-Oct-2002  jdolecek sync kqueue with -current; this includes merge of gehenna-devsw branch,
merge of i386 MP branch, and part of autoconf rototil work
 1.81.4.7 06-Sep-2002  jdolecek sync kqueue branch with HEAD
 1.81.4.6 23-Jun-2002  jdolecek catch up with -current on kqueue branch
 1.81.4.5 16-Mar-2002  jdolecek Catch up with -current.
 1.81.4.4 10-Jan-2002  thorpej Sync kqueue branch with -current.
 1.81.4.3 13-Sep-2001  thorpej Update the kqueue branch to HEAD.
 1.81.4.2 25-Aug-2001  thorpej Merge Aug 24 -current into the kqueue branch.
 1.81.4.1 03-Aug-2001  lukem update to -current
 1.85.2.3 01-Oct-2001  fvdl Catch up with -current.
 1.85.2.2 26-Sep-2001  fvdl * add a VCLONED vnode flag that indicates a vnode representing a cloned
device.
* rename REVOKEALL to REVOKEALIAS, and add a REVOKECLONE flag, to pass
to VOP_REVOKE
* the revoke system call will revoke all aliases, as before, but not the
clones
* vdevgone is called when detaching a device, so make it use REVOKECLONE
to get rid of all clones as well
* clean up all uses of VOP_OPEN wrt. locking.
* add a few VOPS to spec_vnops that need to do something when it's a
clone vnode (access and getattr)
* add a copy of the vnode vattr structure of the original 'master' vnode
to the specinfo of a cloned vnode. could possibly redirect getattr to
the 'master' vnode, but this has issues with revoke
* add a vdev_reassignvp function that disassociates a vnode from its
original device, and reassociates it with the specified dev_t. to be
used by cloning devices only, in case a new minor is allocated.
* change all direct references in drivers to v_devcookie and v_rdev
to vdev_privdata(vp) and vdev_rdev(vp). for diagnostic purposes
when debugging race conditions that still exist wrt. locking and
revoking vnodes.
* make the locking state of a vnode consistent when passed to
d_open and d_close (unlocked). locked would be better, but has
some deadlock issues
 1.85.2.1 18-Sep-2001  fvdl Various changes to make cloning devices possible:

* Add an extra argument (struct vnode **) to VOP_OPEN. If it is
not NULL, specfs will create a cloned (aliased) vnode during
the call, and return it there. The caller should release and
unlock the original vnode if a new vnode was returned. The
new vnode is returned locked.

* Add a flag field to the cdevsw and bdevsw structures.
DF_CLONING indicates that it wants a new vnode for each
open (XXX is there a better way? devprop?)

* If a device is cloning, always call the close entry
point for a VOP_CLOSE.


Also, rewrite cons.c to do the right thing with vnodes. Use VOPs
rather than direct device entry calls. Suggested by mycroft@

Light to moderate testing done on an i386 system (arch doesn't matter
though, these are MI changes).
 1.87.2.1 12-Nov-2001  thorpej Sync the thorpej-mips-cache branch with -current.
 1.98.4.2 24-Sep-2003  tron Pull up revision 1.121 via patch (requested by bouyer in ticket #1464):
make sure to not get flags which are for internal use only from the on-disk
superblock.
Proposed in http://mail-index.netbsd.org/tech-kern/2003/09/06/0005.html
 1.98.4.1 10-Jun-2002  tv Pull up revision 1.99 (requested by chs in ticket #227):
allow read-only mounts even if we can't read the last fragment of the fs.
this enables one to recover data from a failing disk (where the read failure
is a hardware problem) while avoiding corrupting the fs further (in the case
where the read failure is due to a misconfiguration).
 1.98.2.3 29-Aug-2002  gehenna catch up with -current.
 1.98.2.2 20-Jun-2002  gehenna catch up with -current.
 1.98.2.1 16-May-2002  gehenna Use devsw APIs for checking validity of major numbers.
 1.118.2.14 10-Nov-2005  skrll Sync with HEAD. Here we go again...
 1.118.2.13 01-Apr-2005  skrll Sync with HEAD.
 1.118.2.12 08-Mar-2005  skrll Sync with HEAD.
 1.118.2.11 04-Mar-2005  skrll Sync with HEAD.

Hi Perry!
 1.118.2.10 17-Jan-2005  skrll Sync with HEAD.
 1.118.2.9 29-Nov-2004  skrll Sync with HEAD.
 1.118.2.8 27-Oct-2004  skrll Remove the struct lwp * arguments from qsync and ufs_checkpath that are
no longer (read: were never) required.
 1.118.2.7 24-Sep-2004  skrll Sync with HEAD.
 1.118.2.6 21-Sep-2004  skrll Fix the sync with head I botched.
 1.118.2.5 18-Sep-2004  skrll Sync with HEAD.
 1.118.2.4 25-Aug-2004  skrll Sync with HEAD.
 1.118.2.3 24-Aug-2004  skrll Undo part of the ktrace/lwp changes. In particular:
* Remove the "lwp *" argument that was added to vget(). Turns out
that nothing actually used it!
* Remove the "lwp *" arguments that were added to VFS_ROOT(), VFS_VGET(),
and VFS_FHTOVP(); all they did was pass it to vget() (which, as noted
above, didn't use it).
* Remove all of the "lwp *" arguments to internal functions that were added
just to appease the above.
 1.118.2.2 03-Aug-2004  skrll Sync with HEAD
 1.118.2.1 02-Jul-2003  darrenr Apply the aborted ktrace-lwp changes to a specific branch. This is just for
others to review; I'm concerned that patch fuzziness may have resulted in some
errant code being generated but I'll look at that later by comparing the diff
from the base to the branch with the file I attempt to apply to it. This will,
at the very least, put the changes in a better context for others to review
them and attempt to tinker with removing passing of 'struct lwp' through
the kernel.
 1.140.2.3 29-May-2004  tron Pull up revision 1.148 (requested by atatat in ticket #393):
Sysctl descriptions under vfs subtree
 1.140.2.2 28-Apr-2004  jmc Pullup rev 1.145 (requested by dbj in ticket #197)

Remove botched superblock upgrade warnings.
There are now alternate non-kernel checks and fixes for this problem.
PR#17910 PR#21283 PR#21404 PR#23925 PR#23926
PR#25138
 1.140.2.1 27-Apr-2004  jdc Pull up revisions 1.141-1.142 (requested by dbj in ticket #185)

Fix problems related to superblock upgrade issues which may be
experienced by -current users from 2003.
 1.160.4.1 19-Mar-2005  yamt sync with head. xen and whitespace. xen part is not finished.
 1.160.2.1 29-Apr-2005  kent sync with -current
 1.162.2.3 30-May-2007  bouyer Pull up following revision(s) (requested by tsutsui in ticket #1798):
sys/ufs/ffs/ffs_vfsops.c: revision 1.201
Fix inconsistent changes in rev 1.153 and 1.154:
Adjust fs->fs_maxfilesize instead of ump->um_maxfilesize
in ffs_oldfscompat_read() because the latter is overridden
by the former after ffs_oldfscompat_read() returned.
Fixes EFBIG errors on read(2) and "exec /sbin/init: error 8"
problem on mac68k after mountroot() on old 4.3BSD UFS created
by the Mkfs tool for MacOS (reported and confirmed on port-mac68k).
 1.162.2.2 10-Mar-2006  tron Pull up following revision(s) (requested by drochner in ticket #1189):
sys/ufs/ffs/ffs_vfsops.c: revision 1.168
fix crash in mount error handling: don't free storage which was not
malloc'd
 1.162.2.1 24-Aug-2005  riz Pull up following revision(s) (requested by yamt in ticket #688):
sys/miscfs/genfs/genfs_vnops.c: revision 1.98 via patch
sys/ufs/ffs/ffs_vfsops.c: revision 1.165
sys/ufs/lfs/lfs_extern.h: revision 1.69
sys/fs/filecorefs/filecore_vfsops.c: revision 1.20
sys/nfs/nfs_node.c: revision 1.80
sys/fs/smbfs/smbfs_node.c: revision 1.24
sys/fs/cd9660/cd9660_vfsops.c: revision 1.24
sys/fs/msdosfs/msdosfs_denode.c: revision 1.8
sys/miscfs/genfs/genfs_node.h: revision 1.6
sys/ufs/lfs/lfs_vfsops.c: revision 1.183
sys/ufs/ext2fs/ext2fs_vfsops.c: revision 1.86
sys/fs/adosfs/advfsops.c: revision 1.23
sys/fs/ntfs/ntfs_vfsops.c: revision 1.31
- constify genfs_ops.
- use member designators.

sys/miscfs/genfs/genfs_vnops.c: revision 1.99 via patch
genfs_getpages: don't forget to put the vnode onto the syncer's work queue
even in the case of PGO_LOCKED.

sys/uvm/uvm_bio.c: revision 1.40
sys/uvm/uvm_pager.h: revision 1.29
sys/miscfs/genfs/genfs_vnops.c: revision 1.100 via patch
sys/ufs/ufs/ufs_inode.c: revision 1.50
- introduce PGO_NOBLOCKALLOC and use it for ubc mapping
to prevent unnecessary block allocations in the case that
page size > block size.
- ufs_balloc_range: use VM_PROT_WRITE+PGO_NOBLOCKALLOC rather than
VM_PROT_READ.

sys/uvm/uvm_fault.c: revision 1.96
sys/miscfs/genfs/genfs_vnops.c: revision 1.101 via patch
sys/uvm/uvm_object.h: revision 1.19
sys/miscfs/genfs/genfs_node.h: revision 1.7
ensure that vnodes with dirty pages are always on syncer's queue.
- genfs_putpages: wait for i/o completion of PG_RELEASED/PG_PAGEOUT pages by
setting "wasclean" false when encountering them.
suggested by Stephan Uphoff in PR/24596 (1).
- genfs_putpages: write protect pages when cleaning out, if
we're going to take the vnode off the syncer's queue.
uvm_fault: don't write-map pages unless its vnode is already on
the syncer's queue.
fix PR/24596 (3), but in a different way from the suggested fix.
(to keep our current behaviour, ie. not to require explicit msync.
discussed on tech-kern@.)
- genfs_putpages: don't mistakenly take a vnode off the queue
by introducing a generation number in genfs_node.
genfs_getpages: increment the generation number.
suggested by Stephan Uphoff in PR/24596 (2).
- add some assertions.

sys/miscfs/genfs/genfs_vnops.c: revision 1.102 via patch
genfs_putpages: don't bother to clean the vnode unless VONWORKLST.

sys/ufs/ffs/ffs_vnops.c: revision 1.71
ffs_full_fsync: because VBLK/VCHR can be mmap'ed,
do VOP_PUTPAGES for them as well.

sys/uvm/uvm_fault.c: revision 1.97
uvm_fault: check a correct object in the case of layered filesystems.
fix PR/30811 from Jukka Salmi.

sys/uvm/uvm_object.h: revision 1.20
sys/ufs/ffs/ffs_vfsops.c: revision 1.167
sys/uvm/uvm_bio.c: revision 1.41
sys/ufs/ufs/ufs_vnops.c: revision 1.129
sys/uvm/uvm_mmap.c: revision 1.92
sys/uvm/uvm_fault.c: revision 1.98
sys/kern/vfs_subr.c: revision 1.252
sys/fs/msdosfs/denode.h: revision 1.5
sys/miscfs/genfs/genfs_vnops.c: revision 1.103 via patch
sys/fs/msdosfs/msdosfs_denode.c: revision 1.9
sys/sys/vnode.h: revision 1.141
sys/ufs/ufs/ufs_inode.c: revision 1.51
sys/ufs/ufs/ufs_extern.h: revision 1.45 via patch
sys/miscfs/genfs/genfs_node.h: revision 1.8
sys/ufs/lfs/lfs_vfsops.c: revision 1.184
sys/uvm/uvm_pager.h: revision 1.30
sys/ufs/ext2fs/ext2fs_vfsops.c: revision 1.87
update file timestamps for nfsd loaned-read and mmap.
PR/25279. discussed on tech-kern@.

sys/miscfs/genfs/genfs_vnops.c: revision 1.104 via patch
don't write-protect wired pages. pointed out by Chuck Silvers.
for now, leave a vnode on the syncer's queue, as suggested by him.

sys/ufs/ffs/ffs_vnops.c: revision 1.72
revert VCHR part of ffs_vnops.c 1.71.
as VCHR uses the device pager, there is no point in calling VOP_PUTPAGES here.
pointed out by Chuck Silvers.
 1.165.2.8 04-Feb-2008  yamt sync with head.
 1.165.2.7 21-Jan-2008  yamt sync with head
 1.165.2.6 07-Dec-2007  yamt sync with head
 1.165.2.5 27-Oct-2007  yamt sync with head.
 1.165.2.4 03-Sep-2007  yamt sync with head.
 1.165.2.3 26-Feb-2007  yamt sync with head.
 1.165.2.2 30-Dec-2006  yamt sync with head.
 1.165.2.1 21-Jun-2006  yamt sync with head.
 1.175.2.2 29-Oct-2005  yamt use ffs_* directly rather than via ufs_ops.
suggested by Chuck Silvers.
 1.175.2.1 20-Oct-2005  yamt adapt ufs.
 1.178.2.2 01-Mar-2006  yamt sync with head.
 1.178.2.1 15-Jan-2006  yamt sync with head.
 1.179.4.3 01-Jun-2006  kardel Sync with head.
 1.179.4.2 22-Apr-2006  simonb Sync with head.
 1.179.4.1 04-Feb-2006  simonb Adapt for timecounters: mostly use get*time() and use "time_second"
instead of "time.tv_sec".
 1.179.2.1 09-Sep-2006  rpaulo sync with head
 1.180.6.1 24-May-2006  tron Merge 2006-05-24 NetBSD-current into the "peter-altq" branch.
 1.180.4.2 06-May-2006  christos - Move kauth_cred_t declaration to <sys/types.h>
- Cleanup struct ucred; forward declarations that are unused.
- Don't include <sys/kauth.h> in any header, but include it in the c files
that need it.

Approved by core.
 1.180.4.1 08-Mar-2006  elad Adapt to kernel authorization KPI.
 1.180.2.4 03-Sep-2006  yamt sync with head.
 1.180.2.3 11-Aug-2006  yamt sync with head
 1.180.2.2 26-Jun-2006  yamt sync with head.
 1.180.2.1 24-May-2006  yamt sync with head.
 1.181.2.1 19-Jun-2006  chap Sync with head.
 1.182.2.1 13-Jul-2006  gdamore Merge from HEAD.
 1.185.4.2 10-Dec-2006  yamt sync with head.
 1.185.4.1 22-Oct-2006  yamt sync with head
 1.185.2.3 01-Feb-2007  ad Sync with head.
 1.185.2.2 12-Jan-2007  ad Sync with head.
 1.185.2.1 18-Nov-2006  ad Sync with head.
 1.190.4.1 03-Sep-2007  wrstuden Sync w/ NetBSD-4-RC_1
 1.190.2.1 04-Jun-2007  riz Pull up following revision(s) (requested by tsutsui in ticket #686):
sys/ufs/ffs/ffs_vfsops.c: revision 1.201
Fix inconsistent changes in rev 1.153 and 1.154:
Adjust fs->fs_maxfilesize instead of ump->um_maxfilesize
in ffs_oldfscompat_read() because the latter is overridden
by the former after ffs_oldfscompat_read() returned.
Fixes EFBIG errors on read(2) and "exec /sbin/init: error 8"
problem on mac68k after mountroot() on old 4.3BSD UFS created
by the Mkfs tool for MacOS (reported and confirmed on port-mac68k).
 1.196.6.22 11-Nov-2007  hannken Add fstrans_mount() to explicitly allocate fstrans_info.
Replace remaining malloc() to kmem_alloc() in vfs_trans.c.

Ok: Andrew Doran <ad@netbsd.org>
 1.196.6.21 25-Oct-2007  ad Fix up mnt_vnodelist handling.
 1.196.6.20 23-Oct-2007  ad Sync with head.
 1.196.6.19 08-Oct-2007  ad Call fstrans_unmount().
 1.196.6.18 16-Sep-2007  ad - Checkpoint work in progress on the vnode lifecycle and reference counting
stuff. This makes it work properly without kernel_lock and fixes a few
quite old bugs. See vfs_subr.c 1.283.2.17 for details.

- Fix some problems with softdep. Unfortunately our softdep code appears
to have some longstanding bugs that cause it to fail under stress testing.
 1.196.6.17 30-Aug-2007  ad - Mark ffs MPSAFE. There are still a few minor problems and I'm not yet
sure about the snapshot code, but by and large it's there.
- Grab ump->um_lock in a few more places.
 1.196.6.16 28-Aug-2007  ad Revert accidental change (mp->mnt_iflag |= IMNT_MPSAFE).
 1.196.6.15 24-Aug-2007  ad Sync with buffer cache locking changes. See buf.h/vfs_bio.c for details.
Some minor portions are incomplete and need to be verified as a whole.
 1.196.6.14 20-Aug-2007  ad Sync with HEAD.
 1.196.6.13 20-Aug-2007  ad softdep locking improvements. It hangs looping in flush_inodedep_deps(),
more work required.
 1.196.6.12 29-Jul-2007  ad Add vfs_destroy() to free mount structures. The specificdata_ref was being
leaked.
 1.196.6.11 15-Jul-2007  ad Sync with head.
 1.196.6.10 17-Jun-2007  ad - Increase the number of thread priorities from 128 to 256. How the space
is set up is to be revisited.
- Implement soft interrupts as kernel threads. A generic implementation
is provided, with hooks for fast-path MD code that can run the interrupt
threads over the top of other threads executing in the kernel.
- Split vnode::v_flag into three fields, depending on how the flag is
locked (by the interlock, by the vnode lock, by the file system).
- Miscellaneous locking fixes and improvements.
 1.196.6.9 09-Jun-2007  ad Sync with head.
 1.196.6.8 08-Jun-2007  ad Sync with head.
 1.196.6.7 27-May-2007  ad ffs_sync: vp->v_data can be NULL if the vnode is being recycled.
 1.196.6.6 13-May-2007  ad - Pass the error number and residual count to biodone(), and let it handle
setting error indicators. Prepare to eliminate B_ERROR.
- Add a flag argument to brelse() to be set into the buf's flags, instead
of doing it directly. Typically used to set B_INVAL.
- Add a "struct cpu_info *" argument to kthread_create(), to be used to
create bound threads. Change "bool mpsafe" to "int flags".
- Allow exit of LWPs in the IDL state when (l != curlwp).
- More locking fixes & conversion to the new API.
 1.196.6.5 14-Apr-2007  ad ffs_sync: don't try to examine the inode without locking if the vnode is
being freed.
 1.196.6.4 13-Apr-2007  ad Put a per-mount lock around ffs shared data structures, excluding softdep
and quotas. Strategy lifted from FreeBSD.
 1.196.6.3 10-Apr-2007  ad Sync with head.
 1.196.6.2 13-Mar-2007  ad Pull in the initial set of changes for the vmlocking branch.
 1.196.6.1 13-Mar-2007  ad Sync with head.
 1.196.2.3 17-May-2007  yamt sync with head.
 1.196.2.2 15-Apr-2007  yamt sync with head.
 1.196.2.1 24-Mar-2007  yamt sync with head.
 1.197.2.1 11-Jul-2007  mjf Sync with head.
 1.205.2.1 15-Aug-2007  skrll Sync with HEAD.
 1.207.4.2 31-Jul-2007  pooka * nuke the nameidata parameter from VFS_MOUNT(). Nobody on tech-kern
knew what it was supposed to be used for and wrstuden gave a go-ahead
* while rototilling, convert file systems which went easily to
use VFS_PROTOS() instead of manually prototyping the methods
 1.207.4.1 31-Jul-2007  pooka file ffs_vfsops.c was added on branch matt-mips64 on 2007-07-31 21:14:21 +0000
 1.207.2.4 09-Dec-2007  jmcneill Sync with HEAD.
 1.207.2.3 27-Nov-2007  joerg Sync with HEAD. amd64 Xen support needs testing.
 1.207.2.2 26-Oct-2007  joerg Sync with HEAD.

Follow the merge of pmap.c on i386 and amd64 and move
pmap_init_tmp_pgtbl into arch/x86/x86/pmap.c. Modify the ACPI wakeup
code to restore CR4 before jumping back into kernel space as the large
page option might cover that.
 1.207.2.1 16-Aug-2007  jmcneill Sync with HEAD.
 1.208.4.1 14-Oct-2007  yamt sync with head.
 1.208.2.3 23-Mar-2008  matt sync with HEAD
 1.208.2.2 09-Jan-2008  matt sync with HEAD
 1.208.2.1 06-Nov-2007  matt sync with HEAD
 1.210.4.3 18-Feb-2008  mjf Sync with HEAD.
 1.210.4.2 27-Dec-2007  mjf Sync with HEAD.
 1.210.4.1 08-Dec-2007  mjf Sync with HEAD.
 1.211.2.2 26-Dec-2007  ad Sync with head.
 1.211.2.1 04-Dec-2007  ad Pull the vmlocking changes into a new branch.
 1.212.4.3 10-Jan-2008  bouyer Sync with HEAD
 1.212.4.2 08-Jan-2008  bouyer Sync with HEAD
 1.212.4.1 02-Jan-2008  bouyer Sync with HEAD
 1.222.6.5 17-Jan-2009  mjf Sync with HEAD.
 1.222.6.4 28-Sep-2008  mjf Sync with HEAD.
 1.222.6.3 29-Jun-2008  mjf Sync with HEAD.
 1.222.6.2 05-Jun-2008  mjf Sync with HEAD.

Also fix build.
 1.222.6.1 02-Jun-2008  mjf Sync with HEAD.
 1.223.4.7 11-Aug-2010  yamt sync with head.
 1.223.4.6 11-Mar-2010  yamt sync with head
 1.223.4.5 16-Sep-2009  yamt sync with head
 1.223.4.4 19-Aug-2009  yamt sync with head.
 1.223.4.3 18-Jul-2009  yamt sync with head.
 1.223.4.2 04-May-2009  yamt sync with head.
 1.223.4.1 16-May-2008  yamt sync with head.
 1.223.2.2 04-Jun-2008  yamt sync with head
 1.223.2.1 18-May-2008  yamt sync with head.
 1.226.2.3 10-Oct-2008  skrll Sync with HEAD.
 1.226.2.2 18-Sep-2008  wrstuden Sync with wrstuden-revivesa-base-2.
 1.226.2.1 23-Jun-2008  wrstuden Sync w/ -current. 34 merge conflicts to follow.
 1.229.2.7 28-Jul-2008  simonb Add support for creating a WAPBL log in the filesystem. Will
create an in-filesystem log on first "mount -o log" if one doesn't
exist, and will then continue to use same log in the future. See
(soon to be added) wapbl(4) for more info.

Adds a new B_CONTIG low-level allocation flag that uses hints in
"struct ffs_inode_ext" to lay out an ffs file's data contiguously.

Thanks to Greg Oster for helping with the design of this and to
Antti Kantee for code review and suggestions.
 1.229.2.6 03-Jul-2008  simonb Sync with head.
 1.229.2.5 30-Jun-2008  simonb During mount, mark the filesystem as clean once we've replayed the
journal.

With much help from Greg Oster.
 1.229.2.4 12-Jun-2008  martin License police
 1.229.2.3 11-Jun-2008  simonb Fix some whitespace and long line niggles.
 1.229.2.2 11-Jun-2008  simonb Comment out the behaviour change that requires "mount -f ..." to mount
a dirty filesystem.
 1.229.2.1 10-Jun-2008  simonb Initial commit of Wasabi System's WAPBL (Write Ahead Physical Block
Logging) journaling code. Originally written by Darrin B. Jewell
while at Wasabi and updated to -current by Antti Kantee, Andy Doran,
Greg Oster and Simon Burge.

Still a number of issues - look in doc/BRANCHES for "simonb-wapbl"
for more info.
 1.230.2.2 13-Dec-2008  haad Update haad-dm branch to haad-dm-base2.
 1.230.2.1 19-Oct-2008  haad Sync with HEAD.
 1.238.2.3 28-Apr-2009  skrll Sync with HEAD.
 1.238.2.2 03-Mar-2009  skrll Sync with HEAD.
 1.238.2.1 19-Jan-2009  skrll Sync with HEAD.
 1.239.2.5 25-Apr-2014  sborrill Pull up the following revisions(s) (requested by maxv in ticket #1901):
sys/kern/vfs_syscalls.c: revision 1.478, 1.480 via patch
sys/coda/coda_vfsops.c: revision 1.81
sys/fs/adosfs/advfsops.c: revision 1.70
sys/fs/cd9660/cd9660_vfsops.c: revision 1.84
sys/fs/efs/efs_vfsops.c: revision 1.25
sys/fs/filecorefs/filecore_vfsops.c: revision 1.76
sys/fs/hfs/hfs_vfsops.c: revision 1.31
sys/fs/msdosfs/msdosfs_vfsops.c: revision 1.107
sys/fs/ntfs/ntfs_vfsops.c: revision 1.94
sys/fs/ptyfs/ptyfs_vfsops.c: revision 1.50 via patch
sys/fs/puffs/puffs_vfsops.c: revision 1.110 via patch
sys/fs/smbfs/smbfs_vfsops.c: revision 1.100
sys/fs/sysvbfs/sysvbfs_vfsops.c: revision 1.43
sys/fs/tmpfs/tmpfs_vfsops.c: revision 1.59 via patch
sys/fs/udf/udf_vfsops.c: revision 1.67
sys/fs/union/union_vfsops.c: revision 1.72
sys/fs/unionfs/unionfs_vfsops.c: revision 1.13
sys/kern/vfs_syscalls.c: revision 1.479
sys/miscfs/nullfs/null_vfsops.c: revision 1.88 via patch
sys/miscfs/overlay/overlay_vfsops.c: revision 1.61
sys/miscfs/procfs/procfs_vfsops.c: revision 1.91
sys/miscfs/umapfs/umap_vfsops.c: revision 1.92
sys/nfs/nfs_vfsops.c: revision 1.227
sys/ufs/ext2fs/ext2fs_vfsops.c: revision 1.180
sys/ufs/ffs/ffs_vfsops.c: revision 1.297
sys/ufs/lfs/lfs_vfsops.c: revision 1.321
sys/ufs/mfs/mfs_vfsops.c: revision 1.107

Due to missing checks in the mount syscall, and a wrong assumption on the
file systems side, the kernel could allocate an unbounded or zero-sized
memory buffer, and could dereference a NULL pointer when particular
arguments are given by a user.
 1.239.2.4 03-Oct-2009  snj branches: 1.239.2.4.2; 1.239.2.4.6;
Pull up following revision(s) (requested by bouyer in ticket #1036):
sbin/fsck_ffs/extern.h: revision 1.25 via patch
sbin/fsck_ffs/setup.c: revision 1.88 via patch
sbin/fsck_ffs/wapbl.c: revision 1.4 via patch
sbin/tunefs/tunefs.c: revision 1.41 via patch
sys/ufs/ffs/ffs_vfsops.c: revision 1.252 via patch
sys/ufs/ffs/ffs_wapbl.c: revision 1.13 via patch
Allow tunefs to clear any type of WAPBL log, not only in-filesystem
ones. Discussed in
http://mail-index.netbsd.org/tech-kern/2009/08/17/msg005896.html
and followups.
--
Do some basic checks of the WAPBL journal, to abort the boot before the
kernel refuses to mount a filesystem read-write (booting a system
multiuser with critical filesystems read-only is bad):
Add a check_wapbl() which will check some WAPBL values in the superblock,
and try to read the journal via wapbl_replay_start() if there is one.
pfatal() if one of these fail (abort boot if in preen mode,
as "CONTINUE" otherwise). In non-preen mode the bogus journal will
be cleared.
check_wapbl() is always called if the superblock supports WAPBL.
Even if FS_DOWAPBL is not there, there could be flags asking the
kernel to clear or create a log with bogus values which would cause the
kernel to refuse to mount the filesystem.
Discussed in
http://mail-index.netbsd.org/tech-kern/2009/08/17/msg005896.html
and followups.
--
If the WAPBL journal can't be read (ffs_wapbl_replay_start() fails),
mount the filesystem anyway if MNT_FORCE is present.
This still allows a system with a corrupted WAPBL journal on / to be booted
single-user, giving a chance to run fsck to fix it.
http://mail-index.netbsd.org/tech-kern/2009/08/17/msg005896.html
and followups.
 1.239.2.3 04-Apr-2009  snj branches: 1.239.2.3.4;
Pull up following revision(s) (requested by add in ticket #655):
sys/ufs/ffs/ffs_vfsops.c: revision 1.245 via patch
fsync:
- atime updates were not being synced.
ffs_sync:
- In some cases the sync vnode was acting like now dead /usr/sbin/update.
It was examining vnodes that it should have ignored.
- It would find dirty inodes and try to flush them. Often ffs_fsync()
cheerfully ignored the flush request due to the fsync bug. Such inodes
remained dirty and were repeatedly re-examined by the syncer until
vnode reclaim or system shutdown.
- We were marking our place in the per-mount vnode list even though in
most cases there was no flush to perform. While not a bug, this wasted
CPU cycles because a TAILQ_NEXT would have sufficed.
 1.239.2.2 27-Mar-2009  msaitoh Pull up following revision(s) (requested by ad in ticket #600):
sys/ufs/ffs/ffs_vfsops.c: revision 1.244
ffs_sync: ensure that we *do* flush atime updates periodically.
ffs_update() was eating the flag.
 1.239.2.1 24-Feb-2009  snj Pull up following revision(s) (requested by ad in ticket #490):
sys/kern/vfs_wapbl.c: revision 1.23
sys/miscfs/syncfs/sync_subr.c: revision 1.36
sys/miscfs/syncfs/sync_vnops.c: revision 1.26
sys/ufs/ffs/ffs_alloc.c: revision 1.121
sys/ufs/ffs/ffs_vfsops.c: revision 1.242
sys/ufs/ffs/ffs_vnops.c: revision 1.110
PR kern/39564 wapbl performance issues with disk cache flushing
PR kern/40361 WAPBL locking panic in -current
PR kern/40470 WAPBL corrupts ext2fs
PR kern/40562 busy loop in ffs_sync when unmounting a file system
PR kern/40525 panic: ffs_valloc: dup alloc
- A fix for an issue that can lead to "ffs_valloc: dup" due to dirty cg
buffers being invalidated. Problem discovered and patch by dholland@.
- If the syncer fails to lazily sync a vnode due to lock contention,
retry 1 second later instead of 30 seconds later.
- Flush inode atime updates every ~10 seconds (this makes most sense with
logging). Presently they didn't hit the disk for read-only files or
devices until the file system was unmounted. It would be better to trickle
the updates out but that would require more extensive changes.
- Fix issues with file system corruption, busy looping and other nasty
problems when logging and non-logging file systems are intermixed,
with one being the root file system.
- For logging, do not flush metadata on an inode-at-a-time basis if the sync
has been requested by ioflush. Previously, we could try hundreds of log
sync operations a second due to inode update activity, causing the syncer
to fall behind and metadata updates to be serialized across the entire
file system. Instead, burst out metadata and log flushes at a minimum
interval of every 10 seconds on an active file system (happens more often
if the log becomes full). Note this does not change the operation of
fsync() etc.
- With the flush issue fixed, re-enable concurrent metadata updates in
vfs_wapbl.c.
 1.239.2.4.6.1 28-Apr-2014  sborrill Pull up the following revisions(s) (requested by maxv in ticket #1901):
sys/kern/vfs_syscalls.c: revision 1.478, 1.480 via patch
sys/coda/coda_vfsops.c: revision 1.81
sys/fs/adosfs/advfsops.c: revision 1.70
sys/fs/cd9660/cd9660_vfsops.c: revision 1.84
sys/fs/efs/efs_vfsops.c: revision 1.25
sys/fs/filecorefs/filecore_vfsops.c: revision 1.76
sys/fs/hfs/hfs_vfsops.c: revision 1.31
sys/fs/msdosfs/msdosfs_vfsops.c: revision 1.107
sys/fs/ntfs/ntfs_vfsops.c: revision 1.94
sys/fs/ptyfs/ptyfs_vfsops.c: revision 1.50 via patch
sys/fs/puffs/puffs_vfsops.c: revision 1.110 via patch
sys/fs/smbfs/smbfs_vfsops.c: revision 1.100
sys/fs/sysvbfs/sysvbfs_vfsops.c: revision 1.43
sys/fs/tmpfs/tmpfs_vfsops.c: revision 1.59 via patch
sys/fs/udf/udf_vfsops.c: revision 1.67
sys/fs/union/union_vfsops.c: revision 1.72
sys/fs/unionfs/unionfs_vfsops.c: revision 1.13
sys/kern/vfs_syscalls.c: revision 1.479
sys/miscfs/nullfs/null_vfsops.c: revision 1.88 via patch
sys/miscfs/overlay/overlay_vfsops.c: revision 1.61
sys/miscfs/procfs/procfs_vfsops.c: revision 1.91
sys/miscfs/umapfs/umap_vfsops.c: revision 1.92
sys/nfs/nfs_vfsops.c: revision 1.227
sys/ufs/ext2fs/ext2fs_vfsops.c: revision 1.180
sys/ufs/ffs/ffs_vfsops.c: revision 1.297
sys/ufs/lfs/lfs_vfsops.c: revision 1.321
sys/ufs/mfs/mfs_vfsops.c: revision 1.107

Due to missing checks in the mount syscall, and a wrong assumption on the
file systems side, the kernel could allocate an unbounded or zero-sized
memory buffer, and could dereference a NULL pointer when particular
arguments are given by a user.
 1.239.2.4.2.1 28-Apr-2014  sborrill Pull up the following revisions(s) (requested by maxv in ticket #1901):
sys/kern/vfs_syscalls.c: revision 1.478, 1.480 via patch
sys/coda/coda_vfsops.c: revision 1.81
sys/fs/adosfs/advfsops.c: revision 1.70
sys/fs/cd9660/cd9660_vfsops.c: revision 1.84
sys/fs/efs/efs_vfsops.c: revision 1.25
sys/fs/filecorefs/filecore_vfsops.c: revision 1.76
sys/fs/hfs/hfs_vfsops.c: revision 1.31
sys/fs/msdosfs/msdosfs_vfsops.c: revision 1.107
sys/fs/ntfs/ntfs_vfsops.c: revision 1.94
sys/fs/ptyfs/ptyfs_vfsops.c: revision 1.50 via patch
sys/fs/puffs/puffs_vfsops.c: revision 1.110 via patch
sys/fs/smbfs/smbfs_vfsops.c: revision 1.100
sys/fs/sysvbfs/sysvbfs_vfsops.c: revision 1.43
sys/fs/tmpfs/tmpfs_vfsops.c: revision 1.59 via patch
sys/fs/udf/udf_vfsops.c: revision 1.67
sys/fs/union/union_vfsops.c: revision 1.72
sys/fs/unionfs/unionfs_vfsops.c: revision 1.13
sys/kern/vfs_syscalls.c: revision 1.479
sys/miscfs/nullfs/null_vfsops.c: revision 1.88 via patch
sys/miscfs/overlay/overlay_vfsops.c: revision 1.61
sys/miscfs/procfs/procfs_vfsops.c: revision 1.91
sys/miscfs/umapfs/umap_vfsops.c: revision 1.92
sys/nfs/nfs_vfsops.c: revision 1.227
sys/ufs/ext2fs/ext2fs_vfsops.c: revision 1.180
sys/ufs/ffs/ffs_vfsops.c: revision 1.297
sys/ufs/lfs/lfs_vfsops.c: revision 1.321
sys/ufs/mfs/mfs_vfsops.c: revision 1.107

Due to missing checks in the mount syscall, and a wrong assumption on the
file systems side, the kernel could allocate an unbounded or zero-sized
memory buffer, and could dereference a NULL pointer when particular
arguments are given by a user.
 1.239.2.3.4.1 21-Apr-2010  matt sync to netbsd-5
 1.241.4.2 23-Jul-2009  jym Sync with HEAD.
 1.241.4.1 13-May-2009  jym Sync with HEAD.

Commit is split, to avoid a "too many arguments" protocol error.
 1.257.2.14 19-Nov-2010  uebayasi - Check FFS fragment size to be page-aligned too.
- Hook the new cdev_mmap() method.
 1.257.2.13 25-Oct-2010  uebayasi Fragment size doesn't need to be page-aligned.

Return EINVAL if read-only mount option is not set, other failures
reported as ENXIO.
 1.257.2.12 21-Oct-2010  uebayasi Handle XIP mount error properly.
 1.257.2.11 21-Oct-2010  uebayasi After consideration, put back "xip" mount option.

The internal behavior is totally different between with and without
the option; automatic detection and/or fall-through are not user
friendly. mount(8) returning the "xip" flag is also informative.
 1.257.2.10 07-Oct-2010  uebayasi Check filesystem's bsize/fsize are aligned to PAGE_SIZE, or fail with
ENXIO.
 1.257.2.9 26-Sep-2010  uebayasi ffs_vget: Mark XIP only for VREG vnodes.
 1.257.2.8 17-Aug-2010  uebayasi Sync with HEAD.
 1.257.2.7 27-Jul-2010  uebayasi s/DIOCGPHYSADDR/DIOCGPHYSSEG/ now that it returns struct vm_physseg *,
not paddr_t.
 1.257.2.6 28-May-2010  uebayasi Remove the "xip" option from mount_ffs(8) for simplicity.
 1.257.2.5 30-Apr-2010  uebayasi Sync with HEAD.
 1.257.2.4 28-Apr-2010  uebayasi When mounting a block device as XIP, pass registered struct vm_physseg
* as a cookie from the block device to the caller (== mount code).
struct vm_physseg * will be passed to XIP vnode pager
(genfs_do_getpages_xip()), then converted back to paddr_t.

(My future plan is to pass struct vm_physseg * back to the fault handler,
and to pmap_enter() as is.)
 1.257.2.3 23-Mar-2010  uebayasi Put run-time XIP-specific per-mount data in struct specdev, not struct mount.
 1.257.2.2 23-Feb-2010  uebayasi Check XIP mount condition more nicely.
 1.257.2.1 11-Feb-2010  uebayasi XIP hook for ffs.
 1.258.2.6 31-May-2011  rmind sync with head
 1.258.2.5 19-May-2011  rmind Implement sharing of vnode_t::v_interlock amongst vnodes:
- Lock is shared amongst UVM objects using uvm_obj_setlock() or getnewvnode().
- Adjust vnode cache to handle unsharing, add VI_LOCKSHARE flag for that.
- Use sharing in tmpfs and layerfs for underlying object.
- Simplify locking in ubc_fault().
- Sprinkle some asserts.

Discussed with ad@.
 1.258.2.4 21-Apr-2011  rmind sync with head
 1.258.2.3 05-Mar-2011  rmind sync with head
 1.258.2.2 03-Jul-2010  rmind sync with head
 1.258.2.1 16-Mar-2010  rmind Change struct uvm_object::vmobjlock to be dynamically allocated with
mutex_obj_alloc(). It allows us to share the locks among UVM objects.
 1.263.4.3 09-Feb-2011  bouyer Support MNT_UPDATE for quota2 (especially r/o -> r/w transitions)
 1.263.4.2 08-Feb-2011  bouyer Minimal hacking to make 'options QUOTA' compile again.
 1.263.4.1 20-Jan-2011  bouyer Snapshot of work in progress on a modernised disk quota system:
- new quotactl syscall (versioned for backward compat), which takes
as parameter a path to a mount point, and a prop_dictionary
(in plistref format) describing commands and arguments.
For each command, status and data are returned as a prop_dictionary.
quota commands features will be added to take advantage of this,
exporting quota data or getting quota commands as plists.

- new on-disk format storage (all 64bit wide), integrated into metadata for
ffs (and playing nicely with wapbl).
Quotas are enabled on a ffs filesystem via superblock flags.
tunefs(8) can enable or disable quotas.
On a quota-enabled filesystem, fsck_ffs(8) will track per-uid/gid
block and inode usages, and will check and update quotas in Pass 6.
quota usage and limits are stored in unlinked files (one for users,
one for groups); fsck_ffs(8) will create the files if needed, or
free them if needed. This means that after enabling or disabling
quotas on a filesystem, a fsck_ffs(8) run is required.
quotacheck(8) is not needed any more; after an unclean shutdown,
fsck or journal replay will take care of fixing quotas.
newfs(8) can create a ready-to-mount quota-enabled filesystem
(superblock flags are set and quota inodes are created).
Other new features or semantic changes:
- default quota data, applied to users or groups which don't already
have a quota entry
- per-user/group grace time (instead of a filesystem global one)
- 0 really means "nothing allowed at all", not "no limit".
If you want "no limit", set the limit to UQUAD_MAX (tools will
understand "unlimited" and "-")

A quota file is structured as follows:
it starts with a header, containing a few per-filesystem values,
and the default quota limits.
Quota entries are linked together as a simple list, each entry has a
pointer (as an offset within the file) to the next.
The header has a pointer to a list of free quota entries, and
a hash table of in-use entries. The size of the hash table depends
on the filesystem block size (header+hash table should fit in the
first block). The file is not sparse and is a multiple of
filesystem block size (when the free quota entry list is empty a new
filesystem block is allocated). Quota entries do not cross
filesystem block boundaries.

In memory, the kernel keeps a cache of recently used quota entries
as a reference to the block number, and offset within the block.
The quota entry itself is kept in the buf cache.

fsck_ffs(8), tunefs(8) and newfs(8) support is complete (with
related atf tests :)
The kernel can update disk usage and report it via quotactl(2).

Todo: enforce quota limits (limits are not checked by kernel yet)
update repquota, edquota and rpc.rquotad to the new world
implement compat_50_quotactl ioctl.
update quotactl(2) man page

fsck_ffs required fixes so that allocating new blocks or inodes will
properly update the superblock and cg summaries. This was not an issue up
to now because superblock and cg summaries check happened last, but now
allocations or frees can happen in pass 6.
 1.263.2.1 06-Jun-2011  jruoho Sync with HEAD.
 1.266.2.1 23-Jun-2011  cherry Catchup with rmind-uvmplock merge.
 1.269.2.6 22-May-2014  yamt sync with head.

for a reference, the tree before this commit was tagged
as yamt-pagecache-tag8.

this commit was split into small chunks to avoid
a limitation of cvs. ("Protocol error: too many arguments")
 1.269.2.5 23-Jan-2013  yamt sync with head
 1.269.2.4 16-Jan-2013  yamt sync with (a bit old) head
 1.269.2.3 30-Oct-2012  yamt sync with head
 1.269.2.2 23-May-2012  yamt sync with head.
 1.269.2.1 17-Apr-2012  yamt sync with head
 1.271.4.3 02-Jun-2012  mrg sync to latest -current.
 1.271.4.2 05-Apr-2012  mrg sync to latest -current.
 1.271.4.1 18-Feb-2012  mrg merge to -current.
 1.275.2.5 27-Aug-2016  bouyer Pull up following revision(s) (requested by martin in ticket #1395):
sys/ufs/ffs/ffs_vfsops.c: revision 1.340
usr.sbin/quot/quot.c: revision 1.34
sbin/fsdb/fsdb.c: revision 1.49
From Michael Plass:
The superblock field that distinguishes between 4.2BSD and 4.4BSD
inodes is really only relevant on a UFS1 file system. Make sure that
it is a UFS1 fs before using fs_old_inodefmt.
Note that the NetBSD newfs and mkfs utilities initialize fs_old_inodefmt
even for UFS2, so problems were apparent only on file systems created
by other operating systems, for example, FreeBSD.
 1.275.2.4 04-Dec-2014  snj Pull up following revision(s) (requested by manu in ticket #1196):
sys/kern/vfs_mount.c: revision 1.31
sys/ufs/ffs/ffs_vfsops.c: revision 1.302
sys/ufs/ufs/ufs_extattr.c: revision 1.44
Fix use-after-free on failed unmount with extended attribute enabled
When unmount failed, for instance because the mount is still busy,
UFS1 extended attributes structures were left freed while the kernel
assumes extended attributes were still enabled. This led to using
UFS1 extended attributes structures after free. With LOCKDEBUG, this
quickly triggers a panic.
The problem is fixed by:
1) clear MNT_EXTATTR flag after extended attributes structures are freed
2) attempt to restart extended attributes after failed unmount
3) set MNT_EXTATTR correctly after extended attributes restart
As a side effect, extended attribute structures are now only initialized
when extended attributes are started for the filesystem.
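The three steps above keep the MNT_EXTATTR flag in step with whether the extended attribute structures are actually allocated, even when unmount fails. The Python model below is a hypothetical sketch of that ordering; `Mount`, `extattr_start`, `extattr_stop`, the `busy` field and the flag value are illustrative names, not the kernel's.

```python
from dataclasses import dataclass
from typing import Optional

MNT_EXTATTR = 0x1  # hypothetical bit value for this sketch


class BusyError(Exception):
    """Models unmount failing with EBUSY."""


@dataclass
class Mount:
    flags: int = 0
    extattr_state: Optional[dict] = None  # None means "structures freed"
    busy: bool = False

    def extattr_start(self) -> None:
        # Structures are only allocated when extattrs are started.
        self.extattr_state = {}
        self.flags |= MNT_EXTATTR

    def extattr_stop(self) -> None:
        self.extattr_state = None
        # Step 1: clear the flag once the structures are gone.
        self.flags &= ~MNT_EXTATTR


def unmount(mp: Mount) -> None:
    was_enabled = bool(mp.flags & MNT_EXTATTR)
    if was_enabled:
        mp.extattr_stop()
    if mp.busy:
        if was_enabled:
            # Steps 2 and 3: restart extattrs, which sets MNT_EXTATTR again.
            mp.extattr_start()
        raise BusyError("mount is busy")
    # A real unmount would continue tearing down the mount here.
```

The invariant the model preserves is that MNT_EXTATTR is set exactly when `extattr_state` is not None, so no code path can see the flag while the structures are freed.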
 1.275.2.3 21-Apr-2014  bouyer Pull up following revision(s) (requested by maxv in ticket #1050):
sys/ufs/chfs/chfs_vfsops.c: revision 1.11
sys/fs/unionfs/unionfs_vfsops.c: revision 1.13
sys/fs/nilfs/nilfs_vfsops.c: revision 1.16
sys/ufs/mfs/mfs_vfsops.c: revision 1.107
sys/fs/sysvbfs/sysvbfs_vfsops.c: revision 1.43
sys/ufs/ffs/ffs_vfsops.c: revision 1.297
sys/kern/vfs_syscalls.c: revision 1.478
sys/kern/vfs_syscalls.c: revision 1.479
sys/fs/puffs/puffs_vfsops.c: revision 1.110
sys/fs/cd9660/cd9660_vfsops.c: revision 1.84
sys/nfs/nfs_vfsops.c: revision 1.227
sys/fs/v7fs/v7fs_vfsops.c: revision 1.10
sys/ufs/ext2fs/ext2fs_vfsops.c: revision 1.180
sys/miscfs/umapfs/umap_vfsops.c: revision 1.92
sys/fs/filecorefs/filecore_vfsops.c: revision 1.76
sys/miscfs/nullfs/null_vfsops.c: revision 1.88
sys/fs/ptyfs/ptyfs_vfsops.c: revision 1.50
sys/coda/coda_vfsops.c: revision 1.81
sys/ufs/lfs/lfs_vfsops.c: revision 1.321
sys/fs/tmpfs/tmpfs_vfsops.c: revision 1.59
sys/fs/hfs/hfs_vfsops.c: revision 1.31
sys/miscfs/overlay/overlay_vfsops.c: revision 1.61
sys/fs/union/union_vfsops.c: revision 1.72
sys/fs/ntfs/ntfs_vfsops.c: revision 1.94
sys/kern/vfs_syscalls.c: revision 1.480
sys/fs/efs/efs_vfsops.c: revision 1.25
sys/kern/vfs_syscalls.c: revision 1.482
sys/fs/msdosfs/msdosfs_vfsops.c: revision 1.107
external/cddl/osnet/dist/uts/common/fs/zfs/zfs_vfsops.c: revision 1.12
sys/miscfs/procfs/procfs_vfsops.c: revision 1.91
sys/fs/smbfs/smbfs_vfsops.c: revision 1.100
sys/fs/adosfs/advfsops.c: revision 1.70
sys/fs/udf/udf_vfsops.c: revision 1.67
Limit check for 'data_len'. Otherwise a (un)privileged user can easily
panic the system by passing a huge size.
ok christos@
An (un)privileged user can easily make the kernel dereference a NULL
pointer.
The kernel allows 'data' to be NULL; it's the fs's responsibility to
ensure that it isn't NULL (if the fs actually needs data).
ok christos@
Some fs's - like kernfs - set their vfs_min_mount_data to zero. Add a check
to prevent an (un)privileged user from requesting a zero-sized allocation
(and thus a panic).
This thing is totally buggy: 'data_len' is modified by the fs, so calling
kmem_free with it while its value has changed since the kmem_alloc is far
from being a good idea.
If the kernel figures out that something mismatches, it will panic
(typically with kernfs).
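Taken together, these fixes amount to: bound `data_len` before allocating, never allocate zero bytes when vfs_min_mount_data is 0, let the file system handle a NULL `data`, and free with the size that was actually allocated rather than the `data_len` the file system may have rewritten. The Python model below is a hypothetical sketch of that order of checks; `MAX_MOUNT_DATA`, `Allocator` and `mount_with_data` are illustrative, and the real limits and error codes in vfs_syscalls.c may differ.

```python
import errno

MAX_MOUNT_DATA = 4096  # hypothetical upper bound chosen for this sketch


class Allocator:
    """Toy allocator that, like kmem_free(9), must be given the original size."""

    def __init__(self):
        self.live = {}
        self.next_id = 0

    def alloc(self, size):
        if size <= 0:
            raise ValueError("zero-sized allocation")  # the kernel would panic
        self.next_id += 1
        self.live[self.next_id] = size
        return self.next_id

    def free(self, handle, size):
        if self.live.pop(handle) != size:
            raise RuntimeError("size mismatch on free")  # the kernel would panic


def mount_with_data(kmem, data, data_len, min_mount_data, fs_mount):
    """Model of the generic checks made before calling the fs mount hook."""
    if data_len > MAX_MOUNT_DATA:
        # Refuse huge sizes up front instead of attempting the allocation.
        raise OSError(errno.EINVAL, "mount data too large")
    if data_len < min_mount_data:
        raise OSError(errno.EINVAL, "mount data too small")
    if data_len == 0:
        # vfs_min_mount_data may be 0: skip the buffer rather than alloc(0).
        return fs_mount(None, 0)
    alloc_len = data_len  # remember the size actually allocated
    handle = kmem.alloc(alloc_len)
    try:
        # The fs may change data_len and must itself cope with data being None.
        return fs_mount(data, data_len)
    finally:
        kmem.free(handle, alloc_len)  # never free with the fs-modified length
```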
 1.275.2.2 13-Sep-2012  riz branches: 1.275.2.2.2; 1.275.2.2.4;
Pull up following revision(s) (requested by manu in ticket #553):
sys/ufs/ffs/ffs_vfsops.c: revision 1.278
Stop extended attributes at the appropriate place so that unmount
does not fail with EBUSY on filesystems with extended attributes enabled.
 1.275.2.1 07-May-2012  riz branches: 1.275.2.1.2;
Pull up following revision(s) (requested by chs in ticket #204):
sys/fs/sysvbfs/sysvbfs_vnops.c: revision 1.44
sys/ufs/ffs/ffs_vfsops.c: revision 1.277
sys/fs/v7fs/v7fs_vnops.c: revision 1.11
sys/ufs/chfs/chfs_vnops.c: revision 1.7
sys/ufs/ext2fs/ext2fs_readwrite.c: revision 1.61
sys/miscfs/genfs/genfs_io.c: revision 1.54
sys/kern/vfs_wapbl.c: revision 1.52
sys/uvm/uvm_pager.h: revision 1.43
sys/ufs/ffs/ffs_vnops.c: revision 1.121
sys/kern/vfs_subr.c: revision 1.434
sys/fs/msdosfs/msdosfs_vnops.c: revision 1.83
sys/fs/ntfs/ntfs_vnops.c: revision 1.51
sys/fs/udf/udf_subr.c: revision 1.119
sys/miscfs/specfs/spec_vnops.c: revision 1.135
sys/ufs/ext2fs/ext2fs_vnops.c: revision 1.103
sys/fs/udf/udf_vnops.c: revision 1.71
sys/ufs/ufs/ufs_readwrite.c: revision 1.104
change vflushbuf() to take the full FSYNC_* flags.
translate FSYNC_LAZY into PGO_LAZY for VOP_PUTPAGES() so that
genfs_do_io() can set the appropriate io priority for the I/O.
this is the first part of addressing PR 46325.
mark all wapbl I/O as BPRIO_TIMECRITICAL.
this is the second part of addressing PR 46325.
 1.275.2.2.4.2 27-Aug-2016  bouyer Pull up following revision(s) (requested by martin in ticket #1395):
sys/ufs/ffs/ffs_vfsops.c: revision 1.340
usr.sbin/quot/quot.c: revision 1.34
sbin/fsdb/fsdb.c: revision 1.49
From Michael Plass:
The superblock field that distinguishes between 4.2BSD and 4.4BSD
inodes is really only relevant on a UFS1 file system. Make sure that
it is a UFS1 fs before using fs_old_inodefmt.
Note that the NetBSD newfs and mkfs utilities initialize fs_old_inodefmt
even for UFS2, so problems were apparent only on file systems created
by other operating systems, for example, FreeBSD.
 1.275.2.2.4.1 21-Apr-2014  bouyer Pull up following revision(s) (requested by maxv in ticket #1050):
sys/ufs/chfs/chfs_vfsops.c: revision 1.11
sys/fs/unionfs/unionfs_vfsops.c: revision 1.13
sys/fs/nilfs/nilfs_vfsops.c: revision 1.16
sys/ufs/mfs/mfs_vfsops.c: revision 1.107
sys/fs/sysvbfs/sysvbfs_vfsops.c: revision 1.43
sys/ufs/ffs/ffs_vfsops.c: revision 1.297
sys/kern/vfs_syscalls.c: revision 1.478
sys/kern/vfs_syscalls.c: revision 1.479
sys/fs/puffs/puffs_vfsops.c: revision 1.110
sys/fs/cd9660/cd9660_vfsops.c: revision 1.84
sys/nfs/nfs_vfsops.c: revision 1.227
sys/fs/v7fs/v7fs_vfsops.c: revision 1.10
sys/ufs/ext2fs/ext2fs_vfsops.c: revision 1.180
sys/miscfs/umapfs/umap_vfsops.c: revision 1.92
sys/fs/filecorefs/filecore_vfsops.c: revision 1.76
sys/miscfs/nullfs/null_vfsops.c: revision 1.88
sys/fs/ptyfs/ptyfs_vfsops.c: revision 1.50
sys/coda/coda_vfsops.c: revision 1.81
sys/ufs/lfs/lfs_vfsops.c: revision 1.321
sys/fs/tmpfs/tmpfs_vfsops.c: revision 1.59
sys/fs/hfs/hfs_vfsops.c: revision 1.31
sys/miscfs/overlay/overlay_vfsops.c: revision 1.61
sys/fs/union/union_vfsops.c: revision 1.72
sys/fs/ntfs/ntfs_vfsops.c: revision 1.94
sys/kern/vfs_syscalls.c: revision 1.480
sys/fs/efs/efs_vfsops.c: revision 1.25
sys/kern/vfs_syscalls.c: revision 1.482
sys/fs/msdosfs/msdosfs_vfsops.c: revision 1.107
external/cddl/osnet/dist/uts/common/fs/zfs/zfs_vfsops.c: revision 1.12
sys/miscfs/procfs/procfs_vfsops.c: revision 1.91
sys/fs/smbfs/smbfs_vfsops.c: revision 1.100
sys/fs/adosfs/advfsops.c: revision 1.70
sys/fs/udf/udf_vfsops.c: revision 1.67
Limit check for 'data_len'. Otherwise a (un)privileged user can easily
panic the system by passing a huge size.
ok christos@
An (un)privileged user can easily make the kernel dereference a NULL
pointer.
The kernel allows 'data' to be NULL; it's the fs's responsibility to
ensure that it isn't NULL (if the fs actually needs data).
ok christos@
Some fs's - like kernfs - set their vfs_min_mount_data to zero. Add a check
to prevent an (un)privileged user from requesting a zero-sized allocation
(and thus a panic).
This thing is totally buggy: 'data_len' is modified by the fs, so calling
kmem_free with it while its value has changed since the kmem_alloc is far
from being a good idea.
If the kernel figures out that something mismatches, it will panic
(typically with kernfs).
 1.275.2.2.2.2 27-Aug-2016  bouyer Pull up following revision(s) (requested by martin in ticket #1395):
sys/ufs/ffs/ffs_vfsops.c: revision 1.340
usr.sbin/quot/quot.c: revision 1.34
sbin/fsdb/fsdb.c: revision 1.49
From Michael Plass:
The superblock field that distinguishes between 4.2BSD and 4.4BSD
inodes is really only relevant on a UFS1 file system. Make sure that
it is a UFS1 fs before using fs_old_inodefmt.
Note that the NetBSD newfs and mkfs utilities initialize fs_old_inodefmt
even for UFS2, so problems were apparent only on file systems created
by other operating systems, for example, FreeBSD.
 1.275.2.2.2.1 21-Apr-2014  bouyer Pull up following revision(s) (requested by maxv in ticket #1050):
sys/ufs/chfs/chfs_vfsops.c: revision 1.11
sys/fs/unionfs/unionfs_vfsops.c: revision 1.13
sys/fs/nilfs/nilfs_vfsops.c: revision 1.16
sys/ufs/mfs/mfs_vfsops.c: revision 1.107
sys/fs/sysvbfs/sysvbfs_vfsops.c: revision 1.43
sys/ufs/ffs/ffs_vfsops.c: revision 1.297
sys/kern/vfs_syscalls.c: revision 1.478
sys/kern/vfs_syscalls.c: revision 1.479
sys/fs/puffs/puffs_vfsops.c: revision 1.110
sys/fs/cd9660/cd9660_vfsops.c: revision 1.84
sys/nfs/nfs_vfsops.c: revision 1.227
sys/fs/v7fs/v7fs_vfsops.c: revision 1.10
sys/ufs/ext2fs/ext2fs_vfsops.c: revision 1.180
sys/miscfs/umapfs/umap_vfsops.c: revision 1.92
sys/fs/filecorefs/filecore_vfsops.c: revision 1.76
sys/miscfs/nullfs/null_vfsops.c: revision 1.88
sys/fs/ptyfs/ptyfs_vfsops.c: revision 1.50
sys/coda/coda_vfsops.c: revision 1.81
sys/ufs/lfs/lfs_vfsops.c: revision 1.321
sys/fs/tmpfs/tmpfs_vfsops.c: revision 1.59
sys/fs/hfs/hfs_vfsops.c: revision 1.31
sys/miscfs/overlay/overlay_vfsops.c: revision 1.61
sys/fs/union/union_vfsops.c: revision 1.72
sys/fs/ntfs/ntfs_vfsops.c: revision 1.94
sys/kern/vfs_syscalls.c: revision 1.480
sys/fs/efs/efs_vfsops.c: revision 1.25
sys/kern/vfs_syscalls.c: revision 1.482
sys/fs/msdosfs/msdosfs_vfsops.c: revision 1.107
external/cddl/osnet/dist/uts/common/fs/zfs/zfs_vfsops.c: revision 1.12
sys/miscfs/procfs/procfs_vfsops.c: revision 1.91
sys/fs/smbfs/smbfs_vfsops.c: revision 1.100
sys/fs/adosfs/advfsops.c: revision 1.70
sys/fs/udf/udf_vfsops.c: revision 1.67
Limit check for 'data_len'. Otherwise a (un)privileged user can easily
panic the system by passing a huge size.
ok christos@
An (un)privileged user can easily make the kernel dereference a NULL
pointer.
The kernel allows 'data' to be NULL; it's the fs's responsibility to
ensure that it isn't NULL (if the fs actually needs data).
ok christos@
Some fs's - like kernfs - set their vfs_min_mount_data to zero. Add a check
to prevent an (un)privileged user from requesting a zero-sized allocation
(and thus a panic).
This thing is totally buggy: 'data_len' is modified by the fs, so calling
kmem_free with it while its value has changed since the kmem_alloc is far
from being a good idea.
If the kernel figures out that something mismatches, it will panic
(typically with kernfs).
 1.275.2.1.2.1 01-Nov-2012  matt sync with netbsd-6-0-RELEASE.
 1.278.2.7 03-Dec-2017  jdolecek update from HEAD
 1.278.2.6 20-Aug-2014  tls Rebase to HEAD as of a few days ago.
 1.278.2.5 23-Jun-2013  tls resync from head
 1.278.2.4 25-Feb-2013  tls resync with head
 1.278.2.3 10-Feb-2013  tls Add an accessor -- ufs_maxphys() -- to check the maximum transfer size
for a given UFS mountpoint, and move the code from mount that finds
the underlying disk and resets the mountpoint max transfer size into a
utility function, ufs_update_maxphys().

Add a global serial number that counts disk property changes to which
filesystems are meant to accommodate themselves. Make ufs_maxphys()
check it. This is a sort of flag-polling interface that avoids callbacks
into the filesystem code, but will require freezing filesystems and
draining in-flight transactions before a decrease in size that is
mandatory (like attaching a disk with a smaller maximum transfer size
as a spare in a RAIDframe set), rather than "advisory", like finding
out set geometry from a RAID controller long after boot and deciding
a smaller transfer size would be optimal, can be signalled. Still, the
"advisory" case is the common one so this is progress.
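The flag-polling scheme can be read as a global generation counter that drivers bump and that each mount compares against a cached copy, re-reading the transfer limit only when the counter has moved. This is a Python sketch under that reading; `disk_props_serial`, `bump_disk_serial`, `UfsMount` and the `query_maxphys` callback are hypothetical names, not the actual ufs_maxphys()/ufs_update_maxphys() code.

```python
import threading

_serial_lock = threading.Lock()
disk_props_serial = 0  # global count of disk property changes


def bump_disk_serial() -> None:
    """Called by a driver (e.g. RAIDframe adding a disk) after a change."""
    global disk_props_serial
    with _serial_lock:
        disk_props_serial += 1


class UfsMount:
    def __init__(self, query_maxphys):
        self._query = query_maxphys  # asks the underlying disk for its limit
        self._seen = -1              # forces a query on first use
        self._maxphys = 0

    def maxphys(self) -> int:
        # Read the serial before querying: a bump that races with the
        # query leaves _seen stale, so the next call re-queries.
        current = disk_props_serial
        if current != self._seen:
            self._maxphys = self._query()
            self._seen = current
        return self._maxphys
```

No callback into the filesystem is needed; the cost is that a change is only noticed on the next call, which is why the mandatory-decrease case still needs the freeze-and-drain step described above.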

Make a bit of an example of RAIDframe by making it bump this new
serial number when disks are added to the subsystem. I will attack
one of the hardware RAID drivers (probably arcmsr) next.
 1.278.2.2 20-Nov-2012  tls Resync to 2012-11-19 00:00:00 UTC
 1.278.2.1 12-Sep-2012  tls Initial snapshot of work to eliminate 64K MAXPHYS. Basically works for
physio (I/O to raw devices); needs more doing to get it going with the
filesystems, but it shouldn't damage data.

All work's been done on amd64 so far. Not hard to add support to other
ports. If others want to pitch in, one very helpful thing would be to
sort out when and how IDE disks can do 128K or larger transfers, and
adjust the various PCI IDE (or at least ahcisata) drivers and wd.c
accordingly -- it would make testing much easier. Another very helpful
thing would be to implement a smart minphys() for RAIDframe along the
lines detailed in the MAXPHYS-NOTES file.
 1.286.2.2 18-May-2014  rmind sync with head
 1.286.2.1 28-Aug-2013  rmind sync with head
 1.296.2.1 10-Aug-2014  tls Rebase.
 1.299.2.4 27-Aug-2016  bouyer Pull up following revision(s) (requested by martin in ticket #1210):
sys/ufs/ffs/ffs_vfsops.c: revision 1.340
usr.sbin/quot/quot.c: revision 1.34
sbin/fsdb/fsdb.c: revision 1.49
From Michael Plass:
The superblock field that distinguishes between 4.2BSD and 4.4BSD
inodes is really only relevant on a UFS1 file system. Make sure that
it is a UFS1 fs before using fs_old_inodefmt.
Note that the NetBSD newfs and mkfs utilities initialize fs_old_inodefmt
even for UFS2, so problems were apparent only on file systems created
by other operating systems, for example, FreeBSD.
 1.299.2.3 28-Jan-2015  martin branches: 1.299.2.3.2;
Pull up following revision(s) (requested by christos in ticket #425):
sys/ufs/ufs/ufs_inode.c: revision 1.91-1.92
sys/ufs/ufs/ufs_vnops.c: revision 1.223-1.224
sys/ufs/ufs/ufs_extern.h: revision 1.76-1.77
sys/ufs/ffs/ffs_vfsops.c: revision 1.303-1.305
Add debugging for mount...
Merge some error returns
Check more errors
Restore apple ufs error handling.
Move and unify indirect block truncate algorithm into a separate function.
PR/39371: Tobias Nygren: Don't fail mounting root if WAPBL log is corrupt.
Patch from Sergio L. Pascual.
 1.299.2.2 29-Dec-2014  martin Pull up following revision(s) (requested by maxv in ticket #352):
sys/ufs/ffs/ffs_vfsops.c: revision 1.301
Limit the superblock size to SBLOCKSIZE, not MAXBSIZE. Otherwise memcpy
will read beyond the allocated buffer.
Discussed a bit on tech-kern@.
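The bug is a bounds check against the wrong constant: the superblock buffer holds SBLOCKSIZE bytes, so accepting any fs_sbsize up to MAXBSIZE lets the copy run past the end of it. A minimal Python sketch of the corrected check follows; the values 8192 and 65536 are assumptions about the usual SBLOCKSIZE and MAXBSIZE, and `validate_sbsize` is a hypothetical helper.

```python
import errno

SBLOCKSIZE = 8192   # assumed size of the superblock buffer
MAXBSIZE = 65536    # assumed maximum file system block size


def validate_sbsize(fs_sbsize: int) -> int:
    """Reject superblock sizes that would overrun the SBLOCKSIZE buffer."""
    # The old bound was MAXBSIZE, which let sizes in (SBLOCKSIZE, MAXBSIZE]
    # through and made the following memcpy read out of bounds.
    if fs_sbsize > SBLOCKSIZE:
        raise OSError(errno.EINVAL, "bad superblock size")
    return fs_sbsize
```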
 1.299.2.1 18-Nov-2014  snj Pull up following revision(s) (requested by manu in ticket #246):
sys/kern/vfs_mount.c: revision 1.31
sys/ufs/ffs/ffs_vfsops.c: revision 1.302
sys/ufs/ufs/ufs_extattr.c: revision 1.44
Fix use-after-free on failed unmount with extended attribute enabled
When unmount failed, for instance because the mount is still busy,
UFS1 extended attributes structures were left freed while the kernel
assumes extended attributes were still enabled. This led to using
UFS1 extended attributes structures after free. With LOCKDEBUG, this
quickly triggers a panic.
The problem is fixed by:
1) clear MNT_EXTATTR flag after extended attributes structures are freed
2) attempt to restart extended attributes after failed unmount
3) set MNT_EXTATTR correctly after extended attributes restart
As a side effect, extended attribute structures are now only initialized
when extended attributes are started for the filesystem.
 1.299.2.3.2.1 27-Aug-2016  bouyer Pull up following revision(s) (requested by martin in ticket #1210):
sys/ufs/ffs/ffs_vfsops.c: revision 1.340
usr.sbin/quot/quot.c: revision 1.34
sbin/fsdb/fsdb.c: revision 1.49
From Michael Plass:
The superblock field that distinguishes between 4.2BSD and 4.4BSD
inodes is really only relevant on a UFS1 file system. Make sure that
it is a UFS1 fs before using fs_old_inodefmt.
Note that the NetBSD newfs and mkfs utilities initialize fs_old_inodefmt
even for UFS2, so problems were apparent only on file systems created
by other operating systems, for example, FreeBSD.
 1.302.2.9 28-Aug-2017  skrll Sync with HEAD
 1.302.2.8 05-Feb-2017  skrll Sync with HEAD
 1.302.2.7 05-Dec-2016  skrll Sync with HEAD
 1.302.2.6 05-Oct-2016  skrll Sync with HEAD
 1.302.2.5 09-Jul-2016  skrll Sync with HEAD
 1.302.2.4 27-Dec-2015  skrll Sync with HEAD (as of 26th Dec)
 1.302.2.3 22-Sep-2015  skrll Sync with HEAD
 1.302.2.2 06-Jun-2015  skrll Sync with HEAD
 1.302.2.1 06-Apr-2015  skrll Sync with HEAD
 1.339.2.7 26-Apr-2017  pgoyette Sync with HEAD
 1.339.2.6 20-Mar-2017  pgoyette Sync with HEAD
 1.339.2.5 07-Jan-2017  pgoyette Sync with HEAD. (Note that most of these changes are simply $NetBSD$
tag issues.)
 1.339.2.4 04-Nov-2016  pgoyette Sync with HEAD
 1.339.2.3 06-Aug-2016  pgoyette Sync with HEAD
 1.339.2.2 21-Jul-2016  pgoyette Actually save the bdev value when it is retrieved, so we can use it
later in a call to bdevsw_release().
 1.339.2.1 20-Jul-2016  pgoyette Adapt machine-independant code to the new {b,c}devsw reference-counting
(using localcount(9)). All callers of {b,c}devsw_lookup() now call
{b,c}devsw_lookup_acquire() which retains a reference on the 'struct
{b,c}devsw'. This reference must be released by the caller once it is
finished with the structure's content (or other data that would disappear
if the 'struct {b,c}devsw' were to disappear).
 1.342.2.1 21-Apr-2017  bouyer Sync with HEAD
 1.353.4.3 28-Nov-2023  martin Pull up following revision(s) (requested by riastradh in ticket #1921):

sys/ufs/ffs/ffs_vfsops.c: revision 1.382

ffs_sync: Avoid unlocked access to v_numoutput/v_dirtyblkhd.

Found by lockdoc.

PR kern/57606
 1.353.4.2 11-Apr-2018  martin Pull up following revision(s) (requested by christos in ticket #738):

sys/ufs/ffs/ffs_vfsops.c: revision 1.355

PR/52728: Izumi Tsutsui: "mount -u /dev/ /" triggers kernel panic

Simplify the control flow of the mount code and make sure that the
mountfrom argument can be converted to a block device in the update
case.
 1.353.4.1 04-Feb-2018  martin Pull up following revision(s) (requested by christos in ticket #523):
sys/ufs/ffs/ffs_vfsops.c: revision 1.356
sys/ufs/ufs/ufs_inode.c: revision 1.103
Make sure inode blocks and size are zero when VOP_INACTIVE()
finalises a now unlinked inode.
Counterpart of the check in ffs_newvnode().
Prevent use-after-free where genfs_node_destroy() would destroy
a lock residing in the just freed inode data.
 1.353.2.1 27-Apr-2017  pgoyette Restore all work from the former pgoyette-localcount branch (which is
now abandoned due to cvs merge botch).

The branch now builds, and installs via anita. There are still some
problems (cgd is non-functional and all atf tests time-out) but they
will get resolved soon.
 1.356.2.4 18-Jan-2019  pgoyette Synch with HEAD
 1.356.2.3 26-Dec-2018  pgoyette Sync with HEAD, resolve a few conflicts
 1.356.2.2 28-Jul-2018  pgoyette Sync with HEAD
 1.356.2.1 25-Jun-2018  pgoyette Sync with HEAD
 1.357.2.3 13-Apr-2020  martin Mostly merge changes from HEAD upto 20200411
 1.357.2.2 08-Apr-2020  martin Merge changes from current as of 20200406
 1.357.2.1 10-Jun-2019  christos Sync with HEAD
 1.362.4.5 29-Feb-2020  ad Sync with head.
 1.362.4.4 24-Jan-2020  ad - Put all the namecache stuff back into vnode_impl_t.
- Tidy vfs_cache.c up, finish the comments.
- Finalise how ID information is entered to the cache.
- Handle very small/old systems.
 1.362.4.3 19-Jan-2020  ad Set IMNT_SHRLOOKUP and use it for the in-cache case. Need to check what
more can be done with tmpfs though, it can probably do the whole lookup.
 1.362.4.2 17-Jan-2020  ad vfs_lookup:

- Do the easy component name lookups directly in the namecache without
taking vnode locks or vnode references (between the start and the leaf /
parent), which seems to largely solve the lock contention problem with
namei(). It needs support from the file system, which has to tell the
name cache about directory permissions (only ffs and tmpfs tried so far),
and I'm not sure how or if it can work with layered file systems yet.
Work in progress.

vfs_cache:

- Make the rbtree operations more efficient: inline the lookup, and key on a
64-bit hash value (32 bits plus 16 bits length) rather than names.

- Take namecache stuff out of vnode_impl, and take the rwlocks, and put them
all together in an nchnode struct which is mapped 1:1 with vnodes. Saves
memory and gives a nicer cache profile.

- Add a routine to help vfs_lookup do its easy component name lookups.

- Report some more stats.

- Tidy up the file a bit.
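The new rbtree key packs a 32-bit hash of the name together with the name length into one integer, so most tree comparisons become a single integer compare and the full string only needs comparing when keys match. A Python sketch of that packing, assuming the 16-bit length field the message describes; FNV-1a is used purely as a stand-in hash, and `cache_key` is a hypothetical name rather than necessarily what vfs_cache.c uses.

```python
FNV32_OFFSET = 0x811C9DC5
FNV32_PRIME = 0x01000193
NLEN_BITS = 16  # bits reserved for the name length, per the commit message


def fnv1a_32(data: bytes) -> int:
    """32-bit FNV-1a, standing in for whatever hash the kernel uses."""
    h = FNV32_OFFSET
    for b in data:
        h ^= b
        h = (h * FNV32_PRIME) & 0xFFFFFFFF
    return h


def cache_key(name: bytes) -> int:
    """Pack a 32-bit name hash and the name length into one 64-bit key."""
    nlen = len(name)
    if nlen >= 1 << NLEN_BITS:
        raise ValueError("name too long for the length field")
    return (fnv1a_32(name) << NLEN_BITS) | nlen
```

Including the length in the key means two names that collide in the hash but differ in length still sort apart without touching the strings.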
 1.362.4.1 17-Jan-2020  ad Sync with head.
 1.362.2.2 07-Jan-2025  martin Pull up following revision(s) (requested by hannken in ticket #1934):

sys/ufs/ext2fs/ext2fs_vfsops.c: revision 1.228
sys/ufs/lfs/lfs_vfsops.c: revision 1.383
sys/ufs/ffs/ffs_wapbl.c: revision 1.50
sys/ufs/ffs/ffs_vfsops.c: revision 1.383 (patch)
sys/ufs/ffs/ffs_vfsops.c: revision 1.384 (patch)

Remove comment "we are always called with the filesystem marked `MPBUSY'."
above some xxx_sync() operations. These operations get called without
any exclusive lock.

This comment appeared with "add quota support" on 1990-05-02.
On 1998/02/18 MNT_MPBUSY disappeared when vfs_busy() was changed from
an exclusive lock to a shared lock.

PR kern/58837 "ffs: Missing locking around fs_fmod/time"

Protect test/clear fs->fs_fmod with um_lock like it is already
protected in ffs_alloc.c.

When writing to disk protect moving superblock to buffer with um_lock.

Set/clear fs->fs_fmod while mounting, updating a mount or unmounting
is safe as these operations run exclusively, either mounting creates
a new file system or the file system is suspended. Assert suspension
for update and unmount.

PR kern/58837 "ffs: Missing locking around fs_fmod/time"
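The race fixed here is an unlocked test-and-clear: an allocator can set fs_fmod between the sync path's test and its clear, and that update is lost. The Python sketch below shows the locked version, including taking the lock while copying the superblock into the write buffer; `FsState`, `alloc_update`, `sync`, `nbfree` and `disk_images` are hypothetical, with a threading.Lock standing in for um_lock.

```python
import threading


class FsState:
    def __init__(self):
        self.um_lock = threading.Lock()  # stands in for the ufsmount um_lock
        self.fs_fmod = False
        self.nbfree = 0                  # example superblock summary field
        self.disk_images = []            # superblock copies "written" to disk

    def alloc_update(self, delta: int) -> None:
        # Allocation paths update summaries and set fs_fmod under um_lock.
        with self.um_lock:
            self.nbfree += delta
            self.fs_fmod = True

    def sync(self) -> bool:
        with self.um_lock:
            # Test and clear atomically with respect to alloc_update().
            if not self.fs_fmod:
                return False
            self.fs_fmod = False
            buf = {"nbfree": self.nbfree}  # copy superblock to buffer under lock
        self.disk_images.append(buf)       # the I/O itself runs unlocked
        return True
```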
 1.362.2.1 28-Nov-2023  martin Pull up following revision(s) (requested by riastradh in ticket #1770):

sys/ufs/ffs/ffs_vfsops.c: revision 1.382

ffs_sync: Avoid unlocked access to v_numoutput/v_dirtyblkhd.

Found by lockdoc.

PR kern/57606
 1.378.2.4 07-Jan-2025  martin Pull up following revision(s) (requested by hannken in ticket #1037):

sys/ufs/ext2fs/ext2fs_vfsops.c: revision 1.228
sys/ufs/lfs/lfs_vfsops.c: revision 1.383
sys/ufs/ffs/ffs_wapbl.c: revision 1.50
sys/ufs/ffs/ffs_vfsops.c: revision 1.383
sys/ufs/ffs/ffs_vfsops.c: revision 1.384

Remove comment "we are always called with the filesystem marked `MPBUSY'."
above some xxx_sync() operations. These operations get called without
any exclusive lock.

This comment appeared with "add quota support" on 1990-05-02.
On 1998/02/18 MNT_MPBUSY disappeared when vfs_busy() was changed from
an exclusive lock to a shared lock.

PR kern/58837 "ffs: Missing locking around fs_fmod/time"

Protect test/clear fs->fs_fmod with um_lock like it is already
protected in ffs_alloc.c.

When writing to disk protect moving superblock to buffer with um_lock.

Set/clear fs->fs_fmod while mounting, updating a mount or unmounting
is safe as these operations run exclusively, either mounting creates
a new file system or the file system is suspended. Assert suspension
for update and unmount.

PR kern/58837 "ffs: Missing locking around fs_fmod/time"
 1.378.2.3 18-Oct-2023  martin Pull up following revision(s) (requested by riastradh in ticket #424):

sys/ufs/ffs/ffs_vfsops.c: revision 1.382

ffs_sync: Avoid unlocked access to v_numoutput/v_dirtyblkhd.

Found by lockdoc.
PR kern/57606
 1.378.2.2 21-Jun-2023  martin Pull up following revision(s) (requested by hannken in ticket #197):

sys/ufs/ffs/ffs_vfsops.c: revision 1.381
sys/dev/raidframe/rf_netbsdkintf.c: revision 1.412

Undo unlock/relock for VOP_IOCTL().
PR kern/57450 (unplugging hung USB disk triggers panic via _vstate_assert)
 1.378.2.1 21-Dec-2022  martin Pull up following revision(s) (requested by chs in ticket #17):

sys/ufs/ffs/ffs_vfsops.c: revision 1.379

ffs: fail mounts requesting ACLs for non-ea UFS2 file systems

For non-ea UFS2 file system, fail mounts that request ACLs rather than
letting the mount succeed only to reject all ACL operations later.

Also fix the messages about the on-disk fs flags conflicting with
the mount options for which type of ACLs to use, and about requesting
both types of ACLs.
 1.382.6.1 02-Aug-2025  perseant Sync with HEAD
 1.138 14-Dec-2021  chs ffs: support extattrs (and thus ACLs) on fifos.
 1.137 18-Jul-2021  dholland Abolish all the silly indirection macros for initializing vnode ops tables.

These are things of the form #define foofs_op genfs_op, or #define
foofs_op genfs_eopnotsupp, or similar. They serve no purpose besides
obfuscation, and have gotten cutpasted all over everywhere.
 1.136 18-Jul-2021  dholland Use macros for the canned parts of device and fifo vnode op tables.

Add GENFS_SPECOP_ENTRIES and GENFS_FIFOOP_ENTRIES macros that contain
the portion of the vnode ops table declaration that is
(conservatively) the same in every fs. Use these in every fs that
supports devices and/or fifos with separate ops tables.

Note that ptyfs works differently (it has one type of vnode with
open-coded dispatch to the specfs code, which I haven't changed in
this commit) and rump/librump/rumpvfs/rumpfs.c has an indirect dynamic
dispatch that already does more or less the same thing, which I also
haven't changed.

Also note that this anticipates a few bits in the next changeset here
and there, and adds missing but unreachable calls in some cases (e.g.
most fses weren't defining whiteout on devices and fifos, but it isn't
reachable there), and it changes parsepath on devices and fifos to
genfs_badop from genfs_parsepath (but it's not reachable there
either).

It appears that devices in kernfs were missing kqfilter, so it's
possible that if you try to use kqueue on /kern/rootdev that it'll
explode.

And finally note that the ops declaration tables aren't
order-dependent. (Other than vop_default_desc has to come first.)
Otherwise this wouldn't work.
 1.135 14-Jul-2021  christos Hook up ffsext_strategy to fifos. Pointed out by dholland@
 1.134 29-Jun-2021  dholland - Add a new vnode op: VOP_PARSEPATH.
- Move namei_getcomponent to genfs_vnops.c and call it genfs_parsepath.
- Add a parsepath entry to every vnode ops table.

VOP_PARSEPATH takes a directory vnode to be searched and a complete
following path and chooses how much of that path to consume. To begin
with, all parsepath calls are genfs_parsepath, which locates the first
'/' as always.

Note that the call doesn't take the whole struct componentname, only
the string. The other bits of struct componentname should not be
needed and there's no reason to cause potential complications by
exposing them.
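For the initial behaviour described here, the parsepath step only reports how many leading characters of the remaining path form the next component, that is, the offset of the first '/'. A Python sketch of that contract; `parsepath_default` is a hypothetical name, and the real genfs_parsepath() is invoked through the vnode-op machinery rather than called directly like this.

```python
def parsepath_default(dir_vnode, path: str) -> int:
    """Return how many characters of `path` make up the next component."""
    # The default ignores dir_vnode; a file system overriding the op could
    # use it to decide to consume a different amount of the path.
    slash = path.find("/")
    return len(path) if slash == -1 else slash
```

A caller such as namei would take `path[:n]` as the component and then skip the separator before asking again for the next one.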
 1.133 05-Sep-2020  riastradh branches: 1.133.6;
Round of uvm.h cleanup.

The poorly named uvm.h is generally supposed to be for uvm-internal
users only.

- Narrow it to files that actually need it -- mostly files that need
to query whether curlwp is the pagedaemon, which should maybe be
exposed by an external header.

- Use uvm_extern.h where feasible and uvm_*.h for things not exposed
by it. We should split up uvm_extern.h but this will serve for now
to reduce the uvm.h dependencies.

- Use uvm_stat.h and #ifdef UVMHIST uvm.h for files that use
UVMHIST(ubchist), since ubchist is declared in uvm.h but the
reference evaporates if UVMHIST is not defined, so we reduce header
file dependencies.

- Make uvm_device.h and uvm_swap.h independently includable while
here.

ok chs@
 1.132 16-May-2020  christos Add ACL support for FFS. From FreeBSD.
 1.131 18-Apr-2020  christos Extended attribute support for ffsv2, from FreeBSD.
 1.130 23-Feb-2020  ad branches: 1.130.4;
UVM locking changes, proposed on tech-kern:

- Change the lock on uvm_object, vm_amap and vm_anon to be a RW lock.
- Break v_interlock and vmobjlock apart. v_interlock remains a mutex.
- Do partial PV list locking in the x86 pmap. Others to follow later.
 1.129 26-May-2017  riastradh branches: 1.129.10; 1.129.16;
Make VOP_RECLAIM do the last unlock of the vnode.

VOP_RECLAIM naturally has exclusive access to the vnode, so having it
locked on entry is not strictly necessary -- but it means if there
are any final operations that must be done on the vnode, such as
ffs_update, requiring exclusive access to it, we can now kassert that
the vnode is locked in those operations.

We can't just have the caller release the last lock because some file
systems don't use genfs_lock, and require the vnode to remain valid
for VOP_UNLOCK to work, notably unionfs.
 1.128 02-Mar-2017  christos ifdef reduction
 1.127 01-Mar-2017  hannken Make compile again without "options WAPBL".

From John D. Baker via current-users@, slightly modified by me.
 1.126 01-Mar-2017  hannken Remove now redundant calls to fstrans_start()/fstrans_done().
 1.125 25-Jul-2014  dholland branches: 1.125.4; 1.125.8; 1.125.12;
Add VOP_FALLOCATE and VOP_FDISCARD to every vnode ops table I can
find.

The filesystem ones all call genfs_eopnotsupp - right now I am only
implementing the plumbing and we can implement fallocate and/or
fdiscard for files later.

The device ones call spec_fallocate (which is also genfs_eopnotsupp)
and spec_fdiscard, which dispatches to the device-level op.

The fifo ones all call vn_fifo_bypass, which also ends up being
EOPNOTSUPP.
 1.124 24-Mar-2014  hannken branches: 1.124.2;
- Make VI_XLOCK, VI_CLEAN and VI_LOCKSHARE private to kern/vfs_*.c.
- Make vwait() static.
- Add vdead_check() to check a vnode for being or becoming dead.

Discussed on tech-kern.

Welcome to 6.99.38
 1.123 23-Jun-2013  dholland branches: 1.123.2;
Stick ffs_, ext2_, chfs_, filecore_, cd9660_, or mfs_ in front of
the following symbols so as to disambiguate fully. (Christos already
did the lfs ones.)

lblkno
lblktosize
lfragtosize
numfrags
blkroundup
fragroundup
 1.122 22-Jan-2013  dholland Stuff UFS_ in front of a few of ufs's symbols to reduce namespace
pollution. Specifically:
ROOTINO -> UFS_ROOTINO
WINO -> UFS_WINO
NXADDR -> UFS_NXADDR
NDADDR -> UFS_NDADDR
NIADDR -> UFS_NIADDR
MAXSYMLINKLEN -> UFS_MAXSYMLINKLEN
MAXSYMLINKLEN_UFS[12] -> UFS[12]_MAXSYMLINKLEN (for consistency)

Sort out ext2fs's misuse of NDADDR and NIADDR; fortunately, these have
the same values in ext2fs and ffs.

No functional change intended.
 1.121 29-Apr-2012  chs branches: 1.121.2;
change vflushbuf() to take the full FSYNC_* flags.
translate FSYNC_LAZY into PGO_LAZY for VOP_PUTPAGES() so that
genfs_do_io() can set the appropriate io priority for the I/O.
this is the first part of addressing PR 46325.
 1.120 27-Jun-2011  manu branches: 1.120.2; 1.120.6; 1.120.8;
Implement extended attribute listing for UFS1.

Modify lsextattr(8) so that it does not expect each attribute name to be
prefixed by its length. This enables extattr_list_(file|link|fd) to
return a buffer matching its documentation. This also makes the interface
similar to what Linux and FUSE do, which is nice for interoperability.

Note that since we had no EA implementation supporting listing, we do
not break anything.
 1.119 12-Jun-2011  rmind Welcome to 5.99.53! Merge rmind-uvmplock branch:

- Reorganize locking in UVM and provide extra serialisation for pmap(9).
New lock order: [vmpage-owner-lock] -> pmap-lock.

- Simplify locking in some pmap(9) modules by removing P->V locking.

- Use lock object on vmobjlock (and thus vnode_t::v_interlock) to share
the locks amongst UVM objects where necessary (tmpfs, layerfs, unionfs).

- Rewrite and optimise x86 TLB shootdown code, make it simpler and cleaner.
Add TLBSTATS option for x86 to collect statistics about TLB shootdowns.

- Unify /dev/mem et al in MI code and provide required locking (removes
kernel-lock on some ports). Also, avoid cache-aliasing issues.

Thanks to Andrew Doran and Joerg Sonnenberger, as their initial patches
formed the core changes of this branch.
 1.118 27-Apr-2011  hannken branches: 1.118.2;
Cleanup ffs fsync and make devices on wapbl enabled file systems work here:

- Replace the ugly sync loop in ffs_full_fsync() and ffs_vfs_fsync() with
vflushbuf(). This loop is a relic of softdeps and not needed anymore.

- Add ffs_spec_fsync() for device nodes on ffs file systems that calls
spec_fsync() like all other file systems do and then updates the ctime.

Discussed on tech-kern.

Should fix PRs:
PR #41192 wapbl diagnostic panic during cgdconfig
PR #41977 kernel diagnostic assertion "rw_lock_held(&wl->wl_rwlock)" failed
PR #42149 wapbl locking panic if watching DVD
PR #42551 Lockdebug assert in wapbl when running zpool
 1.117 15-Apr-2011  hannken ffs_fsync: no need for wapbl_vptomp() here -- vnode is always VREG.
 1.116 12-Aug-2010  hannken branches: 1.116.2;
ffs_reclaim: don't free an already free inode. This may happen when
ffs_fhtovp() gets a free inode and releases it.
 1.115 28-Jul-2010  hannken ext2fs,ffs: free on disk inodes in the reclaim routine.
Remove now unneeded vnode flag VI_FREEING.

Welcome to 5.99.38.

Ok: Andrew Doran <ad@netbsd.org>
 1.114 29-Mar-2010  pooka Stop exposing fifofs internals and leave only fifo_vnodeop_p visible.
 1.113 04-Nov-2009  hannken branches: 1.113.2; 1.113.4;
Now that softdep has left the tree the only place needing the ffs_lock()
hack is ffs_sync().

- Use the generic lock operations for ffs.
- Change ffs_sync() to omit the vnode lock while suspending.

Reviewed by: Antti Kantee <pooka@netbsd.org>
 1.112 29-Mar-2009  ad fsync:

- atime updates were not being synced.

ffs_sync:

- In some cases the sync vnode was acting like the now-dead /usr/sbin/update.
It was examining vnodes that it should have ignored.

- It would find dirty inodes and try to flush them. Often ffs_fsync()
cheerfully ignored the flush request due to the fsync bug. Such inodes
remained dirty and were repeatedly re-examined by the syncer until
vnode reclaim or system shutdown.

- We were marking our place in the per-mount vnode list even though in
most cases there was no flush to perform. While not a bug, this wasted
CPU cycles because a TAILQ_NEXT would have sufficed.
 1.111 22-Feb-2009  ad PR kern/26878 FFSv2 + softdep = livelock (no free ram)
PR kern/16942 panic with softdep and quotas
PR kern/19565 panic: softdep_write_inodeblock: indirect pointer #1 mismatch
PR kern/26274 softdep panic: allocdirect_merge: ...
PR kern/26374 Long delay before non-root users can write to softdep partitions
PR kern/28621 1.6.x "vp != NULL" panic in ffs_softdep.c:4653 while unmounting a softdep (+quota) filesystem
PR kern/29513 FFS+Softdep panic with unfsck-able file-corruption
PR kern/31544 The ffs softdep code appears to fail to write dirty bits to disk
PR kern/31981 stopping scsi disk can cause panic (softdep)
PR kern/32116 kernel panic in softdep (assertion failure)
PR kern/32532 softdep_trackbufs deadlock
PR kern/37191 softdep: locking against myself
PR kern/40474 Kernel panic after remounting raid root with softdep

Retire softdep, pass 2. As discussed and later formally announced on the
mailing lists.
 1.110 22-Feb-2009  ad PR kern/39564 wapbl performance issues with disk cache flushing
PR kern/40361 WAPBL locking panic in -current
PR kern/40470 WAPBL corrupts ext2fs
PR kern/40562 busy loop in ffs_sync when unmounting a file system
PR kern/40525 panic: ffs_valloc: dup alloc

- A fix for an issue that can lead to "ffs_valloc: dup" due to dirty cg
buffers being invalidated. Problem discovered and patch by dholland@.

- If the syncer fails to lazily sync a vnode due to lock contention,
retry 1 second later instead of 30 seconds later.

- Flush inode atime updates every ~10 seconds (this makes most sense with
logging). Presently they didn't hit the disk for read-only files or
devices until the file system was unmounted. It would be better to trickle
the updates out but that would require more extensive changes.

- Fix issues with file system corruption, busy looping and other nasty
problems when logging and non-logging file systems are intermixed,
with one being the root file system.

- For logging, do not flush metadata on an inode-at-a-time basis if the sync
has been requested by ioflush. Previously, we could try hundreds of log
sync operations a second due to inode update activity, causing the syncer
to fall behind and metadata updates to be serialized across the entire
file system. Instead, burst out metadata and log flushes at a minimum
interval of every 10 seconds on an active file system (happens more often
if the log becomes full). Note this does not change the operation of
fsync() etc.

- With the flush issue fixed, re-enable concurrent metadata updates in
vfs_wapbl.c.
 1.109 01-Feb-2009  ad branches: 1.109.2;
PR kern/40469 5.0_BETA/amd64 INSTALL kernel panics when installing on log-enabled filesystems
PR kern/40470 WAPBL corrupts ext2fs

Don't touch inodes at all unless VOP_FSYNC(). Might fix the ext2fs problem,
I am not sure.
 1.108 28-Dec-2008  christos Don't try to ffs_update VT_NON vnodes
 1.107 22-Dec-2008  ad Add a comment.
 1.106 22-Dec-2008  ad PR kern/40246 current panics when removing swap devices

Someone was smoking crack when they decided to unconditionally OR FSYNC_VFS
into the flags for block devices.
 1.105 21-Dec-2008  ad PR kern/40210 5.0 BETA WAPBL related crash
 1.104 10-Oct-2008  hannken branches: 1.104.2; 1.104.4;
Break a deadlock where one thread has a wapbl transaction, calls VOP_GETPAGES
and wants to busy a page while another thread calls VOP_PUTPAGES on the same
vnode, takes pages busy and wants to start a wapbl transaction.

Reviewed by: Jason Thorpe <thorpej@netbsd.org>
 1.103 22-Aug-2008  hannken Add snapshot support for logging ffs file systems.

- Add UFS_WAPBL_BEGIN() / UFS_WAPBL_END() where needed.

- Expunge WAPBL log inodes from snapshots.

- Ffs_copyonwrite() and ffs_snapblkfree() must run inside a WAPBL transaction.

- Add ffs_gop_write() as a wrapper around genfs_gop_write() that makes sure
genfs_gop_write() always gets called inside a WAPBL transaction.

- Add VOP_PUTPAGES() flag PGO_JOURNALLOCKED to tag calls to VOP_PUTPAGES()
inside a WAPBL transaction.

Reviewed by: Simon Burge <simonb@netbsd.org>, Greg Oster <oster@netbsd.org>

PGO_JOURNALLOCKED / ffs_gop_write() part presented on tech-kern@.
 1.102 12-Aug-2008  hannken Deny read/write access to snapshot vnodes. We use fss(4) to read from
snapshots. With this policy in place:

- Separate the snapshot vnode lock from the snapshot common lock.
Snapshots no longer need recursive vnode locks.

- Use a mutex (si_snaplock) to serialize creation, deletion, reading and
writing of snapshots.

- Move ffs_read() for snapshots into ffs_snapshot.c.

Reviewed by: Jason Thorpe <thorpej@netbsd.org>

While here change ffs_copyonwrite() to fail requests from pagedaemon that need
to copy-on-write.
 1.101 31-Jul-2008  oster Make MSDOS filesystems work again after WAPBL merge. Fixes a quite
repeatable panic in fstrans_getstate() found while searching for a
different USB bug. Also makes the code somewhat more readable.

Patch from Juergen Hannken-Illjes with a small rearrangement from me.

Approved by: hannken
 1.100 31-Jul-2008  simonb Merge the simonb-wapbl branch. From the original branch commit:

Add Wasabi System's WAPBL (Write Ahead Physical Block Logging)
journaling code. Originally written by Darrin B. Jewell while
at Wasabi and updated to -current by Antti Kantee, Andy Doran,
Greg Oster and Simon Burge.

OK'd by core@, releng@.
 1.99 29-Apr-2008  ad branches: 1.99.2; 1.99.4; 1.99.6;
PR kern/38057 ffs makes assumptions about devvp file system
PR kern/33406 softdeps get stuck in endless loop

Introduce VFS_FSYNC() and call it when syncing a block device, if it
has a mounted file system.
 1.98 30-Jan-2008  ad branches: 1.98.6; 1.98.8; 1.98.10;
Replace struct lock on vnodes with a simpler lock object built on
krwlock_t. This is a step towards removing lockmgr and simplifying
vnode locking. Discussed on tech-kern.
 1.97 25-Jan-2008  ad Remove VOP_LEASE. Discussed on tech-kern.
 1.96 09-Jan-2008  ad Go back to freeing on disk inodes in the inactive routine. It would be
better not to do this, but it rules out potential side effects with softdep.
 1.95 03-Jan-2008  ad Use pool_cache.
 1.94 02-Jan-2008  ad Merge vmlocking2 to head.
 1.93 26-Nov-2007  pooka branches: 1.93.2; 1.93.6;
Remove the "struct lwp *" argument from all VFS and VOP interfaces.
The general trend is to remove it from all kernel interfaces and
this is a start. In case the calling lwp is desired, curlwp should
be used.

quick consensus on tech-kern
 1.92 10-Oct-2007  ad branches: 1.92.4;
Merge from vmlocking:

- Split vnode::v_flag into three fields, depending on field locking.
- simple_lock -> kmutex in a few places.
- Fix some simple locking problems.
 1.91 21-Aug-2007  hannken branches: 1.91.2; 1.91.4;
Modify ffs_lock() to take care of a changed v_vnlock. Snapshots do not need
transferlockers() anymore.

From FreeBSD ffs_vnops.c Rev. 1.159

Reviewed by: YAMAMOTO Takashi <yamt@netbsd.org>
 1.90 09-Aug-2007  hannken Move the fstrans-aware lock vnops from ufs to ffs. Other ufs file systems
do not need them.

Ride on 4.99.28
 1.89 20-Jul-2007  pooka branches: 1.89.4; 1.89.6;
In sync, skip over vnodes based on whether they are clean rather than
if they have pages.
 1.88 05-Jun-2007  yamt branches: 1.88.2;
improve post-ubc file overwrite performance in common cases.
ie. when it's safe, actually overwrite blocks rather than doing
read-modify-write.

also fixes PR/33152 and PR/36303.
 1.87 17-May-2007  hannken Fstrans_start() always returns zero, so change its type to void.
 1.86 20-Feb-2007  ad branches: 1.86.4; 1.86.6;
Call genfs_node_destroy() where appropriate.
 1.85 29-Jan-2007  hannken branches: 1.85.2;
Change fstrans enum types to upper case.
No functional change.

From Antti Kantee <pooka@netbsd.org>
 1.84 19-Jan-2007  hannken New file system suspension API to replace vn_start_write and vn_finished_write.
The suspension helpers are now put into file system specific operations.
This means every file system not supporting these helpers cannot be suspended
and therefore snapshots are no longer possible.

Implemented for file systems of type ffs.

The new API is enabled by the kernel option NEWVNGATE. This option is
not enabled by default in any kernel config.

Presented and discussed on tech-kern with much input from
Bill Studenmund <wrstuden@netbsd.org> and YAMAMOTO Takashi <yamt@netbsd.org>.

Welcome to 4.99.9 (new vfs op vfs_suspendctl).
 1.83 16-Nov-2006  christos __unused removal on arguments; approved by core.
 1.82 12-Oct-2006  christos - sprinkle __unused on function decls.
- fix a couple of unused bugs
- no more -Wno-unused for i386
 1.81 23-Jul-2006  ad branches: 1.81.4; 1.81.6;
Use the LWP cached credentials where sane.
 1.80 14-May-2006  elad integrate kauth.
 1.79 09-Apr-2006  yamt ffs_gop_size: revert a problematic part of 1.78.
problems reported by Kouichirou Hiratsuka and Jukka Salmi on current-users@.
 1.78 30-Mar-2006  yamt some cleanups after the introduction of GOP_SIZE_MEM flag.
- remove GOP_SIZE_READ/GOP_SIZE_WRITE flags.
they have not been used since the change.
- ufs_balloc_range: remove code which has been no-op since the change.
thanks Konrad Schroder for explaining the original intention of the code.
- ffs_gop_size: don't extend past eof, in the case of GOP_SIZE_MEM.
otherwise genfs_getpages ends up allocating pages past eof unnecessarily.
 1.77 11-Dec-2005  christos branches: 1.77.4; 1.77.6; 1.77.8; 1.77.10; 1.77.12;
merge ktrace-lwp.
 1.76 02-Nov-2005  yamt merge yamt-vop branch. remove following VOPs.

VOP_BLKATOFF
VOP_VALLOC
VOP_BALLOC
VOP_REALLOCBLKS
VOP_VFREE
VOP_TRUNCATE
VOP_UPDATE
 1.75 09-Sep-2005  yamt branches: 1.75.2;
revert the code to expand putpage requests to block boundary.
because:
- it was incomplete in some cases.
- it can confuse pagedaemon.
see PR/15364 for details.
 1.74 30-Aug-2005  xtraeme * Remove __P()
* Use ANSI function declarations on ext2fs and mfs
 1.73 28-Aug-2005  thorpej Experimental support for extended attributes on UFS1 file systems, using a
backing file per attribute type indexed by inode number to hold the extended
attributes.

This is working pretty well on my test systems, except for the "autostart"
feature. I need someone with a better handle on the VFS locking protocol
to go over that.

This is a work-in-progress. There are parts of this that could be re-factored
allowing this approach to be used on other types of file systems.

Adapted from FreeBSD.
 1.72 26-Jul-2005  yamt revert VCHR part of ffs_vnops.c 1.71.
as VCHR uses the device pager, no point to call VOP_PUTPAGES here.
pointed by Chuck Silvers.
 1.71 21-Jul-2005  yamt ffs_full_fsync: because VBLK/VCHR can be mmap'ed,
do VOP_PUTPAGES for them as well.
 1.70 15-Jul-2005  thorpej Use ANSI function decls.
 1.69 26-Feb-2005  perry branches: 1.69.2; 1.69.4;
nuke trailing whitespace
 1.68 27-Jan-2005  wrstuden Fix pasto in previous. We only perform the DIOCCACHESYNC call if
FSYNC_CACHE is set, not if FSYNC_WAIT is set.
 1.67 25-Jan-2005  wrstuden Extend fsync_range(2) to support the FDISKSYNC flag, which requests
that the sync be propagated out through the disk drive caches.
 1.66 15-Nov-2003  thorpej branches: 1.66.8; 1.66.10;
Kernel portion of the fsync_range(2) system call. Written by Bill
Studenmund, and contributed by Wasabi Systems, Inc.
 1.65 08-Nov-2003  jdolecek fix uninitialized variable use in previous change (!)
 1.64 08-Nov-2003  dbj always do a full fsync if vp->v_type != VREG
in partial fsync, only use PGO_SYNCIO if FSYNC_WAIT is specified
 1.63 08-Nov-2003  dbj protect use of buf's b_flags with b_interlock
 1.62 08-Nov-2003  dbj protect a few uses of buf's b_flags with b_interlock
 1.61 25-Oct-2003  kleink Remove the present incarnation of FSYNC_DATAONLY use from ffs_fsync() and
ffs_full_fsync(); while it is supposed to hint that the update of _file_
metadata (as in timestamps et al.) may be omitted it doesn't mean the
same for _filesystem_ metadata.
 1.60 07-Aug-2003  agc Move UCB-licensed code from 4-clause to 3-clause licence.

Patches provided by Joel Baker in PR 22364, verified by myself.
 1.59 29-Jun-2003  fvdl branches: 1.59.2;
Back out the lwp/ktrace changes. They contained a lot of collateral damage,
and need to be examined and discussed more.
 1.58 28-Jun-2003  darrenr Pass lwp pointers throughout the kernel, as required, so that the lwpid can
be inserted into ktrace records. The general change has been to replace
"struct proc *" with "struct lwp *" in various function prototypes, pass
the lwp through and use l_proc to get the process pointer when needed.

Bump the kernel rev up to 1.6V
 1.57 16-Apr-2003  fvdl ffs_reclaim may be called while the dinode pointer in the inode structure
is still NULL (in the case of an error in ffs_vget). Check for this
condition before doing a pool_put.
 1.56 02-Apr-2003  fvdl Add support for UFS2. UFS2 is an enhanced FFS, adding support for
64 bit block pointers, extended attribute storage, and a few
other things.

This commit does not yet include the code to manipulate the extended
storage (for e.g. ACLs), this will be done later.

Originally written by Kirk McKusick and Network Associates Laboratories for
FreeBSD.
 1.55 17-Feb-2003  perseant Add code to UBCify LFS. This is still behind "#ifdef LFS_UBC" for now
(there are still some details to work out) but expect that to go
away soon. To support these basic changes (creation of lfs_putpages,
lfs_gop_write, mods to lfs_balloc) several other changes were made, to
wit:

* Create a writer daemon kernel thread whose purpose is to handle page
writes for the pagedaemon, but which also takes over some of the
functions of lfs_check(). This thread is started the first time an
LFS is mounted.

* Add a "flags" parameter to GOP_SIZE. Current values are
GOP_SIZE_READ, meaning that the call should return the size of the
in-core version of the file, and GOP_SIZE_WRITE, meaning that it
should return the on-disk size. One of GOP_SIZE_READ or
GOP_SIZE_WRITE must be specified.

* Instead of using malloc(...M_WAITOK) for everything, reserve enough
resources to get by and use malloc(...M_NOWAIT), using the reserves if
necessary. Use the pool subsystem for structures small enough that
this is feasible. This also obsoletes LFS_THROTTLE.

And a few that are not strictly necessary:

* Moves the LFS inode extensions off onto a separately allocated
structure; getting closer to LFS as an LKM. "Welcome to 1.6O."

* Unified GOP_ALLOC between FFS and LFS.

* Update LFS copyright headers to correct values.

* Actually cast to unsigned in lfs_shellsort, like the comment says.

* Keep track of which segments were empty before the previous
checkpoint; any segments that pass two checkpoints both dirty and
empty can be summarily cleaned. Do this. Right now lfs_segclean
still works, but this should be turned into an effectless
compatibility syscall.
 1.54 05-Feb-2003  pk Make the buffer cache code MP-safe.
 1.53 29-Jan-2003  simonb Remove variable that is only assigned to but not referenced.
 1.52 24-Jan-2003  fvdl Bump daddr_t to 64 bits. Replace it with int32_t in all places where
it was used on-disk, so that on-disk formats remain the same.
Remove ufs_daddr_t and ufs_lbn_t for the time being.
 1.51 01-Nov-2002  kristerw Removed unused variables doclusterread and doclusterwrite.
 1.50 23-Oct-2002  jdolecek merge kqueue branch into -current

kqueue provides a stateful and efficient event notification framework
currently supported events include socket, file, directory, fifo,
pipe, tty and device changes, and monitoring of processes and signals

kqueue is supported by all writable filesystems in NetBSD tree
(with exception of Coda) and all device drivers supporting poll(2)

based on work done by Jonathan Lemon for FreeBSD
initial NetBSD port done by Luke Mewburn and Jason Thorpe
 1.49 05-May-2002  chs for softdep vnodes, always write together the pages for any block that
might have a dependency, since the accounting doesn't work otherwise.
fixes PRs 15364 16336 16448.
 1.48 31-Dec-2001  thorpej Do not compare an integer to NULL.
 1.47 27-Dec-2001  fvdl The softdep code sometimes uses vfs_vget .. vput. For removals, these
would result in a vop_inactive call for the vnode each time, resulting
in vinvalbuf->fsync. The original softdep code avoided the fsync
in vinvalbuf by not calling it if there were no dirty blocks. This
was changed in NetBSD. Also, flush_inodedeps was changed to mark
the inode as modified so that it would do an inode update and flush the
last one. This combination basically caused a sync write for each removed
file in an rm -rf (showing up delayed from the syncer a lot of the time).

If called from vinvalbuf (FSYNC_RECLAIM), and there were no dirty blocks
or pages to begin with, still do everything as normal, so that possible dirty
blocks in transit to disk are properly waited for, etc, but don't pass
UPDATE_WAIT to VOP_UPDATE, since there is no need for it in that case.
 1.46 08-Nov-2001  chs call VOP_PUTPAGES() directly for vnodes instead of
going through the UVM pager "put" vector.
 1.45 06-Nov-2001  simonb Remove some variables that are set but never used.
 1.44 30-Oct-2001  lukem add __KERNEL_RCSID()
 1.43 26-Oct-2001  lukem remove #include <ufs/ufs/quota.h> where it was just to appease
<ufs/ufs/inode.h>, since the latter now includes the former. leave the former
in source that obviously uses specific bits of it (for completeness.)
 1.42 26-Sep-2001  chs branches: 1.42.2;
undo the part of the previous revision about skipping
the put if there are no pages, that seems to cause some problem.
fix another problem with missing an splx(), spotted by enami.
 1.41 26-Sep-2001  chs be sure to call the pager put with page-aligned offsets.
spotted by Nathan Williams.

while I'm here, move an splbio() so that we don't return without
splx()ing it if there's an error, and don't bother calling the
pager put if the vnode has no pages.
 1.40 22-Sep-2001  sommerfeld Add fifo_putpages() placebo so that the vnode's uobj is unlocked.
 1.39 15-Sep-2001  chs a whole bunch of changes to improve performance and robustness under load:

- remove special treatment of pager_map mappings in pmaps. this is
required now, since I've removed the globals that expose the address range.
pager_map now uses pmap_kenter_pa() instead of pmap_enter(), so there's
no longer any need to special-case it.
- eliminate struct uvm_vnode by moving its fields into struct vnode.
- rewrite the pageout path. the pager is now responsible for handling the
high-level requests instead of only getting control after a bunch of work
has already been done on its behalf. this will allow us to UBCify LFS,
which needs tighter control over its pages than other filesystems do.
writing a page to disk no longer requires making it read-only, which
allows us to write wired pages without causing all kinds of havoc.
- use a new PG_PAGEOUT flag to indicate that a page should be freed
on behalf of the pagedaemon when it's unlocked. this flag is very similar
to PG_RELEASED, but unlike PG_RELEASED, PG_PAGEOUT can be cleared if the
pageout fails due to eg. an indirect-block buffer being locked.
this allows us to remove the "version" field from struct vm_page,
and together with shrinking "loan_count" from 32 bits to 16,
struct vm_page is now 4 bytes smaller.
- no longer use PG_RELEASED for swap-backed pages. if the page is busy
because it's being paged out, we can't release the swap slot to be
reallocated until that write is complete, but unlike with vnodes we
don't keep a count of in-progress writes so there's no good way to
know when the write is done. instead, when we need to free a busy
swap-backed page, just sleep until we can get it busy ourselves.
- implement a fast-path for extending writes which allows us to avoid
zeroing new pages. this substantially reduces cpu usage.
- encapsulate the data used by the genfs code in a struct genfs_node,
which must be the first element of the filesystem-specific vnode data
for filesystems which use genfs_{get,put}pages().
- eliminate many of the UVM pagerops, since they aren't needed anymore
now that the pager "put" operation is a higher-level operation.
- enhance the genfs code to allow NFS to use the genfs_{get,put}pages
instead of a modified copy.
- clean up struct vnode by removing all the fields that used to be used by
the vfs_cluster.c code (which we don't use anymore with UBC).
- remove kmem_object and mb_object since they were useless.
instead of allocating pages to these objects, we now just allocate
pages with no object. such pages are mapped in the kernel until they
are freed, so we can use the mapping to find the page to free it.
this allows us to remove splvm() protection in several places.

The sum of all these changes improves write throughput on my
decstation 5000/200 to within 1% of the rate of NetBSD 1.5
and reduces the elapsed time for "make release" of a NetBSD 1.5
source tree on my 128MB pc to 10% less than a 1.5 kernel took.
 1.38 17-Aug-2001  chs branches: 1.38.2;
add getpages/putpages entries for spec vnodes.
 1.37 22-Jan-2001  jdolecek branches: 1.37.2; 1.37.6;
make filesystem vnodeop, specop, fifoop and vnodeopv_* arrays const
 1.36 10-Dec-2000  chs call pgo_flush with (start,end) rather than (start,length).
 1.35 27-Nov-2000  chs Initial integration of the Unified Buffer Cache project.
 1.34 24-Oct-2000  fvdl Stay at splbio across the VBWAIT loop, as is done elsewhere in the
kernel. Avoids a possible race condition. Pointed out by
enami@netbsd.org, problem reported by deberg@netbsd.org.
 1.33 19-Sep-2000  fvdl Adapt for VOP_FSYNC parameter change.

Implement range fsync for FFS. Note: not yet implemented for the
SOFTDEP case.
 1.32 28-Jun-2000  mrg remove include of <vm/vm.h> and <uvm/uvm_extern.h>
 1.31 29-May-2000  mycroft branches: 1.31.2;
According to Frank, buffers with dependencies *are* left on v_dirtyblks, so
remove the FSYNC_RECLAIM check and force them to be flushed.
 1.30 29-May-2000  mycroft Never call softdep_sync_metadata() in the FSYNC_RECLAIM case. Any pending
blocks are detached from the vnode at this point. When the dependencies are
broken to enable writing the blocks, the vnode will be regenerated. (The only
reason we sync buffers in this case is that they have to be detached from the
vnode.)
 1.29 29-May-2000  mycroft In ffs_fsync(), remove the FSYNC_RECLAIM special case, so that it properly
waits for pending buffers, and doesn't throw away time stamp updates.
 1.28 27-May-2000  thorpej branches: 1.28.2;
sleep() -> tsleep()
 1.27 13-May-2000  perseant Change the semantics of the last parameter from a boolean ("waitfor") to
a set of flags ("flags"). Two flags are defined, UPDATE_WAIT and
UPDATE_DIROP.

Under the old semantics, VOP_UPDATE would block if waitfor were set,
under the assumption that directory operations should be done
synchronously. At least LFS and FFS+softdep do not make this
assumption; FFS+softdep got around the problem by enclosing all relevant
calls to VOP_UPDATE in a "if(!DOINGSOFTDEP(vp))", while LFS simply
ignored waitfor, one of the reasons why NFS-serving an LFS filesystem
did not work properly.

Under the new semantics, the UPDATE_DIROP flag is a hint to the
fs-specific update routine that the call comes from a dirop routine, and
should be waited for, or not, accordingly.

Closes PR#8996.
 1.26 30-Mar-2000  augustss Remove register declarations.
 1.25 29-Mar-2000  simonb Don't need to include <sys/conf.h> here.
 1.24 17-Mar-2000  fvdl If we're reclaiming, and there are no dirty blocks, just return.
 1.23 15-Mar-2000  fvdl Revert this back to 2 revisions ago, these checks are done higher up now.
 1.22 14-Mar-2000  fvdl Don't immediately return in ffs_fsync if there appears to be no data
to flush if it's a vnode on a softdep filesystem. softdep_sync_metadata
may still need to do some work.
 1.21 11-Mar-2000  perseant Move vinvalbuf's check for dirty blocks into ffs_fsync, to ensure that
mode and ownership bits are flushed to disk before the vnode is
reclaimed.

The check, introduced in the softdep merge, assumes that if no blocks
are dirty, no file data *or metadata* needs to be flushed to disk. This
is true of ffs, but is not true of lfs, and may not be true of other
filesystems.

Tested by myself and Bill Squier <groo@cs.stevens-tech.edu>.
 1.20 15-Nov-1999  fvdl Add Kirk McKusick's soft updates code to the trunk. Not enabled by
default, as the copyright on the main file (ffs_softdep.c) is such
that it has been put into gnusrc. options SOFTDEP will pull this
in. This code also contains the trickle syncer.

Bump version number to 1.4O
 1.19 03-Aug-1999  wrstuden branches: 1.19.2; 1.19.4; 1.19.8;
Add support for fcntl(2) to generate VOP_FCNTL calls. Any fcntl
call with F_FSCTL set and F_SETFL calls generate calls to a new
fileop fo_fcntl. Add genfs_fcntl() and soo_fcntl() which return 0
for F_SETFL and EOPNOTSUPP otherwise. Have all leaf filesystems
use genfs_fcntl().

Reviewed by: thorpej
Tested by: wrstuden
 1.18 24-Mar-1999  mrg branches: 1.18.4;
completely remove Mach VM support. all that is left is the
header files, as UVM still uses (most of) these.
 1.17 04-Dec-1998  bouyer No need to #include malloc.h here.
 1.16 01-Sep-1998  thorpej branches: 1.16.2;
Use the pool allocator and the "nointr" pool page allocator for FFS inodes.

XXX MFS also comes in here for inodes, and used a different malloc type,
but the structure is the same, so we just use the FFS inode pool.
 1.15 24-Jun-1998  sommerfe Always include fifos; "not an option any more".
 1.14 22-Jun-1998  sommerfe defopt for options FIFO
 1.13 09-Jun-1998  scottr Protect various config(8)-generated files from inclusion while
building LKMs. Fixes PR 5557.
 1.12 01-Mar-1998  fvdl Merge with Lite2 + local changes
 1.11 10-Feb-1998  mrg - add defopt's for UVM, UVMHIST and PMAP_NEW.
- remove unnecessary UVMHIST_DECL's.
 1.10 05-Feb-1998  mrg initial import of the new virtual memory system, UVM, into -current.

UVM was written by chuck cranor <chuck@maria.wustl.edu>, with some
minor portions derived from the old Mach code. i provided some help
getting swap and paging working, and other bug fixes/ideas. chuck
silvers <chuq@chuq.com> also provided some other fixes.

this is the rest of the MI portion changes.

this will be KNF'd shortly. :-)
 1.9 07-Sep-1996  mycroft Implement poll(2).
 1.8 01-Sep-1996  mycroft Add a set of generic file system operations that most file systems use.
Also, fix some time stamp bogosities.
 1.7 11-May-1996  mycroft Change VOP_UPDATE() semantics:
* Make 2nd and 3rd args timespecs, not timevals.
* Consistently pass a Boolean as the 4th arg (except in LFS).
Also, fix ffs_update() and lfs_update() to actually change the nsec fields.
 1.6 09-Feb-1996  christos ffs prototypes
 1.5 14-Dec-1994  mycroft Sync with CSRG.
 1.4 13-Dec-1994  mycroft Turn lease_check() into a vnode op, per CSRG.
 1.3 29-Jun-1994  cgd New RCS ID's, take two. they're more aesthetically pleasant, and use 'NetBSD'
 1.2 22-Jun-1994  mycroft Deallocate the vnode data using the correct type for MFS nodes.
 1.1 08-Jun-1994  mycroft branches: 1.1.1;
Update to 4.4-Lite fs code, with local changes.
 1.1.1.2 01-Mar-1998  fvdl Import 4.4BSD-Lite2
 1.1.1.1 01-Mar-1998  fvdl Import 4.4BSD-Lite for reference
 1.16.2.6 02-Jun-1999  chs use the new flags PG_RDONLY and UFP_NORDONLY to ensure that
any page which becomes dirty will have backing store allocated.
 1.16.2.5 30-May-1999  chs redo ffs_getpages() and ffs_putpages() again since vm_page's
blkno field is gone.
 1.16.2.4 29-Apr-1999  chs disable buffer-cache clustering.
 1.16.2.3 25-Feb-1999  chs major overhaul of getpages and putpages functions.
 1.16.2.2 16-Nov-1998  chs fix style nits.
 1.16.2.1 09-Nov-1998  chs initial snapshot. lots left to do.
 1.18.4.2 04-Jul-1999  chs support VOP_BALLOC(). ffs_getpages() and ffs_putpages() are gone
in favor of the genfs versions.
 1.18.4.1 07-Jun-1999  chs merge everything from chs-ubc branch.
 1.19.8.1 27-Dec-1999  wrstuden Pull up to last week's -current.
 1.19.4.2 26-Oct-1999  fvdl Merge changes in the trickle-sync and softdep code as done by Kirk McKusick
in FreeBSD since the version that we based the branch on. Merging mostly
done by Ethan Solomita <ethan@geocast.com>.

Also, make sure the syncer thread/process isn't active when we're
unmounting a filesystem. This could wreak havoc. XXX should be done
on a per-mountpoint basis, but especially the softdep code would
end up to be a big pile of vfs_busy() calls.
 1.19.4.1 19-Oct-1999  fvdl Bring in Kirk McKusick's FFS softdep code on a branch.
 1.19.2.4 11-Feb-2001  bouyer Sync with HEAD.
 1.19.2.3 13-Dec-2000  bouyer Sync with HEAD (for UBC fixes).
 1.19.2.2 08-Dec-2000  bouyer Sync with HEAD.
 1.19.2.1 20-Nov-2000  bouyer Update thorpej_scsipi to -current as of a month ago
 1.28.2.1 22-Jun-2000  minoura Sync w/ netbsd-1-5-base.
 1.31.2.2 26-Feb-2002  he Pull up revision 1.47 (via patch, requested by fvdl):
Correct a mistake made in the original merge-in of the softdep
code, and fix a problem which caused ffs_fsync to do unneeded
sync writes.
 1.31.2.1 14-Dec-2000  he Pull up revisions 1.33-1.34 (requested by fvdl):
Improve NFS performance, possibly by as much as 100% in
throughput. Please note: this implies a kernel interface change,
VOP_FSYNC gains two arguments.
 1.37.6.7 25-Sep-2002  jdolecek switch over to genfs_kqfilter(), g/c the ufs_kqfilter() code
 1.37.6.6 23-Sep-2002  jdolecek add spec kqfilter vnode op
 1.37.6.5 22-Sep-2002  jdolecek add fifo_kqfilter() to ffs_fifoop_entries[], to switch on
support for kevents on fifos on FFS
 1.37.6.4 23-Jun-2002  jdolecek catch up with -current on kqueue branch
 1.37.6.3 10-Jan-2002  thorpej Sync kqueue branch with -current.
 1.37.6.2 25-Aug-2001  thorpej Merge Aug 24 -current into the kqueue branch.
 1.37.6.1 10-Jul-2001  lukem add ufs_kqfilter method for vop_kqfilter
 1.37.2.9 11-Nov-2002  nathanw Catch up to -current
 1.37.2.8 16-Jul-2002  nathanw pagedaemon_proc really should be a proc, not a LWP.
 1.37.2.7 24-Jun-2002  nathanw Curproc->curlwp renaming.

Change uses of "curproc->l_proc" back to "curproc", which is more like the
original use. Bare uses of "curproc" are now "curlwp".

"curproc" is now #defined in proc.h as ((curlwp) ? (curlwp)->l_proc : NULL)
so that it is always safe to reference curproc (*de*referencing curproc
is another story, but that's always been true).
 1.37.2.6 20-Jun-2002  nathanw Catch up to -current.
 1.37.2.5 08-Jan-2002  nathanw Catch up to -current.
 1.37.2.4 14-Nov-2001  nathanw Catch up to -current.
 1.37.2.3 26-Sep-2001  nathanw Catch up to -current.
Again.
 1.37.2.2 21-Sep-2001  nathanw Catch up to -current.
 1.37.2.1 24-Aug-2001  nathanw Catch up with -current.
 1.38.2.1 01-Oct-2001  fvdl Catch up with -current.
 1.42.2.1 12-Nov-2001  thorpej Sync the thorpej-mips-cache branch with -current.
 1.59.2.7 10-Nov-2005  skrll Sync with HEAD. Here we go again...
 1.59.2.6 04-Mar-2005  skrll Sync with HEAD.

Hi Perry!
 1.59.2.5 04-Feb-2005  skrll Sync with HEAD.
 1.59.2.4 21-Sep-2004  skrll Fix the sync with head I botched.
 1.59.2.3 18-Sep-2004  skrll Sync with HEAD.
 1.59.2.2 03-Aug-2004  skrll Sync with HEAD
 1.59.2.1 02-Jul-2003  darrenr Apply the aborted ktrace-lwp changes to a specific branch. This is just for
others to review, I'm concerned that patch fuziness may have resulted in some
errant code being generated but I'll look at that later by comparing the diff
from the base to the branch with the file I attempt to apply to it. This will,
at the very least, put the changes in a better context for others to review
them and attempt to tinker with removing passing of 'struct lwp' through
the kernel.
 1.66.10.2 19-Mar-2005  yamt sync with head. xen and whitespace. xen part is not finished.
 1.66.10.1 12-Feb-2005  yamt sync with head.
 1.66.8.1 29-Apr-2005  kent sync with -current
 1.69.4.8 04-Feb-2008  yamt sync with head.
 1.69.4.7 21-Jan-2008  yamt sync with head
 1.69.4.6 07-Dec-2007  yamt sync with head
 1.69.4.5 27-Oct-2007  yamt sync with head.
 1.69.4.4 03-Sep-2007  yamt sync with head.
 1.69.4.3 26-Feb-2007  yamt sync with head.
 1.69.4.2 30-Dec-2006  yamt sync with head.
 1.69.4.1 21-Jun-2006  yamt sync with head.
 1.69.2.2 21-Oct-2005  tron Pull up following revision(s) (requested by yamt in ticket #845):
sys/ufs/ffs/ffs_extern.h: revision 1.45 via patch
sys/ufs/ffs/ffs_vnops.c: revision 1.75 via patch
revert the code to expand putpage requests to block boundary.
because:
- it was incomplete in some cases.
- it can confuse pagedaemon.
see PR/15364 for details.
 1.69.2.1 24-Aug-2005  riz Pull up following revision(s) (requested by yamt in ticket #688):
sys/miscfs/genfs/genfs_vnops.c: revision 1.98 via patch
sys/ufs/ffs/ffs_vfsops.c: revision 1.165
sys/ufs/lfs/lfs_extern.h: revision 1.69
sys/fs/filecorefs/filecore_vfsops.c: revision 1.20
sys/nfs/nfs_node.c: revision 1.80
sys/fs/smbfs/smbfs_node.c: revision 1.24
sys/fs/cd9660/cd9660_vfsops.c: revision 1.24
sys/fs/msdosfs/msdosfs_denode.c: revision 1.8
sys/miscfs/genfs/genfs_node.h: revision 1.6
sys/ufs/lfs/lfs_vfsops.c: revision 1.183
sys/ufs/ext2fs/ext2fs_vfsops.c: revision 1.86
sys/fs/adosfs/advfsops.c: revision 1.23
sys/fs/ntfs/ntfs_vfsops.c: revision 1.31
- constify genfs_ops.
- use member designators.

sys/miscfs/genfs/genfs_vnops.c: revision 1.99 via patch
genfs_getpages: don't forget to put the vnode onto the syncer's work queue
even in the case of PGO_LOCKED.

sys/uvm/uvm_bio.c: revision 1.40
sys/uvm/uvm_pager.h: revision 1.29
sys/miscfs/genfs/genfs_vnops.c: revision 1.100 via patch
sys/ufs/ufs/ufs_inode.c: revision 1.50
- introduce PGO_NOBLOCKALLOC and use it for ubc mapping
to prevent unnecessary block allocations in the case that
page size > block size.
- ufs_balloc_range: use VM_PROT_WRITE+PGO_NOBLOCKALLOC rather than
VM_PROT_READ.

sys/uvm/uvm_fault.c: revision 1.96
sys/miscfs/genfs/genfs_vnops.c: revision 1.101 via patch
sys/uvm/uvm_object.h: revision 1.19
sys/miscfs/genfs/genfs_node.h: revision 1.7
ensure that vnodes with dirty pages are always on syncer's queue.
- genfs_putpages: wait for i/o completion of PG_RELEASED/PG_PAGEOUT pages by
setting "wasclean" false when encountering them.
suggested by Stephan Uphoff in PR/24596 (1).
- genfs_putpages: write protect pages when cleaning out, if
we're going to take the vnode off the syncer's queue.
uvm_fault: don't write-map pages unless its vnode is already on
the syncer's queue.
fix PR/24596 (3), but in a different way from the suggested fix.
(to keep our current behaviour, ie. not to require explicit msync.
discussed on tech-kern@.)
- genfs_putpages: don't mistakenly take a vnode off the queue
by introducing a generation number in genfs_node.
genfs_getpages: increment the generation number.
suggested by Stephan Uphoff in PR/24596 (2).
- add some assertions.

sys/miscfs/genfs/genfs_vnops.c: revision 1.102 via patch
genfs_putpages: don't bother to clean the vnode unless VONWORKLST.

sys/ufs/ffs/ffs_vnops.c: revision 1.71
ffs_full_fsync: because VBLK/VCHR can be mmap'ed,
do VOP_PUTPAGES for them as well.

sys/uvm/uvm_fault.c: revision 1.97
uvm_fault: check a correct object in the case of layered filesystems.
fix PR/30811 from Jukka Salmi.

sys/uvm/uvm_object.h: revision 1.20
sys/ufs/ffs/ffs_vfsops.c: revision 1.167
sys/uvm/uvm_bio.c: revision 1.41
sys/ufs/ufs/ufs_vnops.c: revision 1.129
sys/uvm/uvm_mmap.c: revision 1.92
sys/uvm/uvm_fault.c: revision 1.98
sys/kern/vfs_subr.c: revision 1.252
sys/fs/msdosfs/denode.h: revision 1.5
sys/miscfs/genfs/genfs_vnops.c: revision 1.103 via patch
sys/fs/msdosfs/msdosfs_denode.c: revision 1.9
sys/sys/vnode.h: revision 1.141
sys/ufs/ufs/ufs_inode.c: revision 1.51
sys/ufs/ufs/ufs_extern.h: revision 1.45 via patch
sys/miscfs/genfs/genfs_node.h: revision 1.8
sys/ufs/lfs/lfs_vfsops.c: revision 1.184
sys/uvm/uvm_pager.h: revision 1.30
sys/ufs/ext2fs/ext2fs_vfsops.c: revision 1.87
update file timestamps for nfsd loaned-read and mmap.
PR/25279. discussed on tech-kern@.

sys/miscfs/genfs/genfs_vnops.c: revision 1.104 via patch
don't write-protect wired pages. pointed out by Chuck Silvers.
for now, leave a vnode on the syncer's queue, as suggested by him.

sys/ufs/ffs/ffs_vnops.c: revision 1.72
revert VCHR part of ffs_vnops.c 1.71.
as VCHR uses the device pager, there is no point in calling VOP_PUTPAGES here.
pointed out by Chuck Silvers.
 1.75.2.2 29-Oct-2005  yamt use ffs_* directly rather than via ufs_ops.
suggested by Chuck Silvers.
 1.75.2.1 20-Oct-2005  yamt adapt ufs.
 1.77.12.2 24-May-2006  tron Merge 2006-05-24 NetBSD-current into the "peter-altq" branch.
 1.77.12.1 31-Mar-2006  tron Merge 2006-03-31 NetBSD-current into the "peter-altq" branch.
 1.77.10.3 06-May-2006  christos - Move kauth_cred_t declaration to <sys/types.h>
- Cleanup struct ucred; forward declarations that are unused.
- Don't include <sys/kauth.h> in any header, but include it in the c files
that need it.

Approved by core.
 1.77.10.2 19-Apr-2006  elad sync with head.
 1.77.10.1 08-Mar-2006  elad Adapt to kernel authorization KPI.
 1.77.8.4 11-Aug-2006  yamt sync with head
 1.77.8.3 24-May-2006  yamt sync with head.
 1.77.8.2 11-Apr-2006  yamt sync with head
 1.77.8.1 01-Apr-2006  yamt sync with head.
 1.77.6.2 01-Jun-2006  kardel Sync with head.
 1.77.6.1 22-Apr-2006  simonb Sync with head.
 1.77.4.1 09-Sep-2006  rpaulo sync with head
 1.81.6.2 10-Dec-2006  yamt sync with head.
 1.81.6.1 22-Oct-2006  yamt sync with head
 1.81.4.2 01-Feb-2007  ad Sync with head.
 1.81.4.1 18-Nov-2006  ad Sync with head.
 1.85.2.2 17-May-2007  yamt sync with head.
 1.85.2.1 27-Feb-2007  yamt - sync with head.
- move sched_changepri back to kern_synch.c as it doesn't know PPQ anymore.
 1.86.6.1 11-Jul-2007  mjf Sync with head.
 1.86.4.13 09-Oct-2007  ad Sync with head.
 1.86.4.12 16-Sep-2007  ad - Checkpoint work in progress on the vnode lifecycle and reference counting
stuff. This makes it work properly without kernel_lock and fixes a few
quite old bugs. See vfs_subr.c 1.283.2.17 for details.

- Fix some problems with softdep. Unfortunately our softdep code appears
to have some longstanding bugs that cause it to fail under stress testing.
 1.86.4.11 30-Aug-2007  ad bufcache_lock is sufficient to inspect v_dirtyblkhd, vp->v_interlock is only
needed to modify.
 1.86.4.10 24-Aug-2007  ad Sync with buffer cache locking changes. See buf.h/vfs_bio.c for details.
Some minor portions are incomplete and need to be verified as a whole.
 1.86.4.9 20-Aug-2007  ad Sync with HEAD.
 1.86.4.8 01-Jul-2007  ad Minor locking fixes.
 1.86.4.7 23-Jun-2007  ad - Lock v_cleanblkhd, v_dirtyblkhd, v_numoutput with the vnode's interlock.
Get rid of global_v_numoutput_lock. Partially incomplete as the buffer
cache locking doesn't work very well and needs an overhaul.
- Some changes to try and make softdep MP safe. Untested.
 1.86.4.6 09-Jun-2007  ad Sync with head.
 1.86.4.5 08-Jun-2007  ad Sync with head.
 1.86.4.4 27-May-2007  ad ffs_sync: vp->v_data can be NULL if the vnode is being recycled.
 1.86.4.3 13-May-2007  ad - Pass the error number and residual count to biodone(), and let it handle
setting error indicators. Prepare to eliminate B_ERROR.
- Add a flag argument to brelse() to be set into the buf's flags, instead
of doing it directly. Typically used to set B_INVAL.
- Add a "struct cpu_info *" argument to kthread_create(), to be used to
create bound threads. Change "bool mpsafe" to "int flags".
- Allow exit of LWPs in the IDL state when (l != curlwp).
- More locking fixes & conversion to the new API.
 1.86.4.2 21-Mar-2007  ad - Replace more simple_locks, and fix up in a few places.
- Use condition variables.
- LOCK_ASSERT -> KASSERT.
 1.86.4.1 13-Mar-2007  ad Pull in the initial set of changes for the vmlocking branch.
 1.88.2.2 03-Sep-2007  skrll Sync with HEAD.
 1.88.2.1 15-Aug-2007  skrll Sync with HEAD.
 1.89.6.2 20-Jul-2007  pooka In sync, skip over vnodes based on if they are clean rather than
if they have pages.
 1.89.6.1 20-Jul-2007  pooka file ffs_vnops.c was added on branch matt-mips64 on 2007-07-20 16:46:46 +0000
 1.89.4.4 27-Nov-2007  joerg Sync with HEAD. amd64 Xen support needs testing.
 1.89.4.3 26-Oct-2007  joerg Sync with HEAD.

Follow the merge of pmap.c on i386 and amd64 and move
pmap_init_tmp_pgtbl into arch/x86/x86/pmap.c. Modify the ACPI wakeup
code to restore CR4 before jumping back into kernel space as the large
page option might cover that.
 1.89.4.2 03-Sep-2007  jmcneill Sync with HEAD.
 1.89.4.1 16-Aug-2007  jmcneill Sync with HEAD.
 1.91.4.1 14-Oct-2007  yamt sync with head.
 1.91.2.3 23-Mar-2008  matt sync with HEAD
 1.91.2.2 09-Jan-2008  matt sync with HEAD
 1.91.2.1 06-Nov-2007  matt sync with HEAD
 1.92.4.2 18-Feb-2008  mjf Sync with HEAD.
 1.92.4.1 08-Dec-2007  mjf Sync with HEAD.
 1.93.6.3 10-Jan-2008  bouyer Sync with HEAD
 1.93.6.2 08-Jan-2008  bouyer Sync with HEAD
 1.93.6.1 02-Jan-2008  bouyer Sync with HEAD
 1.93.2.4 30-Dec-2007  ad Fix remaining problems with ext2fs on this branch.
 1.93.2.3 10-Dec-2007  ad - Don't drain the vnode lock in vclean(); reference counting and XLOCK
should be enough.
- LK_SETRECURSE is gone.
 1.93.2.2 09-Dec-2007  ad LK_SETRECURSE is unused.
 1.93.2.1 04-Dec-2007  ad Pull the vmlocking changes into a new branch.
 1.98.10.5 09-Oct-2010  yamt sync with head
 1.98.10.4 11-Aug-2010  yamt sync with head.
 1.98.10.3 11-Mar-2010  yamt sync with head
 1.98.10.2 04-May-2009  yamt sync with head.
 1.98.10.1 16-May-2008  yamt sync with head.
 1.98.8.1 18-May-2008  yamt sync with head.
 1.98.6.3 17-Jan-2009  mjf Sync with HEAD.
 1.98.6.2 28-Sep-2008  mjf Sync with HEAD.
 1.98.6.1 02-Jun-2008  mjf Sync with HEAD.
 1.99.6.1 19-Oct-2008  haad Sync with HEAD.
 1.99.4.3 18-Jul-2008  simonb In ffs_fsync() pass FSYNC_VFS to ffs_full_fsync() for a VBLK vnode so
that the correct "struct mount" is referenced.

Fixes WAPBL for the "mount update" case, so remove the "anti-kern/38057"
hack that was previously there to guard against this.

Based on a suggestion from yamt@. yamt suggests this could be cleaner
than the current VFS_FSYNC method too. Another day...
 1.99.4.2 12-Jun-2008  martin License police
 1.99.4.1 10-Jun-2008  simonb Initial commit of Wasabi System's WAPBL (Write Ahead Physical Block
Logging) journaling code. Originally written by Darrin B. Jewell
while at Wasabi and updated to -current by Antti Kantee, Andy Doran,
Greg Oster and Simon Burge.

Still a number of issues - look in doc/BRANCHES for "simonb-wapbl"
for more info.
 1.99.2.2 10-Oct-2008  skrll Sync with HEAD.
 1.99.2.1 18-Sep-2008  wrstuden Sync with wrstuden-revivesa-base-2.
 1.104.4.8 17-Jul-2011  riz Pull up following revision(s) (requested by manu in ticket #1645):
lib/libc/sys/Makefile.inc 1.207 via patch
lib/libc/sys/extattr_get_file.2 patch
lib/libpuffs/dispatcher.c 1.34,1.36 via patch
lib/libpuffs/puffs.c 1.107 via patch
lib/libpuffs/puffs.h 1.115,1.118 via patch
sys/fs/puffs/puffs_msgif.h 1.71,1.76 via patch
sys/fs/puffs/puffs_vfsops.c 1.88 via patch
sys/fs/puffs/puffs_vnops.c 1.145,1.154 via patch
sys/kern/vfs_xattr.c 1.24-1.27 via patch
sys/kern/vnode_if.c 1.87 via patch
sys/sys/Makefile 1.133 via patch
sys/sys/extattr.h 1.6 via patch
sys/sys/vnode_if.h 1.81 via patch
sys/ufs/ffs/ffs_vnops.c patch
sys/ufs/ufs/ufs_extattr.c 1.31,1.34 via patch

* support extended attributes
* bump major due to structure growth
* add some spare space
* remove ABI silliness
Support extended attributes.
Fix multiple non compliances in our Linux-like extattr API, and make it
public so that it can be used.
Improve listxattr(2) a bit. It attempts to list both system and user
extended attributes, and it failed if the calling user did not have privilege
for reading system EA. Now we just list user EA and skip system EA if
reading them is not allowed.
Fix bug introduced in previous commit: Do not vrele() a vnode we did not
obtain.
Improve UFS1 extended attributes usability
- autocreate attribute backing file for new attributes
- autoload attributes when issuing extattrctl start
- when autoloading attributes, do not display garbage warning when looking
up entries that got ENOENT
Add a flag to VOP_LISTEXTATTR(9) so that the vnode interface can tell the
filesystem in which format extended attribute shall be listed.
There are currently two formats:
- NUL-terminated strings, used for listxattr(2), this is the default.
- one byte length-prefixed, non NUL-terminated strings, used for
extattr_list_file(2), which is obtained by setting the
EXTATTR_LIST_PREFIXLEN flag to VOP_LISTEXTATTR(9)
This approach avoids the need for converting the list back and forth, except
in libperfuse, since FUSE uses NUL-terminated strings, and the kernel may
have requested EXTATTR_LIST_PREFIXLEN.
 1.104.4.7 04-Apr-2009  snj Pull up following revision(s) (requested by add in ticket #655):
sys/ufs/ffs/ffs_vfsops.c: revision 1.245 via patch
sys/ufs/ffs/ffs_vnops.c: revision 1.112 via patch
fsync:
- atime updates were not being synced.
ffs_sync:
- In some cases the sync vnode was acting like the now-dead /usr/sbin/update.
It was examining vnodes that it should have ignored.
- It would find dirty inodes and try to flush them. Often ffs_fsync()
cheerfully ignored the flush request due to the fsync bug. Such inodes
remained dirty and were repeatedly re-examined by the syncer until
vnode reclaim or system shutdown.
- We were marking our place in the per-mount vnode list even though in
most cases there was no flush to perform. While not a bug, this wasted
CPU cycles because a TAILQ_NEXT would have sufficed.
 1.104.4.6 24-Feb-2009  snj Pull up following revision(s) (requested by ad in ticket #490):
sys/kern/vfs_wapbl.c: revision 1.23
sys/miscfs/syncfs/sync_subr.c: revision 1.36
sys/miscfs/syncfs/sync_vnops.c: revision 1.26
sys/ufs/ffs/ffs_alloc.c: revision 1.121
sys/ufs/ffs/ffs_vfsops.c: revision 1.242
sys/ufs/ffs/ffs_vnops.c: revision 1.110
PR kern/39564 wapbl performance issues with disk cache flushing
PR kern/40361 WAPBL locking panic in -current
PR kern/40470 WAPBL corrupts ext2fs
PR kern/40562 busy loop in ffs_sync when unmounting a file system
PR kern/40525 panic: ffs_valloc: dup alloc
- A fix for an issue that can lead to "ffs_valloc: dup" due to dirty cg
buffers being invalidated. Problem discovered and patch by dholland@.
- If the syncer fails to lazily sync a vnode due to lock contention,
retry 1 second later instead of 30 seconds later.
- Flush inode atime updates every ~10 seconds (this makes most sense with
logging). Presently they didn't hit the disk for read-only files or
devices until the file system was unmounted. It would be better to trickle
the updates out but that would require more extensive changes.
- Fix issues with file system corruption, busy looping and other nasty
problems when logging and non-logging file systems are intermixed,
with one being the root file system.
- For logging, do not flush metadata on an inode-at-a-time basis if the sync
has been requested by ioflush. Previously, we could try hundreds of log
sync operations a second due to inode update activity, causing the syncer
to fall behind and metadata updates to be serialized across the entire
file system. Instead, burst out metadata and log flushes at a minimum
interval of every 10 seconds on an active file system (happens more often
if the log becomes full). Note this does not change the operation of
fsync() etc.
- With the flush issue fixed, re-enable concurrent metadata updates in
vfs_wapbl.c.
 1.104.4.5 02-Feb-2009  snj Pull up following revision(s) (requested by ad in ticket #395):
sys/ufs/ffs/ffs_vnops.c: revision 1.109
PR kern/40469 5.0_BETA/amd64 INSTALL kernel panics when installing on
log-enabled filesystems
PR kern/40470 WAPBL corrupts ext2fs
Don't touch inodes at all unless VOP_FSYNC(). Might fix the ext2fs problem,
I am not sure.
 1.104.4.4 02-Feb-2009  snj Pull up following revision(s) (requested by ad in ticket #395):
sys/ufs/ffs/ffs_vnops.c: revision 1.108
Don't try to ffs_update VT_NON vnodes
 1.104.4.3 02-Feb-2009  snj Pull up following revision(s) (requested by ad in ticket #395):
sys/ufs/ffs/ffs_vnops.c: revision 1.107
Add a comment.
 1.104.4.2 02-Feb-2009  snj Pull up following revision(s) (requested by ad in ticket #395):
sys/ufs/ffs/ffs_vnops.c: revision 1.106
PR kern/40246 current panics when removing swap devices
Someone was smoking crack when they decided to unconditionally OR FSYNC_VFS
into the flags for block devices.
 1.104.4.1 02-Feb-2009  snj Pull up following revision(s) (requested by ad in ticket #395):
sys/ufs/ffs/ffs_vnops.c: revision 1.105
PR kern/40210 5.0 BETA WAPBL related crash
 1.104.2.3 28-Apr-2009  skrll Sync with HEAD.
 1.104.2.2 03-Mar-2009  skrll Sync with HEAD.
 1.104.2.1 19-Jan-2009  skrll Sync with HEAD.
 1.109.2.1 13-May-2009  jym Sync with HEAD.

Commit is split, to avoid a "too many arguments" protocol error.
 1.113.4.5 31-May-2011  rmind sync with head
 1.113.4.4 21-Apr-2011  rmind sync with head
 1.113.4.3 05-Mar-2011  rmind sync with head
 1.113.4.2 30-May-2010  rmind sync with head
 1.113.4.1 16-Mar-2010  rmind Change struct uvm_object::vmobjlock to be dynamically allocated with
mutex_obj_alloc(). It allows us to share the locks among UVM objects.
 1.113.2.2 17-Aug-2010  uebayasi Sync with HEAD.
 1.113.2.1 30-Apr-2010  uebayasi Sync with HEAD.
 1.116.2.1 06-Jun-2011  jruoho Sync with HEAD.
 1.118.2.1 23-Jun-2011  cherry Catchup with rmind-uvmplock merge.
 1.120.8.1 07-May-2012  riz Pull up following revision(s) (requested by chs in ticket #204):
sys/fs/sysvbfs/sysvbfs_vnops.c: revision 1.44
sys/ufs/ffs/ffs_vfsops.c: revision 1.277
sys/fs/v7fs/v7fs_vnops.c: revision 1.11
sys/ufs/chfs/chfs_vnops.c: revision 1.7
sys/ufs/ext2fs/ext2fs_readwrite.c: revision 1.61
sys/miscfs/genfs/genfs_io.c: revision 1.54
sys/kern/vfs_wapbl.c: revision 1.52
sys/uvm/uvm_pager.h: revision 1.43
sys/ufs/ffs/ffs_vnops.c: revision 1.121
sys/kern/vfs_subr.c: revision 1.434
sys/fs/msdosfs/msdosfs_vnops.c: revision 1.83
sys/fs/ntfs/ntfs_vnops.c: revision 1.51
sys/fs/udf/udf_subr.c: revision 1.119
sys/miscfs/specfs/spec_vnops.c: revision 1.135
sys/ufs/ext2fs/ext2fs_vnops.c: revision 1.103
sys/fs/udf/udf_vnops.c: revision 1.71
sys/ufs/ufs/ufs_readwrite.c: revision 1.104
change vflushbuf() to take the full FSYNC_* flags.
translate FSYNC_LAZY into PGO_LAZY for VOP_PUTPAGES() so that
genfs_do_io() can set the appropriate io priority for the I/O.
this is the first part of addressing PR 46325.
mark all wapbl I/O as BPRIO_TIMECRITICAL.
this is the second part of addressing PR 46325.
 1.120.6.1 02-Jun-2012  mrg sync to latest -current.
 1.120.2.3 22-May-2014  yamt sync with head.

for a reference, the tree before this commit was tagged
as yamt-pagecache-tag8.

this commit was split into small chunks to avoid
a limitation of cvs. ("Protocol error: too many arguments")
 1.120.2.2 23-Jan-2013  yamt sync with head
 1.120.2.1 23-May-2012  yamt sync with head.
 1.121.2.3 03-Dec-2017  jdolecek update from HEAD
 1.121.2.2 20-Aug-2014  tls Rebase to HEAD as of a few days ago.
 1.121.2.1 25-Feb-2013  tls resync with head
 1.123.2.1 18-May-2014  rmind sync with head
 1.124.2.1 10-Aug-2014  tls Rebase.
 1.125.12.1 21-Apr-2017  bouyer Sync with HEAD
 1.125.8.1 20-Mar-2017  pgoyette Sync with HEAD
 1.125.4.1 28-Aug-2017  skrll Sync with HEAD
 1.129.16.1 29-Feb-2020  ad Sync with head.
 1.129.10.2 21-Apr-2020  martin Sync with HEAD
 1.129.10.1 08-Apr-2020  martin Merge changes from current as of 20200406
 1.130.4.1 20-Apr-2020  bouyer Sync with HEAD
 1.133.6.1 01-Aug-2021  thorpej Sync with HEAD.
 1.50 30-Dec-2024  hannken Protect test/clear fs->fs_fmod with um_lock like it is already
protected in ffs_alloc.c.

When writing to disk protect moving superblock to buffer with um_lock.

Set/clear fs->fmod while mounting, updating a mount or unmounting
is safe as these operations run exclusive, either mounting creates
a new file system or the file system is suspended. Assert suspension
for update and unmount.

PR kern/58837 "ffs: Missing locking around fs_fmod/time"
 1.49 13-May-2024  msaitoh branches: 1.49.2;
s/contigous/contiguous/ in comment.
 1.48 22-May-2022  andvar branches: 1.48.4;
fix various small typos, mainly in comments.
 1.47 13-May-2022  reinoud Fix typo dallocate -> deallocate
 1.46 11-Apr-2020  jdolecek remove noncompilable WAPBL_DEBUG_INODES

PR kern/49554 by Thomas Klausner
 1.45 17-Jan-2020  ad branches: 1.45.4;
VFS_VGET(), VFS_ROOT(), VFS_FHTOVP(): give them a "int lktype" argument, to
allow us to get shared locks (or no lock) on the returned vnode. Matches
FreeBSD.
 1.44 01-Jan-2019  hannken branches: 1.44.4; 1.44.6;
Add "void *extra" argument to vcache_new() so a file system may
pass more information about the file to create.

Welcome to 8.99.30
 1.43 10-Dec-2018  jdolecek make UFS_WAPBL_JLOCK_ASSERT() #ifdef DIAGNOSTIC, same as the underlying
function KASSERT(), so that it actually does something; fix code using
it to actually pass correct params, so that it compiles

remove UFS_WAPBL_JUNLOCK_ASSERT(), as that is inherently racy (it's
okay on those places if the rwlock is held by other lwp); depend
on the RW_ASSERT()/LOCKDEBUG inside rw_enter() to catch the case
with wapbl rwlock held by current lwp
 1.42 03-Sep-2018  riastradh Rename min/max -> uimin/uimax for better honesty.

These functions are defined on unsigned int. The generic name
min/max should not silently truncate to 32 bits on 64-bit systems.
This is purely a name change -- no functional change intended.

HOWEVER! Some subsystems have

#define min(a, b) ((a) < (b) ? (a) : (b))
#define max(a, b) ((a) > (b) ? (a) : (b))

even though our standard name for that is MIN/MAX. Although these
may invite multiple evaluation bugs, these do _not_ cause integer
truncation.

To avoid `fixing' these cases, I first changed the name in libkern,
and then compile-tested every file where min/max occurred in order to
confirm that it failed -- and thus confirm that nothing shadowed
min/max -- before changing it.

I have left a handful of bootloaders that are too annoying to
compile-test, and some dead code:

cobalt ews4800mips hp300 hppa ia64 luna68k vax
acorn32/if_ie.c (not included in any kernels)
macppc/if_gm.c (superseded by gem(4))

It should be easy to fix the fallout once identified -- this way of
doing things fails safe, and the goal here, after all, is to _avoid_
silent integer truncations, not introduce them.

Maybe one day we can reintroduce min/max as type-generic things that
never silently truncate. But we should avoid doing that for a while,
so that existing code has a chance to be detected by the compiler for
conversion to uimin/uimax without changing the semantics until we can
properly audit it all. (Who knows, maybe in some cases integer
truncation is actually intended!)
 1.41 28-May-2017  hannken branches: 1.41.8; 1.41.10;
No need to call vgone() on the just created in file system log vnode,
vput() is sufficient.
 1.40 22-Mar-2017  jdolecek move the ffs_sync() after wapbl_log_position() call, since that can still
create delayed writes with MNT_ASYNC when log is created
 1.39 16-Mar-2017  jdolecek need to turn off async during ffs_sync(), otherwise its bwrite() calls are
themselves turned to bdwrite(), creating dirty delayed writes

fixes panic for 'mount -o log,async ...' reported by Masanobu SAITOH
on current-users; fix helped by hannken@, thank you
 1.38 10-Mar-2017  jdolecek sync any delayed writes when updating filesystem to log

Addresses PR kern/52056 by Martin Husemann, fix helped by Juergen Hannken, thanks
 1.37 10-Nov-2016  jdolecek branches: 1.37.2;
disable discard when log is enabled to preserve log consistency promise

PR kern/50725
 1.36 10-Nov-2016  jdolecek during truncate with wapbl, register deallocation for upper indirect block
before recursing into lower blocks, to make sure that it will be removed after
all its referenced blocks are removed

fixes 'ffs_blkfree_common: freeing free block' panic triggered by
ufs_truncate_retry() when just the upper indirect block registration failed,
code tried to free the lower blocks again after wapbl flush

problem found by hannken@, thank you
 1.35 02-Oct-2016  christos use __func__ and print the filesystem we are printing the message for.
 1.34 01-Oct-2016  jdolecek allocate wapbl dealloc registration structures via pool, so that there is more
flexibility with limit handling
 1.33 01-Oct-2016  jdolecek wapbl_remove_log(): add missing break; harmless, fallthrough just printed
extra debug message
 1.32 24-Sep-2016  jdolecek fix swapped KASSERT()
 1.31 24-Sep-2016  jdolecek i/o optimization for wapbl flush - only sync superblock and cgs when
they were actually changed
 1.30 28-Mar-2015  maxv branches: 1.30.2;
Remove the 'cred' argument from bread(). Remove a now unused var in
ffs_snapshot.c. Update the man page accordingly.

ok hannken@
 1.29 17-Mar-2015  hannken Change ffs to use vcache_new:
- Change ffs_valloc to return an inode number.
- Remove now obsolete UFS operations UFS_VALLOC and UFS_VFREE.
- Make ufs_makeinode private to ufs_vnops.c and pass vattr instead of mode.
 1.28 11-Jul-2014  christos branches: 1.28.4;
move the flag setting higher to avoid KASSERT (dholland)
 1.27 10-Jul-2014  christos CID 975226: handle error from UFS_WAPBL_BEGIN
 1.26 10-Jul-2014  dholland Fix unchecked UFS_WAPBL_BEGIN. Coverity 975226.
Unfortunately it looks like all we can do on error here is printf.
 1.25 25-Oct-2013  martin branches: 1.25.2;
Turn a few __unused into __diagused
 1.24 20-Oct-2013  htodd Define needswap where needed.
 1.23 19-Oct-2013  martin Mark a potentially unused variable
 1.22 23-Jun-2013  dholland branches: 1.22.2;
Stick ffs_ in front of the following macros:
fragstoblks()
blkstofrags()
fragnum()
blknum()

to finish the job of distinguishing them from the lfs versions, which
Christos renamed the other day.

I believe this is the last of the overtly ambiguous exported symbols
from ffs... or at least, the last of the ones that conflicted with lfs.
ffs still pollutes the C namespace very broadly (as does ufs) and this
needs quite a bit more cleanup.

XXX: boo on macros with lowercase names. But I'm not tackling that just yet.
 1.21 23-Jun-2013  dholland Stick ffs_, ext2_, chfs_, filecore_, cd9660_, or mfs_ in front of
the following symbols so as to disambiguate fully. (Christos already
did the lfs ones.)

lblkno
lblktosize
lfragtosize
numfrags
blkroundup
fragroundup
 1.20 23-Jun-2013  dholland fsbtodb() -> FFS_FSBTODB(), EXT2_FSBTODB(), or MFS_FSBTODB()
dbtofsb() -> FFS_DBTOFSB() or EXT2_DBTOFSB()

(Christos already did the lfs ones a few days back)
 1.19 22-Jan-2013  dholland Stuff UFS_ in front of a few of ufs's symbols to reduce namespace
pollution. Specifically:
ROOTINO -> UFS_ROOTINO
WINO -> UFS_WINO
NXADDR -> UFS_NXADDR
NDADDR -> UFS_NDADDR
NIADDR -> UFS_NIADDR
MAXSYMLINKLEN -> UFS_MAXSYMLINKLEN
MAXSYMLINKLEN_UFS[12] -> UFS[12]_MAXSYMLINKLEN (for consistency)

Sort out ext2fs's misuse of NDADDR and NIADDR; fortunately, these have
the same values in ext2fs and ffs.

No functional change intended.
 1.18 20-Dec-2012  hannken Change bread() and breadn() to never return a buffer on
error and modify all callers to not brelse() on error.

Welcome to 6.99.16

PR kern/46282 (6.0_BETA crash: msdosfs_bmap -> pcbmap -> bread -> bio_doread)
 1.17 24-Dec-2010  mlelstv branches: 1.17.8; 1.17.18;
For update mounts the root vnode is already in use and we must not
free it. Since the mount persists even when the update fails,
this is not a problem either.
 1.16 23-Dec-2010  mlelstv mount(2) doesn't remove vnodes from the freelist in the error path,
so that they get reused with an invalid pointer to a mount structure.

As a workaround, free the vnodes used to create the in-filesystem journal
immediately.
 1.15 27-Feb-2010  mlelstv branches: 1.15.2;
Store physical block numbers in superblock that point to the journal.
Calculate position of both commit headers correctly for disks with
large sectors.
Correct calculation of circular buffer size.
 1.14 23-Feb-2010  mlelstv Replace individual queries for partition information with
new helper function.
Use this information to query physical sector sizes for WAPBL
instead of hardcoded defaults.
No longer limits physical sector sizes to 512 bytes.
 1.13 13-Sep-2009  bouyer branches: 1.13.2;
Allow tunefs to clear any type of WAPBL log, not only in-filesystem
ones. Discussed in
http://mail-index.netbsd.org/tech-kern/2009/08/17/msg005896.html
and followups.
 1.12 22-Feb-2009  ad branches: 1.12.2;
PR kern/26878 FFSv2 + softdep = livelock (no free ram)
PR kern/16942 panic with softdep and quotas
PR kern/19565 panic: softdep_write_inodeblock: indirect pointer #1 mismatch
PR kern/26274 softdep panic: allocdirect_merge: ...
PR kern/26374 Long delay before non-root users can write to softdep partitions
PR kern/28621 1.6.x "vp != NULL" panic in ffs_softdep.c:4653 while unmounting a softdep (+quota) filesystem
PR kern/29513 FFS+Softdep panic with unfsck-able file-corruption
PR kern/31544 The ffs softdep code appears to fail to write dirty bits to disk
PR kern/31981 stopping scsi disk can cause panic (softdep)
PR kern/32116 kernel panic in softdep (assertion failure)
PR kern/32532 softdep_trackbufs deadlock
PR kern/37191 softdep: locking against myself
PR kern/40474 Kernel panic after remounting raid root with softdep

Retire softdep, pass 2. As discussed and later formally announced on the
mailing lists.
 1.11 31-Jan-2009  yamt branches: 1.11.2;
0 -> NULL
 1.10 31-Jan-2009  yamt wapbl_log_position: 1 -> MNT_WAIT
 1.9 30-Nov-2008  joerg Split ffs_blkalloc into a frontend that does inode based consistency
checks and a backend that just asserts them. Use the backend in
ffs_wapbl_abort_sync_metadata instead of faking an inode.
 1.8 11-Nov-2008  joerg wapbl_replay_free needs the replay to have been stopped, so make sure
that the changes happen in the right order. Reported by veego@
 1.7 10-Nov-2008  joerg Reduce internals of WAPBL exposed to the rest of the system.
 1.6 08-Sep-2008  joerg branches: 1.6.2; 1.6.4; 1.6.6; 1.6.8; 1.6.12;
Move successful removal of unreferenced inodes under WAPBL_DEBUG to not
spam the console.

OK simon@
 1.5 05-Aug-2008  pooka zu, not zd, to print size_t
 1.4 04-Aug-2008  simonb Only allow WAPBL to operate with UFS2 style superblocks.

Problem reported by Takeshi Nakayama.
 1.3 02-Aug-2008  simonb When checking if there's enough space at the end of a partition,
compare bytes vs bytes, not sectors vs bytes.

Problem discovered and fix tested by Michael Hitch.
 1.2 31-Jul-2008  simonb Merge the simonb-wapbl branch. From the original branch commit:

Add Wasabi System's WAPBL (Write Ahead Physical Block Logging)
journaling code. Originally written by Darrin B. Jewell while
at Wasabi and updated to -current by Antti Kantee, Andy Doran,
Greg Oster and Simon Burge.

OK'd by core@, releng@.
 1.1 10-Jun-2008  simonb branches: 1.1.2; 1.1.4;
file ffs_wapbl.c was initially added on branch simonb-wapbl.
 1.1.4.2 13-Dec-2008  haad Update haad-dm branch to haad-dm-base2.
 1.1.4.1 19-Oct-2008  haad Sync with HEAD.
 1.1.2.4 28-Jul-2008  simonb Add support for creating a WAPBL log in the filesystem. Will
create an in-filesystem log on first "mount -o log" if one doesn't
exist, and will then continue to use same log in the future. See
(soon to be added) wapbl(4) for more info.

Adds a new B_CONTIG low-level allocation flag that uses hints in
"struct ffs_inode_ext" to lay out an ffs file's data contiguously.

Thanks to Greg Oster for helping with the design of this and to
Antti Kantee for code review and suggestions.
 1.1.2.3 03-Jul-2008  simonb Store the location of the journal in the superblock. Currently
nothing really uses this, other than replay checking that what is
in the superblock matches what it expects.
 1.1.2.2 12-Jun-2008  martin License police
 1.1.2.1 10-Jun-2008  simonb Initial commit of Wasabi System's WAPBL (Write Ahead Physical Block
Logging) journaling code. Originally written by Darrin B. Jewell
while at Wasabi and updated to -current by Antti Kantee, Andy Doran,
Greg Oster and Simon Burge.

Still a number of issues - look in doc/BRANCHES for "simonb-wapbl"
for more info.
 1.6.12.1 21-Apr-2010  matt sync to netbsd-5
 1.6.8.1 03-Oct-2009  snj Pull up following revision(s) (requested by bouyer in ticket #1036):
sbin/fsck_ffs/extern.h: revision 1.25 via patch
sbin/fsck_ffs/setup.c: revision 1.88 via patch
sbin/fsck_ffs/wapbl.c: revision 1.4 via patch
sbin/tunefs/tunefs.c: revision 1.41 via patch
sys/ufs/ffs/ffs_vfsops.c: revision 1.252 via patch
sys/ufs/ffs/ffs_wapbl.c: revision 1.13 via patch
Allow tunefs to clear any type of WAPBL log, not only in-filesystem
ones. Discussed in
http://mail-index.netbsd.org/tech-kern/2009/08/17/msg005896.html
and followups.
--
Do some basic checks of the WAPBL journal, to abort the boot before the
kernel refuses to mount a filesystem read-write (booting a system
multiuser with critical filesystems read-only is bad):
Add a check_wapbl() which will check some WAPBL values in the superblock,
and try to read the journal via wapbl_replay_start() if there is one.
pfatal() if one of these fails (abort boot if in preen mode,
ask "CONTINUE" otherwise). In non-preen mode the bogus journal will
be cleared.
check_wapbl() is always called if the superblock supports WAPBL.
Even if FS_DOWAPBL is not there, there could be flags asking the
kernel to clear or create a log with bogus values which would cause the
kernel to refuse to mount the filesystem.
Discussed in
http://mail-index.netbsd.org/tech-kern/2009/08/17/msg005896.html
and followups.
--
If the WAPBL journal can't be read (ffs_wapbl_replay_start() fails),
mount the filesystem anyway if MNT_FORCE is present.
This still allows a system with a corrupted WAPBL on / to boot
single-user, and so gives a chance to run fsck to fix it.
http://mail-index.netbsd.org/tech-kern/2009/08/17/msg005896.html
and followups.
 1.6.6.2 03-Mar-2009  skrll Sync with HEAD.
 1.6.6.1 19-Jan-2009  skrll Sync with HEAD.
 1.6.4.3 17-Jan-2009  mjf Sync with HEAD.
 1.6.4.2 28-Sep-2008  mjf Sync with HEAD.
 1.6.4.1 08-Sep-2008  mjf file ffs_wapbl.c was added on branch mjf-devfs2 on 2008-09-28 10:41:06 +0000
 1.6.2.2 18-Sep-2008  wrstuden Sync with wrstuden-revivesa-base-2.
 1.6.2.1 08-Sep-2008  wrstuden file ffs_wapbl.c was added on branch wrstuden-revivesa on 2008-09-18 04:37:05 +0000
 1.11.2.1 13-May-2009  jym Sync with HEAD.

Commit is split, to avoid a "too many arguments" protocol error.
 1.12.2.4 11-Mar-2010  yamt sync with head
 1.12.2.3 16-Sep-2009  yamt sync with head
 1.12.2.2 04-May-2009  yamt sync with head.
 1.12.2.1 22-Feb-2009  yamt file ffs_wapbl.c was added on branch yamt-nfs-mp on 2009-05-04 08:14:38 +0000
 1.13.2.1 30-Apr-2010  uebayasi Sync with HEAD.
 1.15.2.1 05-Mar-2011  rmind sync with head
 1.17.18.3 03-Dec-2017  jdolecek update from HEAD
 1.17.18.2 20-Aug-2014  tls Rebase to HEAD as of a few days ago.
 1.17.18.1 25-Feb-2013  tls resync with head
 1.17.8.2 22-May-2014  yamt sync with head.

for reference, the tree before this commit was tagged
as yamt-pagecache-tag8.

this commit was split into small chunks to avoid
a limitation of cvs. ("Protocol error: too many arguments")
 1.17.8.1 23-Jan-2013  yamt sync with head
 1.22.2.1 18-May-2014  rmind sync with head
 1.25.2.1 10-Aug-2014  tls Rebase.
 1.28.4.4 28-Aug-2017  skrll Sync with HEAD
 1.28.4.3 05-Dec-2016  skrll Sync with HEAD
 1.28.4.2 05-Oct-2016  skrll Sync with HEAD
 1.28.4.1 06-Apr-2015  skrll Sync with HEAD
 1.30.2.4 26-Apr-2017  pgoyette Sync with HEAD
 1.30.2.3 20-Mar-2017  pgoyette Sync with HEAD
 1.30.2.2 07-Jan-2017  pgoyette Sync with HEAD. (Note that most of these changes are simply $NetBSD$
tag issues.)
 1.30.2.1 04-Nov-2016  pgoyette Sync with HEAD
 1.37.2.1 21-Apr-2017  bouyer Sync with HEAD
 1.41.10.3 21-Apr-2020  martin Sync with HEAD
 1.41.10.2 08-Apr-2020  martin Merge changes from current as of 20200406
 1.41.10.1 10-Jun-2019  christos Sync with HEAD
 1.41.8.3 18-Jan-2019  pgoyette Synch with HEAD
 1.41.8.2 26-Dec-2018  pgoyette Sync with HEAD, resolve a few conflicts
 1.41.8.1 06-Sep-2018  pgoyette Sync with HEAD

Resolve a couple of conflicts (result of the uimin/uimax changes)
 1.44.6.1 17-Jan-2020  ad Sync with head.
 1.44.4.1 07-Jan-2025  martin Pull up following revision(s) (requested by hannken in ticket #1934):

sys/ufs/ext2fs/ext2fs_vfsops.c: revision 1.228
sys/ufs/lfs/lfs_vfsops.c: revision 1.383
sys/ufs/ffs/ffs_wapbl.c: revision 1.50
sys/ufs/ffs/ffs_vfsops.c: revision 1.383 (patch)
sys/ufs/ffs/ffs_vfsops.c: revision 1.384 (patch)

Remove comment "we are always called with the filesystem marked `MPBUSY'."
above some xxx_sync() operations. These operations get called without
any exclusive lock.

This comment appeared with "add quota support" on 1990-05-02.
On 1998/02/18 MNT_MPBUSY disappeared when vfs_busy() was changed from
an exclusive lock to a shared lock.

PR kern/58837 "ffs: Missing locking around fs_fmod/time"

Protect test/clear fs->fs_fmod with um_lock like it is already
protected in ffs_alloc.c.

When writing to disk protect moving superblock to buffer with um_lock.

Set/clear fs->fmod while mounting, updating a mount or unmounting
is safe as these operations run exclusively: either mounting creates
a new file system or the file system is suspended. Assert suspension
for update and unmount.

PR kern/58837 "ffs: Missing locking around fs_fmod/time"
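The test-and-clear pattern described above can be sketched in userland, with a pthread mutex standing in for the kernel's um_lock. The sketch_* structures and functions are hypothetical and are not the ffs_vfsops.c code:

```c
#include <assert.h>
#include <pthread.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-ins for struct fs and struct ufsmount. */
struct sketch_fs {
	int fs_fmod;			/* superblock-modified flag */
};

struct sketch_ump {
	pthread_mutex_t um_lock;	/* protects um_fs->fs_fmod */
	struct sketch_fs *um_fs;
};

/* Mark the superblock as modified, under the lock. */
static void
sketch_set_fmod(struct sketch_ump *ump)
{
	pthread_mutex_lock(&ump->um_lock);
	ump->um_fs->fs_fmod = 1;
	pthread_mutex_unlock(&ump->um_lock);
}

/*
 * Test and clear fs_fmod in one critical section.  Returns true if
 * the caller should write the superblock out.
 */
static bool
sketch_test_clear_fmod(struct sketch_ump *ump)
{
	bool modified;

	assert(ump != NULL && ump->um_fs != NULL);
	pthread_mutex_lock(&ump->um_lock);
	modified = ump->um_fs->fs_fmod != 0;
	ump->um_fs->fs_fmod = 0;
	pthread_mutex_unlock(&ump->um_lock);
	return modified;
}
```

Holding one lock across both the test and the clear is what prevents a concurrent modification from being lost between the two steps.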
 1.45.4.1 20-Apr-2020  bouyer Sync with HEAD
 1.48.4.1 07-Jan-2025  martin Pull up following revision(s) (requested by hannken in ticket #1037):

sys/ufs/ext2fs/ext2fs_vfsops.c: revision 1.228
sys/ufs/lfs/lfs_vfsops.c: revision 1.383
sys/ufs/ffs/ffs_wapbl.c: revision 1.50
sys/ufs/ffs/ffs_vfsops.c: revision 1.383
sys/ufs/ffs/ffs_vfsops.c: revision 1.384

Remove comment "we are always called with the filesystem marked `MPBUSY'."
above some xxx_sync() operations. These operations get called without
any exclusive lock.

This comment appeared with "add quota support" on 1990-05-02.
On 1998/02/18 MNT_MPBUSY disappeared when vfs_busy() was changed from
an exclusive lock to a shared lock.

PR kern/58837 "ffs: Missing locking around fs_fmod/time"

Protect test/clear fs->fs_fmod with um_lock like it is already
protected in ffs_alloc.c.

When writing to disk protect moving superblock to buffer with um_lock.

Set/clear fs->fmod while mounting, updating a mount or unmounting
is safe as these operations run exclusively: either mounting creates
a new file system or the file system is suspended. Assert suspension
for update and unmount.

PR kern/58837 "ffs: Missing locking around fs_fmod/time"
 1.49.2.1 02-Aug-2025  perseant Sync with HEAD
 1.73 13-Dec-2024  riastradh sys/ufs/ffs/fs.h: Fix confusing comment about struct fs.

This is the on-disk format, not a purely in-memory data structure
like struct ufsmount. While the on-disk format happens to be copied
into memory, it is misleading to say `in memory' here.
 1.72 13-May-2024  msaitoh branches: 1.72.2;
s/of of/of/ in comment.
 1.71 07-Jan-2023  chs ufs: fixed signed/unsigned bugs affecting large file systems

Apply these commits from FreeBSD:

commit e870d1e6f97cc73308c11c40684b775bcfa906a2
Author: Kirk McKusick <mckusick@FreeBSD.org>
Date: Wed Feb 10 20:10:35 2010 +0000

This fix corrects a problem in the file system that treats large
inode numbers as negative rather than unsigned. For a default
(16K block) file system, this bug began to show up at a file system
size above about 16Tb.

To fully handle this problem, newfs must be updated to ensure that
it will never create a filesystem with more than 2^32 inodes. That
patch will be forthcoming soon.

Reported by: Scott Burns, John Kilburg, Bruce Evans
Followup by: Jeff Roberson
PR: 133980
MFC after: 2 weeks

commit 81479e688b0f643ffacd3f335b4b4bba460b769d
Author: Kirk McKusick <mckusick@FreeBSD.org>
Date: Thu Feb 11 18:14:53 2010 +0000

One last pass to get all the unsigned comparisons correct.


In addition to the changes from FreeBSD, this commit includes quite a few
related changes to appease -Wsign-compare.
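A minimal illustration of this class of bug, using a hypothetical range-check helper (not code from either commit): an inode number of 2^31 or more is valid when held unsigned, but turns negative when held in a signed 32-bit type. The signed conversion is implementation-defined in C; the comment describes the usual two's-complement result.

```c
#include <assert.h>
#include <stdint.h>

/*
 * Hypothetical range check in the spirit of the fix: inode numbers
 * are unsigned 32-bit values and the inode count is formed in 64 bits,
 * so nothing wraps or goes negative.
 */
static int
ino_in_range(uint32_t ino, uint32_t ncg, uint32_t ipg)
{
	return (uint64_t)ino < (uint64_t)ncg * ipg;
}

/*
 * The buggy pattern: the same inode number held in a signed 32-bit
 * type.  Values of 2^31 and above become negative (on two's-complement
 * targets) and are wrongly rejected.
 */
static int
ino_in_range_signed(int32_t ino, int32_t maxino)
{
	return ino >= 0 && ino < maxino;
}
```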
 1.70 17-Nov-2022  chs branches: 1.70.2;
Restore backward compatibility of UFS2 with previous NetBSD releases by
disabling support in UFS2 for extended attributes (including ACLs).
Add a new variant of UFS2 called "UFS2ea" that does support extended attributes.
Add new fsck_ffs operations "-c ea" and "-c no-ea" to convert file systems
from UFS2 to UFS2ea and vice-versa (both of which delete all existing extended
attributes in the process).
 1.69 18-Sep-2021  christos Change the default for ACLs to be posix1e instead of nfsv4 to match FreeBSD.
Requested by chuq.
 1.68 16-May-2020  christos Add ACL support for FFS. From FreeBSD.
 1.67 18-Apr-2020  christos Extended attribute support for ffsv2, from FreeBSD.
 1.66 14-Feb-2015  maxv branches: 1.66.18; 1.66.28;
Two typos:
- "preferrably" -> "preferably"
- "overriden" -> "overridden"
No functional change.
 1.65 03-Sep-2013  dholland branches: 1.65.6;
Add the FS_SUJ flag for journaled softupdates from FreeBSD.

This conflicts with our flag for FS_INDEXDIRS. Apparently FreeBSD
changed that arbitrarily on their end when implementing journaled
softupdates, so follow their lead.

Unfortunately, the new value they use for FS_INDEXDIRS conflicts with
our flag FS_DOQUOTA2 for 64-bit quotas. Since the only thing in our
tree that knows about FS_INDEXDIRS is dumpfs (for printing it), leave
FS_INDEXDIRS commented out.

Also add FS_NFS4ACLS from FreeBSD, commented out because it conflicts
with our FS_DOWAPBL, and FS_TRIM.

(We could honor FS_TRIM as we have code for doing that; however I'm
not sure why FreeBSD chose to make it an on-disk flag instead of e.g.
a mount option and it seems problematic to me. In any case, not in
this commit.)

Also see a post I just made in tech-kern about the flag conflicts.
 1.64 23-Jun-2013  dholland branches: 1.64.2;
Stick ffs_ in front of the following macros:
fragstoblks()
blkstofrags()
fragnum()
blknum()

to finish the job of distinguishing them from the lfs versions, which
Christos renamed the other day.

I believe this is the last of the overtly ambiguous exported symbols
from ffs... or at least, the last of the ones that conflicted with lfs.
ffs still pollutes the C namespace very broadly (as does ufs) and this
needs quite a bit more cleanup.

XXX: boo on macros with lowercase names. But I'm not tackling that just yet.
 1.63 23-Jun-2013  dholland Stick ffs_, ext2_, chfs_, filecore_, cd9660_, or mfs_ in front of
the following symbols so as to disambiguate fully. (Christos already
did the lfs ones.)

lblkno
lblktosize
lfragtosize
numfrags
blkroundup
fragroundup
 1.62 23-Jun-2013  dholland fsbtodb() -> FFS_FSBTODB(), EXT2_FSBTODB(), or MFS_FSBTODB()
dbtofsb() -> FFS_DBTOFSB() or EXT2_DBTOFSB()

(Christos already did the lfs ones a few days back)
 1.61 19-Jun-2013  dholland Rename ambiguous macros:
MAXDIRSIZE -> UFS_MAXDIRSIZE or LFS_MAXDIRSIZE
NINDIR -> FFS_NINDIR, EXT2_NINDIR, LFS_NINDIR, or MFS_NINDIR
INOPB -> FFS_INOPB, LFS_INOPB
INOPF -> FFS_INOPF, LFS_INOPF
blksize -> ffs_blksize, ext2_blksize, or lfs_blksize
sblksize -> ffs_blksize

These are not the only ambiguously defined filesystem macros, of
course; there's a pile more. I may not have found all the ambiguous
definitions of blksize(), too, as there are a lot of other things
called 'blksize' in the system.
 1.60 22-Jan-2013  dholland Stuff UFS_ in front of a few of ufs's symbols to reduce namespace
pollution. Specifically:
ROOTINO -> UFS_ROOTINO
WINO -> UFS_WINO
NXADDR -> UFS_NXADDR
NDADDR -> UFS_NDADDR
NIADDR -> UFS_NIADDR
MAXSYMLINKLEN -> UFS_MAXSYMLINKLEN
MAXSYMLINKLEN_UFS[12] -> UFS[12]_MAXSYMLINKLEN (for consistency)

Sort out ext2fs's misuse of NDADDR and NIADDR; fortunately, these have
the same values in ext2fs and ffs.

No functional change intended.
 1.59 23-Apr-2012  drochner branches: 1.59.2;
everywhere else it is assumed that the filesystem block size fits into
a 32-bit "int" -- do the cast to quell a compiler warning in a more
sensible way
 1.58 20-Apr-2012  christos one more cast
 1.57 19-Apr-2012  christos Fix signed/unsigned issues.
 1.56 06-Mar-2011  bouyer branches: 1.56.4; 1.56.8;
merge the bouyer-quota2 branch. This adds a new on-disk format
to store disk quota usage and limits, integrated with ffs
metadata. Usage is checked by fsck_ffs (no more quotacheck)
and is covered by the WAPBL journal. Enabled with kernel
option QUOTA2 (added where QUOTA was enabled in kernel config files),
turned on with tunefs(8) on a per-filesystem
basis. mount_mfs(8) can also turn quotas on.

See http://mail-index.netbsd.org/tech-kern/2011/02/19/msg010025.html
for details.
 1.55 31-Jan-2010  mlelstv branches: 1.55.4; 1.55.6; 1.55.8;
Fix block shift to work with different device block sizes.

Unlike other filesystems this has some side issues because
the shift values are stored in the superblock and because
userland utilities share the same fsbtodb macros.

-> the kernel now ignores the value stored in the superblock.
-> the macro adaption is only done for defined(_KERNEL) code.
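A sketch of deriving the shift at run time, under the assumption that it is the difference between the file system's fragment shift and the device's block shift. The function and parameter names are hypothetical; this is not the kernel's FFS_FSBTODB macro:

```c
#include <assert.h>
#include <stdint.h>

/*
 * Sketch only: convert a file system fragment address to a device
 * block address using a shift computed from the fragment size and the
 * device block size, rather than a shift stored in the superblock.
 */
static int64_t
fsb_to_devblk(int64_t fsb, unsigned frag_shift, unsigned dev_bshift)
{
	assert(fsb >= 0);
	assert(frag_shift >= dev_bshift);
	return fsb << (frag_shift - dev_bshift);
}
```

For 2048-byte fragments, fragment 10 maps to device block 40 on a 512-byte-sector device; with 4096-byte fragments on a 4096-byte-sector device the mapping is the identity.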
 1.54 28-Jun-2009  ad +/*
+ * NOTE: COORDINATE ON-DISK FORMAT CHANGES WITH THE FREEBSD PROJECT.
+ */
 1.53 12-May-2009  ad Reserve a bit for FS_GJOURNAL (from FreeBSD).
 1.52 23-Feb-2009  dholland typo in comment
 1.51 31-Jul-2008  simonb branches: 1.51.2; 1.51.8;
Be consistent with #define<tab>.
 1.50 31-Jul-2008  simonb Merge the simonb-wapbl branch. From the original branch commit:

Add Wasabi System's WAPBL (Write Ahead Physical Block Logging)
journaling code. Originally written by Darrin B. Jewell while
at Wasabi and updated to -current by Antti Kantee, Andy Doran,
Greg Oster and Simon Burge.

OK'd by core@, releng@.
 1.49 25-Dec-2007  perry branches: 1.49.6; 1.49.10; 1.49.12; 1.49.14; 1.49.16;
Convert many of the uses of __attribute__ to equivalent
__packed, __unused and __dead macros from cdefs.h
 1.48 23-Nov-2007  dholland branches: 1.48.2; 1.48.6;
Change the fs_clean member of the ffs superblock to be unsigned
(uint8_t instead of int8_t) - this prevents an ugly sign-extension
printing bug as well as formally undefined behavior when you mount an
unclean fs enough times.

From (my own) PR kern/28134; I've been carrying this patch for three
years, long enough to forget about it, and it's had no ill effects in
that time.

reviewed: pooka
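A small illustration of the sign-extension issue; the helper names are hypothetical. Converting a byte of 0x80 or above to int8_t is implementation-defined in C, and on the usual two's-complement targets it yields a negative value once widened:

```c
#include <assert.h>
#include <stdint.h>

/* The stored byte widened to int through the old field type. */
static int
widen_old(uint8_t raw)
{
	int8_t v = (int8_t)raw;	/* old type: 0x80..0xff become negative */
	return v;		/* sign-extends, e.g. 0x80 -> -128 */
}

/* The same byte widened through the new field type. */
static int
widen_new(uint8_t raw)
{
	uint8_t v = raw;	/* new type */
	return v;		/* zero-extends, e.g. 0x80 -> 128 */
}
```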
 1.47 24-Sep-2007  pooka branches: 1.47.4;
Fix comment inaccurate from prehistoric times: default MINFREE is 5, not 10
 1.46 11-Dec-2005  christos branches: 1.46.30; 1.46.44; 1.46.46; 1.46.48;
merge ktrace-lwp.
 1.45 26-Feb-2005  perry branches: 1.45.4;
nuke trailing whitespace
 1.44 25-May-2004  hannken branches: 1.44.4; 1.44.6;
Add ffs internal snapshots. Written by Marshall Kirk McKusick for FreeBSD.

- Not enabled by default. Needs kernel option FFS_SNAPSHOT.
- Change parameters of ffs_blkfree.
- Let the copy-on-write functions return an error so spec_strategy
may fail if the copy-on-write fails.
- Change genfs_*lock*() to use vp->v_vnlock instead of &vp->v_lock.
- Add flag B_METAONLY to VOP_BALLOC to return indirect block buffer.
- Add a function ffs_checkfreefile needed for snapshot creation.
- Add special handling of snapshot files:
Snapshots may not be opened for writing and the attributes are read-only.
Use the mtime as the time this snapshot was taken.
Deny mtime updates for snapshot files.
- Add function transferlockers to transfer any waiting processes from
one lock to another.
- Add vfsop VFS_SNAPSHOT to take a snapshot and make it accessible through
a vnode.
- Add snapshot support to ls, fsck_ffs and dump.

Welcome to 2.0F.

Approved by: Jason R. Thorpe <thorpej@netbsd.org>
 1.43 21-Mar-2004  dsl Rework superblock validation logic to make adding validity tests easier.
Ensure that we don't use the first alternate superblock of a ffsv1
filesystem with 64k blocks (it is in the same place as an ffsv2 sb).
Fixes part of PR kern/24809
 1.42 20-Mar-2004  dsl Change comments - one I wrote earlier wasn't right.
Add a couple of notes about areas of the superblock being reassigned
when ffsv2 was imported.
 1.41 20-Mar-2004  dsl Add a large comment about the balls-up caused by the ffsv2 superblock
not being at 8k - causes all sorts of problems, in particular with
ffsv1 filesystems with 64k blocks, and disks that are reformatted from
ffsv1 to ffsv2 (and v.v.). see also PR kern/24809
 1.40 03-Jan-2004  dbj reintroduce compatibility defines for
fs_headswitch, fs_trkseek, fs_csmask, fs_csshift
fs_postbl, fs_rotbl, cg_blktot, cg_blks, cbtocylno, cbtorpos
 1.39 02-Jan-2004  dbj explicitly pad struct appleufslabel and use __attribute__((__packed__))
since apple put the 64 bit uuid field on a 4 byte boundary
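A sketch of the packing technique with a hypothetical two-field record (this is not the real struct appleufslabel layout). Packing keeps a 64-bit field at byte offset 4, where an ABI that aligns uint64_t to 8 bytes would otherwise insert padding:

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/*
 * Hypothetical on-disk record whose format puts a 64-bit identifier
 * at byte offset 4.  __attribute__((__packed__)) (a GCC/Clang
 * extension) suppresses the padding that would 8-byte-align it.
 */
struct sketch_label {
	uint32_t sl_version;
	uint64_t sl_uuid;
} __attribute__((__packed__));

_Static_assert(offsetof(struct sketch_label, sl_uuid) == 4,
    "uuid must start at byte offset 4");
_Static_assert(sizeof(struct sketch_label) == 12,
    "packed record has no trailing padding");
```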
 1.38 02-Jan-2004  dbj add uuid field to apple ufs volume label
 1.37 31-Dec-2003  dbj remove unused cs_numclusters field from struct csum_total
this avoids a potential future bug if it is ever used.
before this fix, fsck_ffs would check and fix this field to be zero
 1.36 31-Dec-2003  dbj update explanatory comment about NOCSPTRS to reflect that fs_active
is now within that region.
no functional change
 1.35 29-Sep-2003  dbj Declare fs_old_flags and fs_flags as unsigned.
This fixes a bug introduced in revision 1.120 of ffs_vfsops dated 2003/09/13
which results in fs_flags having a value of 0x7fffff00 when a superblock
is updated to use the new layout.
Discussed in http://mail-index.netbsd.org/tech-kern/2003/09/28/0003.html
 1.34 21-Aug-2003  dsl Split CGSIZE definition so it can be used with 64bit fpg values.
Split cg_start so magic can be done in libsa when it is known that the
filesystem isn't UFS2.
 1.33 07-Aug-2003  agc Move UCB-licensed code from 4-clause to 3-clause licence.

Patches provided by Joel Baker in PR 22364, verified by myself.
 1.32 05-Apr-2003  fvdl branches: 1.32.2;
* Use the old and new time fields in the superblock as well as a few others
to determine if this filesystem was mounted by an older kernel after
having been mounted by a newer one, to avoid some summary mismatches.
* Reinstate support for 4.2 cylinder groups (read-only, as it was before).
 1.31 05-Apr-2003  he Remember to prefix the manually-swapped FS magic numbers with 0x.
 1.30 03-Apr-2003  fvdl Avoid truncation of values in some macros that shift 64 bit values.
From FreeBSD.
 1.29 02-Apr-2003  fvdl Add support for UFS2. UFS2 is an enhanced FFS, adding support for
64 bit block pointers, extended attribute storage, and a few
other things.

This commit does not yet include the code to manipulate the extended
storage (for e.g. ACLs), this will be done later.

Originally written by Kirk McKusick and Network Associates Laboratories for
FreeBSD.
 1.28 24-Jan-2003  fvdl Bump daddr_t to 64 bits. Replace it with int32_t in all places where
it was used on-disk, so that on-disk formats remain the same.
Remove ufs_daddr_t and ufs_lbn_t for the time being.
 1.27 04-Nov-2002  wiz s/sqiud/squid/ in comment, reported by skrueger at europe com.
 1.26 28-Sep-2002  dbj Add support for the Apple UFS variation on ffs
This is the bulk of PR #17345

The general approach is to use a run time determinable value
for DIRBLKSIZ. Additional allowances are included for using
MAXSYMLINKLEN with FS_42INODEFMT and a shift in the cylinder group
cluster summary count array. Support is added for managing
the Apple UFS volume label.
 1.25 10-Apr-2002  mycroft Add a special case for nrpos=1 to cbtorpos(). This massively reduces CPU usage
by newfs(8) -- and fsck_ffs(8) on a relatively empty file system. There is
still one divide left in the inner loops, to calculate cylno values.
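A sketch of the nrpos == 1 fast path. The general expression is modelled on the historical cbtorpos() macro as recalled here; the struct and its field names are assumptions, not the real struct fs members:

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical subset of old (pre-UFS2) geometry parameters. */
struct sketch_geom {
	int32_t nrpos;		/* rotational positions per cylinder */
	int32_t nsect;		/* sectors per track */
	int32_t npsect;		/* physical sectors per track (>= nsect) */
	int32_t spc;		/* sectors per cylinder */
	int32_t nspf;		/* sectors per fragment */
	int32_t interleave;	/* hardware sector interleave */
};

/* General formula, in the style of the historical cbtorpos() macro. */
static int32_t
rotpos_general(const struct sketch_geom *g, int32_t bno)
{
	return (bno * g->nspf % g->spc % g->nsect * g->interleave) %
	    g->nsect * g->nrpos / g->npsect;
}

/*
 * With a single rotational position the answer is always 0, so the
 * chain of modulos and divides can be skipped entirely.
 */
static int32_t
rotpos(const struct sketch_geom *g, int32_t bno)
{
	if (g->nrpos == 1)
		return 0;
	return rotpos_general(g, bno);
}
```

Given npsect >= nsect, the general expression always lies in [0, nrpos), so the short-circuit returns exactly what the full computation would.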
 1.24 10-Apr-2002  mycroft Use fsbtodb() rather than multiplying by NSPF().
 1.23 07-Jan-2002  lukem revert part of rev 1.14 - #include <ufs/ufs/dinode.h> - because that
makes it MUCH more difficult to reference this file stand-alone.
 1.22 18-Dec-2001  fvdl Bring over fixes from FreeBSD that weren't incorporated yet, mainly
from Kirk McKusick. They implement taking pending block/inode frees
into account for the sake of correct statfs() numbers, and adding
a new softdep type (newdirblk) to correctly handle newly allocated
directory blocks.

Minor additional changes: 1) swap the newly introduced fs_pendinginodes
and fs_pendingblock fields in ffs_sb_swap, and 2) declare lkt_held
in the debug version of the softdep lock structure volatile, as it
can be modified from interrupt context #ifdef DEBUG.
 1.21 19-Sep-2001  lukem - ffs_blkpref() changes:
- don't bother updating fs->fs_cgrotor, since it's actually not used in
the kernel. from Manuel Bouyer in [kern/3389]
- when examining cylinder groups from startcg to startcg-1 (wrapping
at fs->fs_ncg), there's no need to check startcg at the end as well
as the start...
- highlight in the struct fs declaration that fs_cgrotor is UNUSED
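The wrap-around scan in the second point can be sketched as a single loop that visits every cylinder group exactly once, starting at startcg. The nbfree[] array and the visits[] counter are hypothetical stand-ins for the real per-cg summary data; this is not the ffs_blkpref() code:

```c
#include <assert.h>
#include <stddef.h>

/*
 * Visit startcg, startcg+1, ..., ncg-1, 0, ..., startcg-1, each once.
 * Returns the first cg with free blocks, or -1 if there is none.
 * If visits is non-NULL, visits[cg] counts how often cg was examined.
 */
static int
scan_cgs(int ncg, int startcg, const int *nbfree, int *visits)
{
	assert(ncg > 0 && startcg >= 0 && startcg < ncg);
	for (int i = 0; i < ncg; i++) {
		int cg = (startcg + i) % ncg;

		if (visits != NULL)
			visits[cg]++;
		if (nbfree[cg] > 0)
			return cg;
	}
	return -1;
}
```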
 1.20 06-Sep-2001  lukem branches: 1.20.2;
Incorporate the enhanced ffs_dirpref() by Grigoriy Orlov, as found in
FreeBSD (three commits; the initial work, man page updates, and a fix
to ffs_reload()), with the following differences:
- Be consistent between newfs(8) and tunefs(8) as to the options which
set and control the tuning parameters for this work (avgfilesize & avgfpdir)
- Use u_int16_t instead of u_int8_t to keep track of the number of
contiguous directories (suggested by Chuck Silvers)
- Work within our FFS_EI framework
- Ensure that fs->fs_maxclusters and fs->fs_contigdirs don't point to
the same area of memory

The new algorithm has a marked performance increase, especially when
performing tasks such as untarring pkgsrc.tar.gz, etc.

The original FreeBSD commit messages are attached:

=====
mckusick 2001/04/10 01:39:00 PDT
Directory layout preference improvements from Grigoriy Orlov <gluk@ptci.ru>.
His description of the problem and solution follow. My own tests show
speedups on typical filesystem intensive workloads of 5% to 12% which
is very impressive considering the small amount of code change involved.

------

One day I noticed that some file operations run much faster on
small file systems than on big ones. I've looked at the ffs
algorithms, thought about them, and redesigned the dirpref algorithm.

First I want to describe the results of my tests. These results are old
and I have improved the algorithm after these tests were done. Nevertheless
they show how big the performance speedup may be. I have done two file/directory
intensive tests on two OpenBSD systems with the old and new dirpref algorithms.
The first test is "tar -xzf ports.tar.gz", the second is "rm -rf ports".
The ports.tar.gz file is the ports collection from the OpenBSD 2.8 release.
It contains 6596 directories and 13868 files. The test systems are:

1. Celeron-450, 128Mb, two IDE drives, the system at wd0, file system for
test is at wd1. Size of test file system is 8 Gb, number of cg=991,
size of cg is 8m, block size = 8k, fragment size = 1k OpenBSD-current
from Dec 2000 with BUFCACHEPERCENT=35

2. PIII-600, 128Mb, two IBM DTLA-307045 IDE drives at i815e, the system
at wd0, file system for test is at wd1. Size of test file system is 40 Gb,
number of cg=5324, size of cg is 8m, block size = 8k, fragment size = 1k
OpenBSD-current from Dec 2000 with BUFCACHEPERCENT=50

You can get more info about the test systems and methods at:
http://www.ptci.ru/gluk/dirpref/old/dirpref.html

Test Results

               tar -xzf ports.tar.gz                  rm -rf ports
mode     old dirpref  new dirpref  speedup  old dirpref  new dirpref  speedup
First system
normal           667          472     1.41          477          331     1.44
async            285          144     1.98          130           14     9.29
sync             768          616     1.25          477          334     1.43
softdep          413          252     1.64          241           38     6.34
Second system
normal           329           81     4.06        263.5         93.5     2.81
async            302         25.7    11.75          112         2.26    49.56
sync             281         57.0     4.93          263         90.5      2.9
softdep          341         40.6      8.4          284         4.76    59.66

"old dirpref" and "new dirpref" columns give a test time in seconds.
speedup - how many times faster, i.e. old dirpref / new dirpref.

------

Algorithm description

The old dirpref algorithm is described in comments:

/*
* Find a cylinder to place a directory.
*
* The policy implemented by this algorithm is to select from
* among those cylinder groups with above the average number of
* free inodes, the one with the smallest number of directories.
*/
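The old policy quoted above can be sketched as follows. nifree[] and ndir[] are hypothetical per-cylinder-group counts, and groups exactly at the (integer) average are treated as qualifying, an assumption that avoids an empty candidate set when all groups are equal:

```c
#include <assert.h>

/*
 * Among cylinder groups with at least the average number of free
 * inodes, return the one with the fewest directories.
 */
static int
old_dirpref(int ncg, const int *nifree, const int *ndir)
{
	long total = 0, avgifree;
	int best = -1;

	assert(ncg > 0);
	for (int cg = 0; cg < ncg; cg++)
		total += nifree[cg];
	avgifree = total / ncg;

	for (int cg = 0; cg < ncg; cg++) {
		if (nifree[cg] >= avgifree &&
		    (best == -1 || ndir[cg] < ndir[best]))
			best = cg;
	}
	return best;
}
```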

A new directory is allocated in a different cylinder group than its
parent directory, resulting in a directory tree that is spread across
all the cylinder groups. This spreading out results in non-optimal
access to the directories and files. When we have a small filesystem
it is not a problem, but when the filesystem is big, performance
degradation becomes very apparent.

What do I mean by a big file system?

1. A big filesystem is a filesystem which occupies 20-30 or more percent
of total drive space, i.e. first and last cylinder are physically
located relatively far from each other.
2. It has a relatively large number of cylinder groups, for example
more cylinder groups than 50% of the buffers in the buffer cache.

The first results in long access times, while the second results in
many buffers being used by metadata operations. Such operations use
cylinder group blocks and on-disk inode blocks. The cylinder group
block (fs->fs_cblkno) contains struct cg, inode and block bit maps.
It is 2k in size for the default filesystem parameters. If new and
parent directories are located in different cylinder groups then the
system performs more input/output operations and uses more buffers.
On filesystems with many cylinder groups, lots of cache buffers are
used for metadata operations.

My solution for this problem is very simple. I allocate many directories
in one cylinder group. I also take some measures so that the new allocation
method does not cause excessive fragmentation and directory inodes
are not located far from their files' inodes and data.
The algorithm is:
/*
* Find a cylinder group to place a directory.
*
* The policy implemented by this algorithm is to allocate a
* directory inode in the same cylinder group as its parent
* directory, but also to reserve space for its files' inodes
* and data. Restrict the number of directories which may be
* allocated one after another in the same cylinder group
* without intervening allocation of files.
*
* If we allocate a first level directory then force allocation
* in another cylinder group.
*/

My early versions of dirpref gave good results for a wide range of
file operations and different filesystem capacities except one case:
those applications that create their entire directory structure first
and only later fill this structure with files.

My solution for such and similar cases is to limit the number of
directories which may be created one after another in the same cylinder
group without intervening file creations. For this purpose, I allocate
an array of counters at mount time. This array is linked to the superblock
fs->fs_contigdirs[cg]. Each time a directory is created the counter
increases and each time a file is created the counter decreases. A 60Gb
filesystem with 8mb/cg requires 10kb of memory for the counters array.

The maxcontigdirs is a maximum number of directories which may be created
without an intervening file creation. I found in my tests that the best
performance occurs when I restrict the number of directories in one cylinder
group such that all its files may be located in the same cylinder group.
There may be some deterioration in performance if all the file inodes
are in the same cylinder group as its containing directory, but their
data partially resides in a different cylinder group. The maxcontigdirs
value is calculated to try to prevent this condition. Since there is
no way to know how many files and directories will be allocated later
I added two optimization parameters in superblock/tunefs. They are:

int32_t fs_avgfilesize; /* expected average file size */
int32_t fs_avgfpdir; /* expected # of files per directory */

These parameters have reasonable defaults but may be tweaked for special
uses of a filesystem. They are only necessary in rare cases like better
tuning a filesystem being used to store a squid cache.
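A simplified sketch of the counter scheme and the maxcontigdirs limit described above, assuming the limit is a cylinder group's free space divided by the expected directory size (fs_avgfpdir files of fs_avgfilesize bytes). The structure, SKETCH_MAXCG and the function names are hypothetical; the real ffs_dirpref() is more involved. The uint16_t counters follow the NetBSD change noted earlier in this entry:

```c
#include <assert.h>
#include <stdint.h>

#define SKETCH_MAXCG 16	/* hypothetical fixed number of cylinder groups */

struct dirpref_state {
	/* directories created per cg with no intervening file creation */
	uint16_t contigdirs[SKETCH_MAXCG];
	int32_t avgfilesize;	/* expected average file size, bytes */
	int32_t avgfpdir;	/* expected number of files per directory */
};

static void
note_dir_created(struct dirpref_state *s, int cg)
{
	assert(cg >= 0 && cg < SKETCH_MAXCG);
	if (s->contigdirs[cg] < UINT16_MAX)
		s->contigdirs[cg]++;
}

static void
note_file_created(struct dirpref_state *s, int cg)
{
	assert(cg >= 0 && cg < SKETCH_MAXCG);
	if (s->contigdirs[cg] > 0)
		s->contigdirs[cg]--;
}

/*
 * Directories that may be placed back to back in one cg: as many as
 * the cg's free space can hold at the expected size per directory,
 * but never fewer than 1.
 */
static int64_t
max_contig_dirs(const struct dirpref_state *s, int64_t cg_free_bytes)
{
	int64_t dirsize = (int64_t)s->avgfilesize * s->avgfpdir;
	int64_t n = dirsize > 0 ? cg_free_bytes / dirsize : 0;

	return n > 0 ? n : 1;
}

/* May the next directory go into cg without an intervening file? */
static int
dir_allowed(const struct dirpref_state *s, int cg, int64_t cg_free_bytes)
{
	assert(cg >= 0 && cg < SKETCH_MAXCG);
	return s->contigdirs[cg] < max_contig_dirs(s, cg_free_bytes);
}
```

With 16 KB files and 64 files per directory (1 MB per directory), a cylinder group with 8 MB free accepts 8 consecutive directories; one file creation then reopens a slot.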

I have been using this algorithm for about 3 months. I have done
a lot of testing on filesystems with different capacities, average
filesize, average number of files per directory, and so on. I think
this algorithm has no negative impact on filesystem performance. It
works better than the default one in all cases. The new dirpref
will greatly improve untarring/removing/copying of big directories,
decrease load on cvs servers and much more. The new dirpref doesn't
speed up a compilation process, but also doesn't slow it down.

Obtained from: Grigoriy Orlov <gluk@ptci.ru>
=====

=====
iedowse 2001/04/23 17:37:17 PDT
Pre-dirpref versions of fsck may zero out the new superblock fields
fs_contigdirs, fs_avgfilesize and fs_avgfpdir. This could cause
panics if these fields were zeroed while a filesystem was mounted
read-only, and then remounted read-write.

Add code to ffs_reload() which copies the fs_contigdirs pointer
from the previous superblock, and reinitialises fs_avgf* if necessary.

Reviewed by: mckusick
=====

=====
nik 2001/04/10 03:36:44 PDT
Add information about the new options to newfs and tunefs which set the
expected average file size and number of files per directory. Could do
with some fleshing out.
=====
 1.19 03-Sep-2001  lukem deprecate fs_fscktime; we never used it.

in an effort to maintain compatibility with freebsd/openbsd/whatever,
i'm attempting to get the superblock format in sync, and freebsd uses
the int32_t at this position for `fs_pendinginodes'.

if we ever decide to implement fscktime functionality, we'll:
a) make sure to liaise with the other projects to reserve the same
spare field
b) actually implement the code this time ...

(this is also preparing us for other changes, like the new dirpref code)
 1.18 02-Sep-2001  lukem Incorporate fix by iedowse @ FreeBSD to allow disks with large numbers of
cylinder groups to work correctly, with minor modifications by me to work
with our FFS_EI code. From the FreeBSD commit message:

The ffs superblock includes a 128-byte region for use by temporary
in-core pointers to summary information. An array in this region
(fs_csp) could overflow on filesystems with a very large number of
cylinder groups (~16000 on i386 with 8k blocks). When this happens,
other fields in the superblock get corrupted, and fsck refuses to
check the filesystem.

Solve this problem by replacing the fs_csp array in 'struct fs'
with a single pointer, and add padding to keep the length of the
128-byte region fixed. Update the kernel and userland utilities
to use just this single pointer.
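A sketch of the padding technique with hypothetical types (not the real struct fs). The in-core region is pinned at 128 bytes, here with a union for brevity, so shrinking a pointer array to a single pointer leaves the offsets of all later fields unchanged on both 32- and 64-bit targets:

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define SKETCH_INCORE_SIZE 128	/* size of the in-core pointer region */

/* The region: one pointer, padded so the region size never changes. */
union sketch_incore {
	void *csp;				/* single summary pointer */
	uint8_t pad[SKETCH_INCORE_SIZE];	/* pins the size at 128 bytes */
};

_Static_assert(sizeof(union sketch_incore) == SKETCH_INCORE_SIZE,
    "in-core pointer region must stay 128 bytes");

/* Hypothetical superblock fragment surrounding the region. */
struct sketch_sb {
	int32_t sb_before;		/* an earlier on-disk field */
	union sketch_incore incore;	/* temporary in-core pointers */
	int32_t sb_after;		/* a later on-disk field */
};
```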

With this change, the kernel no longer makes use of the superblock
fields 'fs_csshift' and 'fs_csmask'. Add a comment to newfs/mkfs.c
to indicate that these fields must be calculated for compatibility
with older kernels.

Reviewed by: mckusick
 1.17 31-Aug-2001  lukem More fixes from FreeBSD (with changes):
- Cast blk argument to lblktosize() to (off_t), to prevent 32 bit overflow.
whilst almost every use in ffs used this for small blknos, there are
potential issues, and it's safer this way. (as discussed with chuq)
- Use 64bit (off_t) math to calculate if we have hit our freespace() limit.
Necessary for coherent results on filesystems bigger than 0.5Tb.
- Use lblktosize() in blksize() and dblksize(), to make it obvious what's
happening
- Remove sblksize() - nothing uses it
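The overflow that the (off_t) cast above guards against can be sketched in user-space C. This is an illustration under stated assumptions, not the ffs code: it assumes the traditional definition of lblktosize(fs, blk) as ((blk) << (fs)->fs_bshift), uses a hypothetical struct fake_fs in place of struct fs, and uses int64_t to stand in for off_t.

```c
#include <stdint.h>

/* Hypothetical stand-in for the one struct fs field the macro needs. */
struct fake_fs {
    int32_t fs_bshift;   /* log2(block size): 13 for 8 KB blocks */
};

/*
 * Post-1.17 style: widen the block number to 64 bits (standing in for
 * off_t) before shifting, so large block numbers cannot overflow.
 * blk is assumed to be non-negative.
 */
#define lblktosize64(fs, blk) ((int64_t)(blk) << (fs)->fs_bshift)

/*
 * Returns 1 if the byte offset of logical block blk exceeds INT32_MAX,
 * i.e. if the same shift done in 32-bit int arithmetic (the pre-1.17
 * behaviour) would have overflowed; returns 0 otherwise.
 */
static inline int lblktosize_overflows32(const struct fake_fs *fs, int32_t blk)
{
    return lblktosize64(fs, blk) > INT32_MAX;
}
```

With 8 KB blocks (fs_bshift == 13), logical block 262143 still maps to an offset below INT32_MAX, while block 262144 (offset 2^31 bytes) would already overflow a 32-bit shift; the widened macro returns 2147483648 for it.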
 1.16 30-Aug-2001  lukem some improvements from freebsd/openbsd
- replace the unused fs_headswitch and fs_trkseek with fs_id[2], bringing
our struct fs closer to that in freebsd & openbsd (& solaris FWIW)
- dumpfs: improve warning message when cpc == 0
 1.15 30-Aug-2001  lukem - minor whitespace and comments cleanup
- replace "filesystem" with "file system"
- fix spelo (from freebsd)
 1.14 27-Jul-2001  lukem - multiple include protection
- pull in <ufs/ufs/dinode.h> for ufs_daddr_t
- mark a few fields as being "UNUSED" (because they are)
 1.13 23-Feb-2001  eeh branches: 1.13.2; 1.13.6;
Use int32_t for on-disk time_t values.
 1.12 15-Nov-1999  fvdl branches: 1.12.4;
Add Kirk McKusick's soft updates code to the trunk. Not enabled by
default, as the copyright on the main file (ffs_softdep.c) is such
that it has been put into gnusrc. options SOFTDEP will pull this
in. This code also contains the trickle syncer.

Bump version number to 1.4O
 1.11 28-Jul-1998  drochner branches: 1.11.14; 1.11.16; 1.11.20;
The fragtbl[], inside[] and around[] variables are needed by "fsck",
so we can't put them inside "#ifdef _KERNEL".
Put declarations inside .c files where needed to preserve namespace.
 1.10 28-Jul-1998  mycroft Omit some externs if not _KERNEL.
 1.9 18-Mar-1998  bouyer Add support for reading/writing FFS in non-native byte order, conditioned
to "options FFS_EI". The superblock and inodes (without blk addr) are
byteswapped at disk read/write time, other metadata are byteswapped
when used (as they are accessed directly in the buffer cache).
This required the addition of a "um_flags" field to struct ufsmount.
ffs_bswap.c contains superblock and inode byteswap routines also used
by userland utilities.
 1.8 01-Mar-1998  fvdl Merge with Lite2 + local changes
 1.7 11-Jun-1997  bouyer Add support for ext2fs, this needed a few modifications to ufs/ufs/inode.h:
- added a "union inode_ext" to struct inode, for the per-fs extensions.
For now only ext2fs uses it.
- i_din is now an union:
union {
struct dinode ffs_din; /* 128 bytes of the on-disk dinode. */
struct ext2fs_dinode e2fs_din; /* 128 bytes of the on-disk dinode. */
} i_din
Added a lot of #define i_ffs_* and i_e2fs_* to access the fields.
- Added two macros: FFS_ITIMES and EXT2FS_ITIMES. ITIMES calls the right
macro, depending on the type of the inode. ITIMES is used where necessary,
FFS_ITIMES and EXT2FS_ITIMES in other places.
 1.6 12-Apr-1995  mycroft Make use of the `fs_clean' field. If it was set when the file system was
mounted or upgraded to r-w, then clear it and set it again later when the
file system is unmounted or downgraded.
 1.5 14-Dec-1994  mycroft Sync with CSRG.
 1.4 13-Dec-1994  mycroft Sync with CSRG.
 1.3 20-Oct-1994  cgd update for new syscall args description mechanism, and deal safely
with wider types.
 1.2 29-Jun-1994  cgd New RCS ID's, take two. they're more aesthetically pleasant, and use 'NetBSD'
 1.1 08-Jun-1994  mycroft branches: 1.1.1;
Update to 4.4-Lite fs code, with local changes.
 1.1.1.2 01-Mar-1998  fvdl Import 4.4BSD-Lite2
 1.1.1.1 01-Mar-1998  fvdl Import 4.4BSD-Lite for reference
 1.11.20.2 27-Dec-1999  wrstuden Pull up to last week's -current.
 1.11.20.1 21-Dec-1999  wrstuden Initial commit of recent changes to make DEV_BSIZE go away.

Runs on i386, needs work on other arch's. Main kernel routines should be
fine, but a number of the stand programs need help.

cd, fd, ccd, wd, and sd have been updated. sd has been tested with non-512
byte block devices. vnd, raidframe, and lfs need work.

Non 2**n block support is automatic for LKM's and conditional for kernels
on "options NON_PO2_BLOCKS".
 1.11.16.1 19-Oct-1999  fvdl Bring in Kirk McKusick's FFS softdep code on a branch.
 1.11.14.2 12-Mar-2001  bouyer Sync with HEAD.
 1.11.14.1 20-Nov-2000  bouyer Update thorpej_scsipi to -current as of a month ago
 1.12.4.8 25-Nov-2001  he Pull up revision 1.21 (requested by lukem):
Mark fs_cgrotor as unused.
 1.12.4.7 25-Nov-2001  he Pull up revision 1.20 (requested by lukem):
Pull in enhanced ffs_dirpref() algorithm, which provides a
substantial performance improvement through better locality
between parent/child directories and their files, and by easing
the pressure on the buffer cache for metadata operations.
 1.12.4.6 25-Nov-2001  he Pull up revision 1.19 (requested by lukem):
Deprecate unused fs_fscktime.
 1.12.4.5 25-Nov-2001  he Pull up revision 1.18 (requested by lukem):
Change fs_csp[] from being a fixed size to being an array sized
as required. This allows file systems with more than about 15500
cylinder groups (on 32-bit systems) to be used.
 1.12.4.4 25-Nov-2001  he Pull up revision 1.17 (requested by lukem):
Prevent 32-bit overflows by converting to 64-bit quantities in
appropriate places.
 1.12.4.3 25-Nov-2001  he Pull up revision 1.16 (requested by lukem):
Replace unused fs_headswitch/trkseek with fs_id.
 1.12.4.2 25-Nov-2001  he Pull up revisions 1.14-1.15 (requested by lukem):
Mark a few fields as unused. Multiple include protection.
Also typo corrections.
 1.12.4.1 25-Nov-2001  he Pull up revision 1.13 (requested by lukem):
Use int32_t for on-disk time_t representation.
Convert %q_ to %ll_ in print formats.
 1.13.6.5 10-Oct-2002  jdolecek sync kqueue with -current; this includes merge of gehenna-devsw branch,
merge of i386 MP branch, and part of autoconf rototil work
 1.13.6.4 23-Jun-2002  jdolecek catch up with -current on kqueue branch
 1.13.6.3 10-Jan-2002  thorpej Sync kqueue branch with -current.
 1.13.6.2 13-Sep-2001  thorpej Update the kqueue branch to HEAD.
 1.13.6.1 03-Aug-2001  lukem update to -current
 1.13.2.8 11-Nov-2002  nathanw Catch up to -current
 1.13.2.7 18-Oct-2002  nathanw Catch up to -current.
 1.13.2.6 17-Apr-2002  nathanw Catch up to -current.
 1.13.2.5 28-Feb-2002  nathanw Catch up to -current.
 1.13.2.4 11-Jan-2002  nathanw More catchup.
 1.13.2.3 08-Jan-2002  nathanw Catch up to -current.
 1.13.2.2 21-Sep-2001  nathanw Catch up to -current.
 1.13.2.1 24-Aug-2001  nathanw Catch up with -current.
 1.20.2.1 01-Oct-2001  fvdl Catch up with -current.
 1.32.2.4 04-Mar-2005  skrll Sync with HEAD.

Hi Perry!
 1.32.2.3 21-Sep-2004  skrll Fix the sync with head I botched.
 1.32.2.2 18-Sep-2004  skrll Sync with HEAD.
 1.32.2.1 03-Aug-2004  skrll Sync with HEAD
 1.44.6.1 19-Mar-2005  yamt sync with head. xen and whitespace. xen part is not finished.
 1.44.4.1 29-Apr-2005  kent sync with -current
 1.45.4.3 21-Jan-2008  yamt sync with head
 1.45.4.2 07-Dec-2007  yamt sync with head
 1.45.4.1 27-Oct-2007  yamt sync with head.
 1.46.48.1 06-Oct-2007  yamt sync with head.
 1.46.46.2 09-Jan-2008  matt sync with HEAD
 1.46.46.1 06-Nov-2007  matt sync with HEAD
 1.46.44.2 27-Nov-2007  joerg Sync with HEAD. amd64 Xen support needs testing.
 1.46.44.1 02-Oct-2007  joerg Sync with HEAD.
 1.46.30.1 09-Oct-2007  ad Sync with head.
 1.47.4.2 18-Feb-2008  mjf Sync with HEAD.
 1.47.4.1 08-Dec-2007  mjf Sync with HEAD.
 1.48.6.1 02-Jan-2008  bouyer Sync with HEAD
 1.48.2.1 26-Dec-2007  ad Sync with head.
 1.49.16.1 19-Oct-2008  haad Sync with HEAD.
 1.49.14.2 03-Jul-2008  simonb Store the location of the journal in the superblock. Currently
nothing really uses this, other than replay checking that what is
in the superblock matches what it expects.
 1.49.14.1 10-Jun-2008  simonb Initial commit of Wasabi System's WAPBL (Write Ahead Physical Block
Logging) journaling code. Originally written by Darrin B. Jewell
while at Wasabi and updated to -current by Antti Kantee, Andy Doran,
Greg Oster and Simon Burge.

Still a number of issues - look in doc/BRANCHES for "simonb-wapbl"
for more info.
 1.49.12.1 18-Sep-2008  wrstuden Sync with wrstuden-revivesa-base-2.
 1.49.10.4 11-Mar-2010  yamt sync with head
 1.49.10.3 18-Jul-2009  yamt sync with head.
 1.49.10.2 16-May-2009  yamt sync with head
 1.49.10.1 04-May-2009  yamt sync with head.
 1.49.6.1 28-Sep-2008  mjf Sync with HEAD.
 1.51.8.2 23-Jul-2009  jym Sync with HEAD.
 1.51.8.1 13-May-2009  jym Sync with HEAD.

Commit is split, to avoid a "too many arguments" protocol error.
 1.51.2.1 03-Mar-2009  skrll Sync with HEAD.
 1.55.8.1 20-Jan-2011  bouyer Snapshot of work in progress on a modernised disk quota system:
- new quotactl syscall (versioned for backward compat), which takes
as parameter a path to a mount point, and a prop_dictionary
(in plistref format) describing commands and arguments.
For each command, status and data are returned as a prop_dictionary.
quota commands features will be added to take advantage of this,
exporting quota data or getting quota commands as plists.

- new on-disk format storage (all 64-bit wide), integrated into metadata for
ffs (and playing nicely with wapbl).
Quotas are enabled on a ffs filesystem via superblock flags.
tunefs(8) can enable or disable quotas.
On a quota-enabled filesystem, fsck_ffs(8) will track per-uid/gid
block and inode usages, and will check and update quotas in Pass 6.
quota usage and limits are stored in unlinked files (one for users,
one for groups); fsck_ffs(8) will create the files if needed, or
free them if needed. This means that after enabling or disabling
quotas on a filesystem, a fsck_ffs(8) run is required.
quotacheck(8) is no longer needed: after an unclean shutdown,
fsck or journal replay will take care of fixing quotas.
newfs(8) can create a ready-to-mount quota-enabled filesystem
(superblock flags are set and quota inodes are created).
Other new features or semantic changes:
- default quota data, applied to users or groups which don't already
have a quota entry
- per-user/group grace time (instead of a filesystem global one)
- 0 really means "nothing allowed at all", not "no limit".
If you want "no limit", set the limit to UQUAD_MAX (tools will
understand "unlimited" and "-")

A quota file is structured as follows:
it starts with a header, containing a few per-filesystem values,
and the default quota limits.
Quota entries are linked together as a simple list, each entry has a
pointer (as an offset within the file) to the next.
The header has a pointer to a list of free quota entries, and
a hash table of in-use entries. The size of the hash table depends
on the filesystem block size (header+hash table should fit in the
first block). The file is not sparse and is a multiple of
filesystem block size (when the free quota entry list is empty a new
filesystem block is allocated). Quota entries do not cross
filesystem block boundaries.

In memory, the kernel keeps a cache of recently used quota entries
as a reference to the block number, and offset within the block.
The quota entry itself is kept in the buf cache.

fsck_ffs(8), tunefs(8) and newfs(8) support is complete (with
related atf tests :)
The kernel can update disk usage and report it via quotactl(2).

Todo: enforce quota limits (limits are not checked by the kernel yet)
update repquota, edquota and rpc.rquotad to the new world
implement compat_50_quotactl ioctl.
update quotactl(2) man page

fsck_ffs required fixes so that allocating new blocks or inodes will
properly update the superblock and cg summaries. This was not an issue up
to now because the superblock and cg summary checks happened last, but now
allocations or frees can happen in pass 6.
 1.55.6.1 06-Jun-2011  jruoho Sync with HEAD.
 1.55.4.1 21-Apr-2011  rmind sync with head
 1.56.8.1 29-Apr-2012  mrg sync to latest -current.
 1.56.4.3 22-May-2014  yamt sync with head.

for a reference, the tree before this commit was tagged
as yamt-pagecache-tag8.

this commit was split into small chunks to avoid
a limitation of cvs. ("Protocol error: too many arguments")
 1.56.4.2 23-Jan-2013  yamt sync with head
 1.56.4.1 23-May-2012  yamt sync with head.
 1.59.2.4 03-Dec-2017  jdolecek update from HEAD
 1.59.2.3 20-Aug-2014  tls Rebase to HEAD as of a few days ago.
 1.59.2.2 23-Jun-2013  tls resync from head
 1.59.2.1 25-Feb-2013  tls resync with head
 1.64.2.1 18-May-2014  rmind sync with head
 1.65.6.1 06-Apr-2015  skrll Sync with HEAD
 1.66.28.1 20-Apr-2020  bouyer Sync with HEAD
 1.66.18.1 21-Apr-2020  martin Sync with HEAD
 1.70.2.1 13-May-2023  martin Pull up following revision(s) (requested by chs in ticket #160):

usr.sbin/makefs/ffs/ffs_alloc.c: revision 1.31
sbin/tunefs/tunefs.c: revision 1.58
sbin/fsck_ffs/setup.c: revision 1.105
sbin/fsck_ffs/pass5.c: revision 1.56
usr.sbin/makefs/ffs.c: revision 1.74
usr.sbin/makefs/ffs/mkfs.c: revision 1.42
usr.sbin/makefs/Makefile: revision 1.40
sys/ufs/ffs/fs.h: revision 1.71
sbin/fsdb/fsdb.c: revision 1.54
sbin/resize_ffs/resize_ffs.c: revision 1.58
sbin/fsck_ffs/pass4.c: revision 1.29
usr.sbin/makefs/ffs/ffs_extern.h: revision 1.9
sbin/newfs/mkfs.c: revision 1.133
sys/ufs/ffs/ffs_alloc.c: revision 1.172
sbin/fsck_ffs/pass1b.c: revision 1.24
usr.sbin/dumpfs/dumpfs.c: revision 1.68
sys/ufs/ffs/ffs_extern.h: revision 1.88
usr.sbin/quotacheck/quotacheck.c: revision 1.51
sys/ufs/ffs/ffs_subr.c: revision 1.54
sbin/fsck_ffs/main.c: revision 1.91
sbin/fsck_ffs/pass1.c: revision 1.63

ufs: fixed signed/unsigned bugs affecting large file systems

Apply these commits from FreeBSD:
commit e870d1e6f97cc73308c11c40684b775bcfa906a2
Author: Kirk McKusick <mckusick@FreeBSD.org>
Date: Wed Feb 10 20:10:35 2010 +0000
This fix corrects a problem in the file system that treats large
inode numbers as negative rather than unsigned. For a default
(16K block) file system, this bug began to show up at a file system
size above about 16Tb.
To fully handle this problem, newfs must be updated to ensure that
it will never create a filesystem with more than 2^32 inodes. That
patch will be forthcoming soon.
Reported by: Scott Burns, John Kilburg, Bruce Evans
Followup by: Jeff Roberson
PR: 133980
MFC after: 2 weeks

commit 81479e688b0f643ffacd3f335b4b4bba460b769d
Author: Kirk McKusick <mckusick@FreeBSD.org>
Date: Thu Feb 11 18:14:53 2010 +0000
One last pass to get all the unsigned comparisons correct.

In addition to the changes from FreeBSD, this commit includes quite a few
related changes to appease -Wsign-compare.
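The class of bug described above can be illustrated with a simplified cylinder-group lookup. This is a hypothetical sketch, modeled loosely on the ino_to_cg() macro in fs.h (inode number divided by inodes per group); the function names and the 65536 inodes-per-group figure in the example are assumptions, and it relies on the usual two's-complement representation, in which inode number 2^31 stored in an int32_t reads as INT32_MIN.

```c
#include <stdint.h>

/*
 * Buggy variant (hypothetical): the inode number is held in a signed
 * 32-bit type, so any inode number >= 2^31 arrives here as a negative
 * value and the computed cylinder group is negative too.
 */
static inline int64_t ino_to_cg_signed(int32_t ino, int32_t ipg)
{
    return ino / ipg;
}

/*
 * Fixed variant (hypothetical): inode numbers are treated as unsigned,
 * as the FreeBSD commits above do, so the division stays in range.
 */
static inline uint64_t ino_to_cg_unsigned(uint32_t ino, uint32_t ipg)
{
    return ino / ipg;
}
```

For inode 2^31 with 65536 inodes per group, the unsigned version yields cylinder group 32768, while the signed version, which sees INT32_MIN, yields -32768, an index that would point before the start of any cylinder-group array.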
 1.72.2.1 02-Aug-2025  perseant Sync with HEAD
 1.12 22-Feb-2009  ad PR kern/26878 FFSv2 + softdep = livelock (no free ram)
PR kern/16942 panic with softdep and quotas
PR kern/19565 panic: softdep_write_inodeblock: indirect pointer #1 mismatch
PR kern/26274 softdep panic: allocdirect_merge: ...
PR kern/26374 Long delay before non-root users can write to softdep partitions
PR kern/28621 1.6.x "vp != NULL" panic in ffs_softdep.c:4653 while unmounting a softdep (+quota) filesystem
PR kern/29513 FFS+Softdep panic with unfsck-able file-corruption
PR kern/31544 The ffs softdep code appears to fail to write dirty bits to disk
PR kern/31981 stopping scsi disk can cause panic (softdep)
PR kern/32116 kernel panic in softdep (assertion failure)
PR kern/32532 softdep_trackbufs deadlock
PR kern/37191 softdep: locking against myself
PR kern/40474 Kernel panic after remounting raid root with softdep

Retire softdep, pass 2. As discussed and later formally announced on the
mailing lists.
 1.11 04-Mar-2007  christos branches: 1.11.40; 1.11.50; 1.11.56;
Kill caddr_t; there will be some MI fallout, but it will be fixed shortly.
 1.10 11-Dec-2005  christos branches: 1.10.26;
merge ktrace-lwp.
 1.9 26-Feb-2005  perry branches: 1.9.4;
nuke trailing whitespace
 1.8 15-Oct-2003  hannken branches: 1.8.8; 1.8.10;
Add the gating of system calls that cause modifications to the underlying
file system.
The function vfs_write_suspend stops all new write operations to a file
system, allows any file system modifying system calls already in progress
to complete, then syncs the file system to disk and returns. The
function vfs_write_resume allows the suspended write operations to
complete.

From FreeBSD with slight modifications.

Approved by: Frank van der Linden <fvdl@netbsd.org>
 1.7 02-Apr-2003  fvdl branches: 1.7.2;
Add support for UFS2. UFS2 is an enhanced FFS, adding support for
64 bit block pointers, extended attribute storage, and a few
other things.

This commit does not yet include the code to manipulate the extended
storage (for e.g. ACLs), this will be done later.

Originally written by Kirk McKusick and Network Associates Laboratories for
FreeBSD.
 1.6 24-Jan-2003  fvdl Bump daddr_t to 64 bits. Replace it with int32_t in all places where
it was used on-disk, so that on-disk formats remain the same.
Remove ufs_daddr_t and ufs_lbn_t for the time being.
 1.5 01-Dec-2002  matt Add multiple inclusion protection for headers. Fix mismatched
variable declarations (missing const's) as needed.
 1.4 18-Dec-2001  fvdl Bring over fixes from FreeBSD that weren't incorporated yet, mainly
from Kirk McKusick. They implement taking pending block/inode frees
into account for the sake of correct statfs() numbers, and adding
a new softdep type (newdirblk) to correctly handle newly allocated
directory blocks.

Minor additional changes: 1) swap the newly introduced fs_pendinginodes
and fs_pendingblock fields in ffs_sb_swap, and 2) declare lkt_held
in the debug version of the softdep lock structure volatile, as it
can be modified from interrupt context #ifdef DEBUG.
 1.3 22-Jun-2000  fvdl branches: 1.3.2; 1.3.4; 1.3.8;
Copyright changed.
 1.2 15-Nov-1999  fvdl branches: 1.2.2; 1.2.6;
Add Kirk McKusick's soft updates code to the trunk. Not enabled by
default, as the copyright on the main file (ffs_softdep.c) is such
that it has been put into gnusrc. options SOFTDEP will pull this
in. This code also contains the trickle syncer.

Bump version number to 1.4O
 1.1 19-Oct-1999  fvdl branches: 1.1.2;
file softdep.h was initially added on branch fvdl-softdep.
 1.1.2.2 26-Oct-1999  fvdl Merge changes in the trickle-sync and softdep code as done by Kirk McKusick
in FreeBSD since the version that we based the branch on. Merging mostly
done by Ethan Solomita <ethan@geocast.com>.

Also, make sure the syncer thread/process isn't active when we're
unmounting a filesystem. This could wreak havoc. XXX should be done
on a per-mountpoint basis, but especially the softdep code would
end up being a big pile of vfs_busy() calls.
 1.1.2.1 19-Oct-1999  fvdl Bring in Kirk McKusick's FFS softdep code on a branch.
 1.2.6.1 23-Jun-2000  fvdl Update for changed copyright notice.
 1.2.2.1 27-Dec-1999  wrstuden Pull up to last week's -current.
 1.3.8.1 10-Jan-2002  thorpej Sync kqueue branch with -current.
 1.3.4.2 11-Dec-2002  thorpej Sync with HEAD.
 1.3.4.1 08-Jan-2002  nathanw Catch up to -current.
 1.3.2.2 20-Nov-2000  bouyer Update thorpej_scsipi to -current as of a month ago
 1.3.2.1 22-Jun-2000  bouyer file softdep.h was added on branch thorpej_scsipi on 2000-11-20 18:11:47 +0000
 1.7.2.4 04-Mar-2005  skrll Sync with HEAD.

Hi Perry!
 1.7.2.3 21-Sep-2004  skrll Fix the sync with head I botched.
 1.7.2.2 18-Sep-2004  skrll Sync with HEAD.
 1.7.2.1 03-Aug-2004  skrll Sync with HEAD
 1.8.10.1 19-Mar-2005  yamt sync with head. xen and whitespace. xen part is not finished.
 1.8.8.1 29-Apr-2005  kent sync with -current
 1.9.4.1 03-Sep-2007  yamt sync with head.
 1.10.26.1 12-Mar-2007  rmind Sync with HEAD.
 1.11.56.1 13-May-2009  jym Sync with HEAD.

Commit is split, to avoid a "too many arguments" protocol error.
 1.11.50.1 03-Mar-2009  skrll Sync with HEAD.
 1.11.40.1 04-May-2009  yamt sync with head.